Tuesday, December 2, 2008

In which I wax poetic about matplotlib

Every scientist needs to make graphs. It's one of those necessary evils in order to effect information transfer. It's also kind of unsexy. What's sexy is the research, what's unsexy is the meticulous work that can be required to put it into graphical form.

It's like in college, when I was talking with a girl who was working on her senior thesis. She said, "I'm almost done, I just have to do the bibliography" and she had 3 hours before the deadline where she had to physically turn it in. I looked at her and said, "Oh you poor girl, get on that right now." Bibliographies are also necessary evils and always take longer than you think/hope.

So, fortunately, in the computing world there are lots of people working on lots of free projects and software to make people's lives easier. Out of the goodness of their hearts. It's lovely and wonderful. I myself work on such a project. I won't tell you what it is though - I'd lose my anonymity in a heartbeat that way. But anyway.

One of my favorite free projects is called matplotlib. It's a most excellent piece of graphing software, and it's extremely useful for me, since I do a lot of my work in Python. And you know MATLAB? If you do, you'll probably be just fine with matplotlib too.

Check out this gallery of screenshots for all the things you can do with matplotlib. It's incredible. It's awesome. It's magnificent. I LOVE MATPLOTLIB. Almost as much as I love Mercurial. Hubby HATES Excel and recently asked me how I made such pretty graphs - I told him, matplotlib!!! So I made a few graphs for him. Lickety-split, he has pretty graphs.

What matplotlib is ideal for:

  • People who already use Python for their scripting. Matplotlib gives you a MATLAB-like plotting interface right inside Python, so your transition will be very easy. Just download the egg file to get all your prereq packages and you're set to go.
  • People who need documentation. Matplotlib has some of the best documentation for a free software project I have ever seen. Active mailing list, lots of examples, documentation for every class/function/everything, and even more examples.
  • People who have to turn massive textual data into graphs. This is the best part about matplotlib, in my opinion. Python is my favorite workhorse language. It's very easy to pull out bits of data from massive amounts of textual data. The language is EASY, READABLE, and AWESOME. Since matplotlib is within Python, you just pull out your bits of textual data, and plop them into a graph of your choosing - bar graph, histogram, line plot, whatever. Woot!
  • People who hate Excel.
  • People who aren't afraid of trying new things and doing a little digging on how to get something to work. If you're totally unfamiliar with everything I'm talking about, you can still love matplotlib, you'd just have a learning curve issue. But once you get it, you'll love it. But only if you're willing to put in some time to get away from Excel.
  • People who are cheap. Matplotlib is free. FREE.
  • People who like to control things. You know how in Excel, you can't force it to do certain things? You just can't - if it's not built-in, you can't get it to do it. Not so with matplotlib - if you can code it, you can make it. And you can always code it.
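
To make the "massive textual data" point concrete, here's a minimal sketch, assuming a made-up log format where the interesting lines look like `run=3 time=12.5` mixed in with other noise. The regex, the sample lines, the helper names `extract_times`/`plot_times`, and the output filename `times.png` are all hypothetical. It pulls the numbers out with Python's `re` module and plops them into a bar graph using matplotlib's MATLAB-style `pyplot` commands.

```python
import re

import matplotlib
matplotlib.use("Agg")  # render straight to image files; no window needed
import matplotlib.pyplot as plt

# Hypothetical log format: lines like "run=3 time=12.5" scattered among other text.
PATTERN = re.compile(r"run=(\d+)\s+time=([\d.]+)")


def extract_times(lines):
    """Return (run numbers, times) for every line that matches PATTERN."""
    runs, times = [], []
    for line in lines:
        match = PATTERN.search(line)
        if match:  # skip lines that don't carry a measurement
            runs.append(int(match.group(1)))
            times.append(float(match.group(2)))
    return runs, times


def plot_times(runs, times, filename="times.png"):
    """Draw a bar graph of time per run and save it to filename."""
    plt.figure()
    plt.bar(runs, times)
    plt.xlabel("run")
    plt.ylabel("time")
    plt.title("Time per run")
    plt.savefig(filename)
    plt.close()


if __name__ == "__main__":
    sample = [
        "starting up...",
        "run=1 time=12.5",
        "warning: disk almost full",
        "run=2 time=9.75",
        "run=3 time=11.0",
    ]
    runs, times = extract_times(sample)
    plot_times(runs, times)
```

Swap `plt.bar` for `plt.hist` or `plt.plot` and the same extracted lists become a histogram or a line plot instead.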

Ok, I guess I'm done waxing poetic about this piece of software. But I think it's worth sharing. Because as I said, every scientist needs to make graphs sometimes.

9 comments:

Julie @ Bunsen Burner Bakery said...

Um, I want graphs that look like those examples. I have no idea what they mean, but Julie likey.

My PI is head over heels apeshit in love with Prism, and I have no idea why. I hate Prism... I swear it is even worse than Excel (personally, I miss Lotus 123). He is convinced that the whole world in science uses Prism. My favorite moment was when, the day before an abstract was due, a junior faculty member with whom I collaborate had decided at the last minute to submit an abstract and needed a bunch of my data. So, I spent hours compiling everything she could possibly need in Prism and sent it over. Of course, her lab doesn't use Prism, and I had to stay up all night and spend 14 hours trying to copy and paste and swear at Excel trying to get it all in there. @*#%!!!!

PhizzleDizzle said...

Wow - I just went to check out Prism (never heard of it) and it is NOT for me. Anything that requires that much metadata is not for me. Sheets, Data Tables, Information, etc? I don't think so.

I guess one reason why I like matplotlib is you can do ANYTHING as long as you can code it (I am a CS person after all). I generally don't like software that has a lot of "built-in" options that force you to do this or that. I want to be able to invent my own totally weird graph if I want to. So like, the fact that Prism has to have its own freakin file extension and data format is super uncool to me.

Does your advisor force you to use Prism or are you allowed to branch out if you want? I think I'd go nuts if I had to use that thing.
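
On the "invent my own totally weird graph" point: here's a small sketch of a chart that isn't a canned chart type, with bars on one y-axis, a line on a second y-axis, and an arrow pointing at the peak. The function name `weird_graph`, the sample numbers, and `weird.png` are made up for illustration; `twinx`, `bar`, `plot`, and `annotate` are ordinary matplotlib calls.

```python
import matplotlib
matplotlib.use("Agg")  # write files instead of opening a window
import matplotlib.pyplot as plt


def weird_graph(counts, rates):
    """Bars for counts on the left axis, a line for rates on a second right axis."""
    fig, left = plt.subplots()
    xs = list(range(len(counts)))
    left.bar(xs, counts, color="lightgray")
    left.set_ylabel("count")

    right = left.twinx()  # second y-axis sharing the same x-axis
    right.plot(xs, rates, color="red", marker="o")
    right.set_ylabel("rate")

    # Point an arrow at the largest rate.
    peak = max(range(len(rates)), key=lambda i: rates[i])
    right.annotate("peak", xy=(peak, rates[peak]),
                   xytext=(peak, rates[peak] * 1.1),
                   arrowprops=dict(arrowstyle="->"))
    return fig


if __name__ == "__main__":
    fig = weird_graph([5, 8, 3, 6], [0.2, 0.5, 0.9, 0.4])
    fig.savefig("weird.png")
```

Nothing here is a special "mode"; it's just stacking ordinary drawing calls on two axes, which is the whole appeal.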

Eugenie said...

*drools*

Ugh. Too bad I don't know python (although I abuse the crap out of MATLAB).

Have you ever taken a peek at R? It's also free and has some awesome graphing capabilities.

(Excel is crap, we shouldn't have to pay to use such crappy software...)

PhizzleDizzle said...

I've heard of R, but I've never used it...I thought it was a machine-learning statistical analysis tool or something?

Python is easy, and what you'd need to know for this software is very minimal. The hardest part of all the stuff I describe is probably installing it. Let's say you had a file named data.txt with one data point per line on it. You could make a line plot this way:

import pylab

# read data.txt: one number per line, skipping blank lines
with open("data.txt") as f:
    data = [float(line) for line in f if line.strip()]

pylab.plot(data)
pylab.show()

Boom. Done.

Or a histogram this way:

pylab.hist(data)
pylab.show()

Boom. Done. So awesome.
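
If you'd rather get image files than a pop-up window (say, for a paper or a script you run over and over), here's a minimal sketch along the same lines, assuming the same `data.txt` with one number per line. The helper names `read_numbers` and `plot_file` and the output filenames are hypothetical; it uses matplotlib's non-interactive Agg backend and `savefig`.

```python
import matplotlib
matplotlib.use("Agg")  # write image files instead of opening a window
import matplotlib.pyplot as plt


def read_numbers(path):
    """Read one float per line from path, ignoring blank lines."""
    with open(path) as f:
        return [float(line) for line in f if line.strip()]


def plot_file(path, line_png="line.png", hist_png="hist.png", bins=10):
    """Save a line plot and a histogram of the numbers in path; return the data."""
    data = read_numbers(path)

    plt.figure()
    plt.plot(data)
    plt.savefig(line_png)
    plt.close()

    plt.figure()
    plt.hist(data, bins=bins)
    plt.savefig(hist_png)
    plt.close()
    return data
```

Calling `plot_file("data.txt")` would write `line.png` and `hist.png` next to your script.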

Eugenie said...

R is statistical software, but it's command-line based. I have a feeling that R may work similarly to matplotlib in its plotting aspects (since histograms are a form of statistical analysis, for example).

But over winter break I'm going to mess around with matplotlib- it would be great to move away from excel.

Julie @ Bunsen Burner Bakery said...

Yep, we are absolutely required to use Prism. Even the rotating students who come through our lab for 3 months are required to get a copy of Prism for their own computers and graph all their data in Prism, which is so stupid because it takes way more than 3 months to learn how to use Prism (I've been using it for more than 3 years and I STILL can't figure out how to get it to do half of the things it claims it can do).

Plus, I feel like it is a giant handicap, because I don't get to branch out and learn to use things like matplotlib. It would probably take me a bit to learn how to code my own graphs, but in the end, the payoff would be huge. But he would never give me the time to learn it, because it is Prism or die around here.

Juniper Shoemaker said...

Mantra #2 calls for my favorite get-through-the-day theme song of the moment, which I have listened to eleventy dozen times.

Cheesy as it is, it does get me to tear up and press on. And it's also so damned cheerful you can't help liking it.

Juniper Shoemaker said...

Oh, for fuck's sake. I posted my comment on the wrong entry.

Fine, Blogger Comment Pop-Up Window. Obviously, I liked this post too.

Albatross said...

Oooooo, those are pretty graphs! I have never used Matlab and don't do any coding but for graphs that beautiful, I might have to learn that!