Chart Generation from Python
Following yesterday’s progress on the print app, I started today with some stat generation for analysis. The data is parsed very simply by just iterating over the entire log and dynamically storing values (and incrementing them as necessary) in a couple of Python dictionaries.
The collection of stats is definitely O(n), but I have no doubt that the memory consumption of this thing is hella wasteful. (For one thing, all statistics are generated from scratch on each execution of the script, and no state at all carries between executions.) Ah well, get things working first, right? Then optimize?
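A minimal sketch of that kind of single-pass, dictionary-based counting is below. The log format (one job per line, whitespace-separated user, printer, and page count) and every name in it are assumptions made for illustration, not the actual print app's format.

```python
from collections import defaultdict


def collect_stats(lines):
    """Walk the log once, creating counters the first time a key is seen."""
    pages_by_user = defaultdict(int)    # hypothetical stat: pages per user
    jobs_by_printer = defaultdict(int)  # hypothetical stat: jobs per printer

    for line in lines:
        fields = line.split()
        if len(fields) != 3:
            continue  # skip lines that don't match the assumed format
        user, printer, pages = fields
        try:
            pages = int(pages)
        except ValueError:
            continue  # skip lines whose page count isn't a number
        pages_by_user[user] += pages
        jobs_by_printer[printer] += 1

    # Return plain dicts so callers don't create keys by accident.
    return dict(pages_by_user), dict(jobs_by_printer)
```

Because `lines` can be any iterable, passing an open file object reads the log lazily; the memory cost then comes from the dictionaries themselves, which grow with the number of distinct keys rather than with the length of the log.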
However, when it came to the “neat” part that I was looking forward to, creating the graphs for quick reads and really easy analysis, things turned much less “neat”. It turns out that the libraries available for chart generation leave quite a bit to be desired. I looked at three libraries: PyX, PyChart, and PyGDChart2. Each has its own issues.
My first swing at chart generation was PyX. Everything was going pretty smoothly until it came to actually providing the data for the library to generate its charts from. It became immediately apparent that this library was created for a very specific purpose, and that purpose was reading in text files of data collected from some sort of scientific or academic environment. Just passing in a native Python structure and making some pretty graphics out of it was not on the “trivial to accomplish” list. I ditched this one before I spat out my first pixel.
So my second swing at making things happen in a simple and relatively sane manner was PyChart. Things were much nicer this time around, even getting to the point that I was spitting out a PNG image for perusing. Then PyChart fell apart on me in a couple of ways. First, it was clear that creating a series of images was not in the cards. Instead, each execution of a script that uses PyChart is pretty much expected to generate a single chart. Well, either generate a single chart or learn a lot more about the API than should be necessary (in my opinion) to create an elementary bar chart.
The final failure of the night was looking at PyGDChart2. This definitely looked like the package with the most potential, with an API that made the most sense for doing something as trivial as chart generation. (I say “trivial” for chart generation not because the entire process is trivial, but because the expected interface is so trivial: very few inputs are necessary.) Alas, neither PyGDChart2 nor its predecessor PyGDChart was available as an ebuild anywhere I could find, nor was the C library, gdchart, that the Python library wraps.
All in all, I’m pretty certain that PyGDChart2 is the library that I want to be using. This just means that I’m going to have to figure out some way to politely package up the necessary C library and the Python library without tainting the nicely managed environment that I’m developing in.
In short, I’ve got the stats that I want, but the chart generation is surprisingly “dumb” so far. If anything, I think this provides a good “breaking” point for the script. Rather than one script that will read in all of the data, generate the stats, and then generate some images, I think that a single script should handle all of the stat generation and nothing else, so I can optimize it a bit and store intermediate values in the database. This should free up the architecture of the graph generation scripting in case I have to cave to some inane library requirement like “different script for every graph”.
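One way that split could look is sketched here: the stat script writes its totals to a table, and any number of graph scripts read them back without touching the log. SQLite via the standard `sqlite3` module is an assumption chosen only to keep the example self-contained, and the table and column names are hypothetical.

```python
import sqlite3


def save_stats(conn, stat_name, counts):
    """Persist one named dictionary of totals (called by the stat script)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS stats ("
        " stat_name TEXT, label TEXT, total INTEGER,"
        " PRIMARY KEY (stat_name, label))"
    )
    # INSERT OR REPLACE overwrites the row when the stat is regenerated.
    conn.executemany(
        "INSERT OR REPLACE INTO stats (stat_name, label, total) VALUES (?, ?, ?)",
        [(stat_name, label, total) for label, total in counts.items()],
    )
    conn.commit()


def load_stats(conn, stat_name):
    """Read one named dictionary back (called by each graph script)."""
    rows = conn.execute(
        "SELECT label, total FROM stats WHERE stat_name = ? ORDER BY label",
        (stat_name,),
    )
    return dict(rows.fetchall())
```

With the totals stored this way, a “different script for every graph” requirement only costs one `load_stats` call per script instead of a full re-parse of the log.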
November 2nd, 2005 at 10:49 pm
I had the same problem with trying to make graphs. I was amazed that there were no good chart packages. I ended up finding the source to something called PILGraph, and updated it to work with the newer versions of PIL. It works quite well, has a good algorithm for picking times/dates to be displayed on the axes, and depends only on PIL. I’ve been meaning to package it up and put it on my site, but haven’t gotten around to it. If you’d like, I can send it to you.
-Winston