While upgrading our databases of information about legislation before the House and Senate in the 111th Congress, we have been writing scripts that visit government websites, extract information from them, convert that information into variables, store those variables in a database, transform them, and then use them to write new web pages with features like progressive rankings and individual politician-trackers. It’s a lot of fun when the process works, but when we hit snags the headaches can be pretty darned big.
One particular headache has been the server-side time limit imposed by our web host over at That’s My Congress. Quite understandably, our hosting provider stops any script that keeps running for longer than a minute or two. It’s a smart move: it keeps us from hogging the server and keeps runaway programs, like an infinitely recursive one, from running forever. But some of the things we want to do take more than a minute. For instance, we load the cosponsorship pages for the 1,104 H.R. bills before the House and scan each one to find out who is supporting which bill. That task takes about 50 minutes, so we have to split it into about fifty sections.
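The arithmetic behind "about fifty sections" can be sketched in a few lines of Python. This is a hypothetical helper, not the scripts we actually run: it assumes every bill takes roughly the same time (about 2.7 seconds each, from 50 minutes spread over 1,104 bills) and a budget of 60 seconds per script run.

```python
def batch_size_for_budget(total_items, total_seconds, budget_seconds):
    """Largest batch whose estimated run time fits within budget_seconds.

    Assumption: every item takes the same average time, total_seconds / total_items.
    Integer arithmetic avoids floating-point rounding surprises.
    """
    return max(1, (budget_seconds * total_items) // total_seconds)


def make_batches(items, batch_size):
    """Split items into consecutive chunks of at most batch_size each."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


# Figures from the post: 1,104 H.R. bills, about 50 minutes of work in total.
TOTAL_BILLS = 1104
TOTAL_SECONDS = 50 * 60
# Assumption: a 60-second budget per run, matching the host's rough one-minute limit.
BUDGET_SECONDS = 60

size = batch_size_for_budget(TOTAL_BILLS, TOTAL_SECONDS, BUDGET_SECONDS)
bills = list(range(1, TOTAL_BILLS + 1))  # H.R. 1 through H.R. 1104
batches = make_batches(bills, size)
print(size, len(batches))  # prints "22 51"
```

With those assumptions, each run handles 22 bills and the job splits into 51 runs, the last of which covers only the final 4 bills. A host with a tighter or looser limit would just change `BUDGET_SECONDS`.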
I can tell you from recent experience that manually typing in fifty script commands, each spaced far enough apart in time to avoid overloading the server, is really, really boring. I’m not trained in computer science, and in my search for a way to automate this task I’ve tried a few things. My first try was the sleep command, figuring that if I let the server pause for a few seconds between commands, I’d satisfy the server gods. Strike one: it turns out that server limits can’t be escaped that way. My next try was to set up cron jobs, scheduled tasks that tell a Unix server to run certain scripts on a regular schedule. Strike two: my hosting provider allows fewer than ten cron jobs.
I thought I was out of options until I realized that I could do some of the automation off the server, in my own browser. On my third try, I hit a home run with iMacros, a free plugin for Firefox written by iOpus. It lets you “record” a sequence of actions in a Firefox browser (like opening a tab or loading a URL that runs a script), then “play” them back later. The recorded sequences are editable and customizable, using a command syntax that makes intuitive sense. For example:
VERSION BUILD=6111228 RECORDER=FX
What’s really nice is that I can place waiting periods between commands so that I don’t annoy the server gods or abuse my CPU privileges. If you’ve got similar issues in your online work, I recommend iMacros without reservation.
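For anyone who would rather script the same idea than record it in a browser, here is a minimal Python sketch. It is a swapped-in technique, not iMacros itself: visit each batch URL in turn and wait between requests. The URL pattern, the `run_batches` helper, and the 30-second default wait are all hypothetical. The pause happens on the client between separate requests, so each request starts its own short script run rather than stretching one long run on the server.

```python
import time
import urllib.request


def fetch(url, timeout=120):
    """Request one URL and return the response body as text."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def run_batches(urls, wait_seconds=30, get=fetch, sleep=time.sleep):
    """Visit each batch URL in order, pausing between requests.

    Each request triggers a fresh, short script run on the server, so no single
    run has to outlast the host's time limit. `get` and `sleep` are parameters
    so the loop can be tested without touching the network.
    """
    results = []
    for i, url in enumerate(urls):
        if i > 0:
            sleep(wait_seconds)  # breathing room for the server between runs
        results.append(get(url))
    return results


# Hypothetical URL scheme: a script that scans one batch of bills per request.
BASE = "http://example.com/scan_cosponsors.php?batch={}"
batch_urls = [BASE.format(n) for n in range(1, 52)]
```

Calling `run_batches(batch_urls)` would then walk through all 51 batches unattended, waiting 30 seconds between each, which is the same pattern as an iMacros macro with waits between its URL loads.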