Last night I found an interesting email in my gmail account. It seems that DreamHost had moved me from my usual server and put me on an “evaluation” server. Apparently I had been using too many CPU minutes on the shared server. And while this is certainly a very reasonable resource for DreamHost to be monitoring, I am kind of bummed that I didn’t get a warning message of some kind before I was moved. (Although, it’s certainly possible that I did get a warning message and missed it somehow…)
Step One: Make Some Guesses
So I’ve been moved to a “limbo” server and directed to a FAQ laying out exactly what’s going on. I’m certainly not looking to be any sort of troublemaker on the server, and to the best of my knowledge I don’t have anything running on my website/shell that would generate very much CPU time, but I immediately have some contenders.
My first guess (naturally) is the most recent addition to my website. I’ve installed two Trac instances for handling the Bunker Management System project and the Late-Night at Nichols: LAN Party project. They run in FastCGI mode using SQLite, so they were my first suspects.
My second suspect was the meeting monitoring software that I wrote for Mr. Plumb. Though that has been running “as-is” for a couple months now and I wouldn’t expect any change in CPU usage.
Step Two: Check the CPU Usage Logs
Unfortunately I wasn’t able to handle the situation immediately, because I needed to wait for a while for the resource consumption logs to be created for my user so that I could analyze what’s going on. But this morning when I hopped on to check things out, I was surprised by the results. The file at ~/logs/resources/bradshaw.sa.analyzed.0 had what I was looking for:
Process CPU seconds user machine count average
php.cgi 7415.0300 99.696% 30.896% 8961 0.827
scrape.py 11.3800 0.153% 0.047% 53 0.215
calendar_copy.p 3.5800 0.048% 0.015% 79 0.045
trac.fcgi 2.8200 0.038% 0.012% 13 0.217
trac.fcgi 2.6100 0.035% 0.011% 33 0.079
notify.py 1.9100 0.026% 0.008% 53 0.036
wget 0.1200 0.002% 0.001% 13 0.009
bkms.fcgi 0.0900 0.001% 0.000% 13 0.007
bash 0.0400 0.001% 0.000% 1 0.040
scrape.py 0.0200 0.000% 0.000% 159 0.000
sshd 0.0200 0.000% 0.000% 1 0.020
ls 0.0200 0.000% 0.000% 7 0.003
notify.py 0.0100 0.000% 0.000% 159 0.000
----------------------------------------------------------------------
Total: 7437.6500 100.000% 30.990% 9545
Average per day: 7437.6500 1 days
CPU percentage assumes 24000 cpu seconds per day total.
I emphasized the line that is clearly causing the issue with too much CPU usage. This isn’t exactly a solved case, however, only a great tip. First, this knocked out my original guesses of Mr. Plumb’s notifier, and the Trac installations. The lines for scrape.py and notify.py are the meeting notifier software, and summed they only come to 13.32 seconds. The trac.fcgi lines handle the Trac installations, and summed they come to 5.43 seconds. Much less than I expected!
But that problem line, the obvious over-user of precious cycles, is php.cgi. That doesn’t tell me a specific application, instead, that tells me that some php pages written to execute in CGI mode are using the large numbers of CPU cycles. Odd, I don’t have many of those.
Step 3: If we don’t know what the problem is by now, make new guesses.
Well, since my assumptions failed me, it was time to try and think of any php cgi pages that I have running that do anything significant. This was a little harder for me, because almost all (all?) of my personal projects run in Python, not PHP. So my list of possible culprits ended up being pretty short: WordPress, Gallery2, and PHP-Calendar.
But if I’ve already checked the CPU usage logs, how will I make any progress. I figured I had two options. First, the CPU Minutes FAQ on the DreamHost Wiki lays out the methodology to make all of ones PHP not run as CGI and move their usage into Apache instead. (Hopefully by process of elimination it would reveal the answer.)
Step 4: Check the web server logs
That’s a little much for me at this point, so I opted for option two: check the web server logs to see what part of the site is getting hit most. Here is where I found my golden goose.
| 4378 |
26.43% |
Mar/ 4/06 12:45 AM |
/ical/day.php |
| 3316 |
22.93% |
Mar/ 4/06 12:46 AM |
/ical/week.php |
| 3164 |
28.51% |
Mar/ 4/06 12:46 AM |
ical/month.php |
| 950 |
2.34% |
Mar/ 4/06 12:44 AM |
/ical/print.php |
Those are the top four lines of my Request Report. Bah! For some reason my PHP-Calendar installation is getting jackhammered like downtown Boston. Now to figure out why.
Checking the Host report shows this important line:
| 12259 |
74.07% |
66.249.65.11 |
Jinkies! 74% of my traffic is requested by one IP address! But who the heck is it? My guesses were either myself, myself at work, or a search engine. Survey says?
[bradshaw@limbo-spunky3 resources] $ host 66.249.65.11
Name: crawl-66-249-65-11.googlebot.com
Address: 66.249.65.11
It’s Google. I’ve heard of this before. It turns out that virtually all calendaring applications have an infinite set of links that are easy for GoogleBots to follow. Turns out that my PHP-Calendar is a labyrinth for Google’s web crawler, and is probably a hog of CPU resources at the time.
Conclusion(ish) and Solution (probably)
Notice earlier I said “probably”. This is a classic case of correlation, and it’s very important to remember (in all facets of one’s life) correlation does not equal causation. But rationally I’m pretty sure that this is the problem.
So the solution? Well the correct solution is to actually have an appropriate robots.txt file to let Google know that my /ical folder really isn’t that interesting. However, I’m sort of lazy at the moment and haven’t been using that application effectively for a while now. So I just did a little chmod 000 ical and called it good.
Hopefully here shortly my CPU usage will come well under the 50 to 60 minutes that DreamHost allocates for their shared servers. If not, I might have to get a little more involved tracking down PHP applications that might be causing issues. But for now, I think it’s handled.