10/09/2005 @15:18:58 ^16:14:56
Home of pirates, drunks, and whores!
You know, I never got what that stupid episode was about until recently! Although I have a very clear memory of sitting in my flat, bitching about how totally wrong the "a stranger is a friend you haven't met" line is, in my first first year as far back as 1998, so I guess nothing ever changes.
Test-o-rama!
Pretty much on a whim yesterday, I decided to convert this site's backend database from bdb to sqlite. No, don't ask me why. I think the reason was to make it easier to get the proper last modified time, but I still haven't done that yet. Anyway, if it breaks and you get a 500 Internal Server Error, then that's why.
Of course I did all this editing on the live server instead of a test server like I should have, but I don't have a test server! That's why snafu's been detecting fake updates and so forth. Fortunately the changes to the weblog module worked first time, so I don't think anybody would have seen anything odd, but on the other hand, who cares?
Hurrr let's put everything we can into a database
For example, the entire directory of Apache access logs, using this humorously unintelligible string of characters:
/^([\d\.]+?) (\S+?) \[(.+?)\] (\S+) \"(.+?)\" (\d+?) ([-\d]+?) \"(.*?)\" \"(.+?)\"/
and a bit of post-processing. Don't try to use that on normal Apache logs though, because I changed my logging output format just enough to break all these fancy log file analysers!
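For the curious, here is roughly how that pattern could be put to work. This is only a sketch in Python, not the code this site actually runs: the sample log line, the field names and the `parse_line` helper are all made up for illustration, because the real custom log format isn't shown here. The "post-processing" part is my guess too (pulling the method out of the request and turning a `-` size into 0).

```python
import re

# The pattern from above, minus the surrounding slashes and the
# unnecessary backslashes in front of the double quotes.
LOG_LINE = re.compile(
    r'^([\d\.]+?) (\S+?) \[(.+?)\] (\S+) "(.+?)" (\d+?) ([-\d]+?) "(.*?)" "(.+?)"'
)

# Hypothetical line in a format the pattern accepts; the meaning of the
# second and fourth fields is an assumption.
SAMPLE = (
    '192.0.2.1 - [10/Sep/2005:15:18:58 +0100] example.org '
    '"GET /index.html HTTP/1.1" 200 5120 "-" '
    '"msnbot/1.0 (+http://search.msn.com/msnbot.htm)"'
)


def parse_line(line):
    """Return a dict of fields, or None if the line does not match."""
    m = LOG_LINE.match(line)
    if m is None:
        return None
    ip, ident, when, vhost, request, status, size, referer, useragent = m.groups()
    return {
        "ip": ip,
        "ident": ident,
        "time": when,
        "vhost": vhost,
        # Post-processing: the method is the first word of the request.
        "method": request.split(" ", 1)[0],
        "request": request,
        "status": int(status),
        # Apache logs "-" when no bytes were sent; count that as 0.
        "size": int(size) if size.isdigit() else 0,
        "referer": referer,
        "useragent": useragent,
    }


if __name__ == "__main__":
    print(parse_line(SAMPLE))
```

Lines that don't match come back as `None`, so anything in a different format gets skipped rather than blowing up.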
I wanted to generate some statistics after somebody on Planet Debian complained about MSNBot being a bandwidth whore. I'd noticed how prevalent it is in my logs too, and I wanted to find out just how much bandwidth it eats. Thus, the top ten bandwidth whores in my server logs are as follows:
Bandwidth (MB) | User agent string |
---|---|
130.499855995178 | msnbot/1.0 (+http://search.msn.com/msnbot.htm) |
49.4149570465088 | Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp) |
27.0880346298218 | Googlebot/2.1 (+http://www.google.com/bot.html) |
26.2569932937622 | Mozilla/4.0 compatible ZyBorg/1.0 (wn-14.zyborg@looksmart.net; http://www.WISEnutbot.com) |
18.8409118652344 | Mozilla/4.0 compatible ZyBorg/1.0 (wn-13.zyborg@looksmart.net; http://www.WISEnutbot.com) |
14.7500219345093 | Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html) |
11.6859331130981 | Mozilla/2.0 (compatible; Ask Jeeves/Teoma; +http://sp.ask.com/docs/about/tech_crawling.html) |
9.41157245635986 | aipbot/1.0 (aipbot; http://www.aipbot.com; aipbot@aipbot.com) |
8.2594518661499 | Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1) |
7.677978515625 | DTAAgent |
So yeah, it's a whore all right. Also, I could complain about how stupid user agent strings are, but not today. Hell's bells, I love statistics so much! However, the SQL line took ages to work out because I am a SQL noob, so here it is for posterity:
SELECT sum(size)/1048576,useragent FROM access WHERE method like "GET" GROUP BY useragent ORDER BY sum(size) desc LIMIT 10;
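If you want to poke at that query without a pile of real logs, a throwaway in-memory SQLite database will do. This is a sketch in Python under some assumptions: the `access` table here only has the three columns the query mentions (the real table surely has more), the rows are invented, and the `top_bandwidth` helper is just a name I picked. It also uses single quotes around `GET` and divides by `1048576.0`, so the result is a fraction even when `size` is stored as an integer.

```python
import sqlite3

# Same query as above, spread over several lines.
QUERY = """
SELECT sum(size) / 1048576.0 AS megabytes, useragent
FROM access
WHERE method LIKE 'GET'
GROUP BY useragent
ORDER BY sum(size) DESC
LIMIT 10
"""


def top_bandwidth(conn):
    """Return (megabytes, useragent) rows, biggest consumer first."""
    return conn.execute(QUERY).fetchall()


# Invented data: sizes are in bytes, user agents are placeholders.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE access (method TEXT, size INTEGER, useragent TEXT)")
conn.executemany(
    "INSERT INTO access VALUES (?, ?, ?)",
    [
        ("GET", 1048576, "bot-a"),
        ("GET", 1048576, "bot-a"),
        ("GET", 524288, "bot-b"),
        ("POST", 9999999, "bot-c"),  # ignored: not a GET
    ],
)

if __name__ == "__main__":
    for megabytes, useragent in top_bandwidth(conn):
        print(f"{megabytes:.2f} MB  {useragent}")
```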
I suppose I could have done the query in perl, but I figured it was faster to do the slow grunt work of parsing the logs and caching them in a database one time only, then run SQL queries afterwards, because that's what they're for! You could even plug an external logger into Apache to write logs straight into a database and keep the thing updated in real time, but the hell with that!
This update had far too many exclamation marks in it and as a result sounds really over-excited!