Home of pirates, drunks, and whores!
You know, I never got what that stupid episode was about until recently! Although I have a very clear memory of sitting in my flat bitching about how totally wrong the "a stranger is a friend you haven't met" line is in my first year, as far back as 1998, so I guess nothing ever changes.
Pretty much on a whim yesterday I decided to convert this site's backend database from bdb to sqlite. No, don't ask me why. I think the reason was to make it easier to get the proper last modified time, but I still haven't done that yet. Anyway, if it breaks and you get a 500 Internal Server Error then that's why.
Of course I did all this editing on the live server instead of a test server like I should have, but I don't have a test server! That's why snafu's been detecting fake updates and so forth. Fortunately the changes to the weblog module worked first time, so I don't think anybody would have seen anything odd, but on the other hand, who cares?
Hurrr let's put everything we can into a database
For example, the entire directory of Apache access logs, using this humorously unintelligible string of characters
/^([\d\.]+?) (\S+?) \[(.+?)\] (\S+) \"(.+?)\" (\d+?) ([-\d]+?) \"(.*?)\" \"(.+?)\"/
and a bit of post-processing. Don't try to use that on normal Apache logs though, because I changed my logging output format just enough to break all these fancy log file analysers!
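For anyone who wants to see what that string actually captures, here is a rough sketch in Python (not what I originally ran). The regex is the one above, unchanged. The sample log line, the `parse_line` helper and the field names are made up for illustration, and the meaning of the second and fourth fields is a guess because the log format is customised.

```python
import re

# The regex from above, unchanged. It targets a customised log format,
# so it will not match stock Apache "combined" logs.
LOG_PATTERN = re.compile(
    r'^([\d\.]+?) (\S+?) \[(.+?)\] (\S+) \"(.+?)\" (\d+?) ([-\d]+?) \"(.*?)\" \"(.+?)\"'
)

def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match.

    The names field2 and field4 are placeholders: what those two columns
    hold in the custom format is an assumption, so they are not interpreted.
    """
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    (ip, field2, timestamp, field4, request,
     status, size, referer, useragent) = match.groups()
    # "A bit of post-processing": pull the method out of the request line
    # and turn a "-" size (no body sent) into 0 bytes.
    method = request.split(" ", 1)[0]
    return {
        "ip": ip,
        "timestamp": timestamp,
        "method": method,
        "request": request,
        "status": int(status),
        "size": int(size) if size.isdigit() else 0,
        "referer": referer,
        "useragent": useragent,
    }

# Hypothetical sample line shaped to fit the regex, not taken from real logs.
SAMPLE = (
    '192.0.2.1 - [10/Oct/2005:13:55:36 +0100] example.org '
    '"GET /index.html HTTP/1.1" 200 2326 "-" "ExampleBot/1.0"'
)
```

Calling `parse_line(SAMPLE)` gives a dict with the method, size and user agent ready to be stuffed into a database row.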
I wanted to generate some statistics after somebody on Planet Debian was complaining about MSNBot being a bandwidth whore. I've noticed how prevalent it is in my logs too, and I wanted to find out just how much. Thus, the top ten bandwidth whores in my server logs are as follows:
| Bandwidth (MB) | User agent string |
| --- | --- |
| 49.4149570465088 | Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp) |
| 26.2569932937622 | Mozilla/4.0 compatible ZyBorg/1.0 (email@example.com; http://www.WISEnutbot.com) |
| 18.8409118652344 | Mozilla/4.0 compatible ZyBorg/1.0 (firstname.lastname@example.org; http://www.WISEnutbot.com) |
| 14.7500219345093 | Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html) |
| 11.6859331130981 | Mozilla/2.0 (compatible; Ask Jeeves/Teoma; +http://sp.ask.com/docs/about/tech_crawling.html) |
| 9.41157245635986 | aipbot/1.0 (aipbot; http://www.aipbot.com; email@example.com) |
| 8.2594518661499 | Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1) |
So yeah, it's a whore all right. Also I could complain about how stupid user agent strings are, but not today. Hell's bells, I love statistics so much! However, the SQL line took ages to work out because I am a SQL noob, so here it is for posterity:
SELECT sum(size)/1048576, useragent FROM access WHERE method LIKE 'GET' GROUP BY useragent ORDER BY sum(size) DESC LIMIT 10;
I suppose I could have done the query in Perl, but I figured it was faster to do the slow grunt work of parsing the logs and caching them in a database one time only, then run SQL queries afterwards, because that's what they're for! You could even plug an external logger into Apache to write logs straight into a database and keep the thing updated in real time, but the hell with that!
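The parse-once, query-later idea looks roughly like this in Python with the standard `sqlite3` module. Treat it as a sketch under assumptions, not the real setup: the table and column names (`access`, `method`, `size`, `useragent`) come from the query above, but the column types, the in-memory database, the `top_bandwidth` helper and the sample rows are all invented for the example. It divides by `1048576.0` because `size` is declared as an integer here and plain `/` would otherwise truncate.

```python
import sqlite3

# Hypothetical pre-parsed rows: (method, size in bytes, user agent).
ROWS = [
    ("GET", 3 * 1048576, "ExampleBot/1.0"),
    ("GET", 1 * 1048576, "ExampleBrowser/2.0"),
    ("GET", 2 * 1048576, "ExampleBot/1.0"),
    ("POST", 9 * 1048576, "ExampleBrowser/2.0"),  # ignored: not a GET
]

def top_bandwidth(rows, limit=10):
    """Cache parsed rows in SQLite, then return (megabytes, useragent) pairs."""
    conn = sqlite3.connect(":memory:")  # assumption: a real run would use a file
    conn.execute(
        "CREATE TABLE access (method TEXT, size INTEGER, useragent TEXT)"
    )
    # The slow grunt work happens once: every parsed line becomes a row.
    conn.executemany("INSERT INTO access VALUES (?, ?, ?)", rows)
    # After that, each new statistic is just another query.
    query = (
        "SELECT sum(size) / 1048576.0, useragent FROM access "
        "WHERE method LIKE 'GET' GROUP BY useragent "
        "ORDER BY sum(size) DESC LIMIT ?"
    )
    result = conn.execute(query, (limit,)).fetchall()
    conn.close()
    return result
```

With the made-up rows, `top_bandwidth(ROWS)` puts `ExampleBot/1.0` first with 5 MB, and the POST traffic never shows up.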
This update had far too many exclamation marks in it and as a result sounds really over-excited!