10/09/2005 @15:18:58 ^16:14:56

Home of pirates, drunks, and whores!

You know, I never got what that stupid episode was about until recently! Although I have a very clear memory of sitting in my flat in my first year, as far back as 1998, bitching about how totally wrong the "a stranger is a friend you haven't met" line is, so I guess nothing ever changes.

Test-o-rama!

Pretty much on a whim yesterday, I decided to convert this site's backend database from bdb (Berkeley DB) to SQLite. No, don't ask me why. I think the reason was to make it easier to get the proper last-modified time, but I still haven't done that yet. Anyway, if the site breaks and you get a 500 Internal Server Error, that's why.

Of course I did all this editing on the live server instead of on a test server like I should have, but I don't have a test server! That's why snafu's been detecting fake updates and so forth. Fortunately the changes to the weblog module worked first time, so I don't think anybody will have seen anything odd, but on the other hand, who cares?

Hurrr let's put everything we can into a database

For example, the entire directory of Apache access logs, which I parsed using this humorously unintelligible string of characters

/^([\d\.]+?) (\S+?) \[(.+?)\] (\S+) \"(.+?)\" (\d+?) ([-\d]+?) \"(.*?)\" \"(.+?)\"/

and a bit of post-processing. Don't try to use that on normal Apache logs, though, because I changed my logging output format just enough to break all these fancy log file analysers!
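For anyone wanting to try the same trick, here is a rough Python sketch of the parse-and-load step, using the regex above with Python's `re` and `sqlite3` modules. It assumes an `access` table whose `method`, `size` and `useragent` columns match the ones used by the SQL query later in this post; the other column names and the `parse_line` and `load` helpers are hypothetical. Because the custom log format isn't spelled out, groups 2 and 4 are simply dropped, and the post-processing shown (splitting the request line, turning a `-` size into 0) is a guess at what the original did, not the original code.

```python
import re
import sqlite3

# The regex from the post, as a Python raw string. Groups: 1 client IP,
# 2 and 4 fields of the author's custom format (meaning not stated), 3 timestamp,
# 5 request line, 6 status, 7 size ("-" for no body), 8 referer, 9 user agent.
LOG_RE = re.compile(
    r'^([\d\.]+?) (\S+?) \[(.+?)\] (\S+) \"(.+?)\" (\d+?) ([-\d]+?) \"(.*?)\" \"(.+?)\"'
)

# Hypothetical schema: only method, size and useragent come from the post.
SCHEMA = """CREATE TABLE IF NOT EXISTS access (
    ip TEXT, time TEXT, method TEXT, path TEXT,
    status INTEGER, size INTEGER, referer TEXT, useragent TEXT)"""


def parse_line(line):
    """Turn one log line into a row for the access table, or None if it doesn't match."""
    m = LOG_RE.match(line)
    if m is None:
        return None
    ip, _field2, when, _field4, request, status, size, referer, agent = m.groups()
    # Post-processing (assumed): pull the method and path out of "GET /path HTTP/1.1".
    parts = request.split()
    method = parts[0] if parts else ""
    path = parts[1] if len(parts) > 1 else ""
    # Apache writes "-" instead of 0 when no body was sent.
    nbytes = int(size) if size.isdigit() else 0
    return (ip, when, method, path, int(status), nbytes, referer, agent)


def load(lines, conn):
    """Parse the lines once and insert every match in a single transaction."""
    conn.execute(SCHEMA)
    rows = [row for row in map(parse_line, lines) if row is not None]
    with conn:  # commits on success, rolls back on error
        conn.executemany("INSERT INTO access VALUES (?, ?, ?, ?, ?, ?, ?, ?)", rows)
    return len(rows)
```

Loading the whole directory is then just a loop calling `load` on each file. Lines the regex rejects are skipped instead of stopping the import, which matters when the log format has changed over time.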

I wanted to generate some statistics after somebody on Planet Debian complained about MSNBot being a bandwidth whore. I'd noticed how prevalent it is in my logs too, and I wanted to find out just how much. So, the top ten bandwidth whores in my server logs are as follows:

Bandwidth (MB)  User agent string
        130.50  msnbot/1.0 (+http://search.msn.com/msnbot.htm)
         49.41  Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)
         27.09  Googlebot/2.1 (+http://www.google.com/bot.html)
         26.26  Mozilla/4.0 compatible ZyBorg/1.0 (wn-14.zyborg@looksmart.net; http://www.WISEnutbot.com)
         18.84  Mozilla/4.0 compatible ZyBorg/1.0 (wn-13.zyborg@looksmart.net; http://www.WISEnutbot.com)
         14.75  Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
         11.69  Mozilla/2.0 (compatible; Ask Jeeves/Teoma; +http://sp.ask.com/docs/about/tech_crawling.html)
          9.41  aipbot/1.0 (aipbot; http://www.aipbot.com; aipbot@aipbot.com)
          8.26  Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)
          7.68  DTAAgent

So yeah, it's a whore all right. I could also complain about how stupid user agent strings are, but not today. Hell's bells, I love statistics so much! However, the SQL query took ages to work out because I am an SQL noob, so here it is for posterity:

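-- Top ten user agents by megabytes served for GET requests.
-- 1048576.0 keeps the division in floating point; 'GET' takes single
-- quotes because double quotes are meant for identifiers in SQL.
SELECT sum(size)/1048576.0 AS mb, useragent
	FROM access
	WHERE method = 'GET'
	GROUP BY useragent
	ORDER BY mb DESC
	LIMIT 10;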
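To check a query like this without a real log database, here is a small harness using Python's `sqlite3` module and a few made-up rows. It assumes `size` is stored as an INTEGER, which is where the divisor matters: SQLite truncates INTEGER / INTEGER, so this version divides by `1048576.0`. It also writes `'GET'` in single quotes, since SQLite only accepts double-quoted strings as a legacy fallback. The `top_agents` helper and the sample data are hypothetical.

```python
import sqlite3

# Top-N user agents by bytes served for GET requests. 1048576.0 (not 1048576)
# keeps the division in floating point: in SQLite, INTEGER / INTEGER truncates.
QUERY = """
SELECT sum(size) / 1048576.0 AS mb, useragent
    FROM access
    WHERE method = 'GET'
    GROUP BY useragent
    ORDER BY mb DESC
    LIMIT ?
"""


def top_agents(conn, n=10):
    """Return a list of (megabytes, user agent) tuples, heaviest first."""
    return conn.execute(QUERY, (n,)).fetchall()


# Hypothetical sample data, just enough to exercise the query.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE access (method TEXT, size INTEGER, useragent TEXT)")
conn.executemany(
    "INSERT INTO access VALUES (?, ?, ?)",
    [
        ("GET", 3 * 1048576, "msnbot/1.0"),
        ("GET", 1048576, "msnbot/1.0"),
        ("GET", 524288, "Googlebot/2.1"),
        ("HEAD", 10 * 1048576, "Googlebot/2.1"),  # filtered out by WHERE
    ],
)
for mb, agent in top_agents(conn):
    print(f"{mb:8.2f}  {agent}")
```

This prints msnbot/1.0 at 4.00 MB, then Googlebot/2.1 at 0.50 MB; the HEAD request is ignored. With a plain `/ 1048576` the Googlebot figure would come out as 0.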

I suppose I could have done the query in Perl, but I figured it was faster to do the slow grunt work of parsing the logs and caching them in a database just once, then run SQL queries afterwards, because that's what they're for! You could even plug an external logger into Apache that writes log entries straight into a database to keep it updated in real time, but the hell with that!
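For the real-time route, Apache's piped logs (a `CustomLog` whose target starts with `|`) feed each log line to a long-running program's stdin. Below is a rough Python sketch of such a program, assuming a tab-separated `LogFormat` so no regex is needed. The `dblog` nickname, the script and database paths, and the `ingest` and `main` names are all hypothetical, and the table holds only the columns this sketch needs.

```python
r"""Hypothetical piped-log ingester: Apache writes each log line to stdin.

Assumed Apache configuration (the "dblog" nickname and script path are made up):

    LogFormat "%h\t%t\t%m\t%>s\t%B\t%{User-agent}i" dblog
    CustomLog "|/usr/local/bin/log2sqlite.py" dblog

%B logs 0 rather than "-" for an empty body, so sizes need no clean-up.
"""
import sqlite3
import sys

DB_PATH = "/var/tmp/access.db"  # hypothetical location

SCHEMA = """CREATE TABLE IF NOT EXISTS access (
    ip TEXT, time TEXT, method TEXT, status INTEGER, size INTEGER, useragent TEXT)"""


def ingest(lines, conn):
    """Store each tab-separated log line as it arrives; return the number stored."""
    conn.execute(SCHEMA)
    stored = 0
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 6:
            continue  # not in the expected format: skip rather than crash
        ip, when, method, status, size, agent = fields
        try:
            row = (ip, when, method, int(status), int(size), agent)
        except ValueError:
            continue  # non-numeric status or size: skip this line
        with conn:  # commit every line so queries always see current data
            conn.execute("INSERT INTO access VALUES (?, ?, ?, ?, ?, ?)", row)
        stored += 1
    return stored


def main():
    # Apache starts the program once and keeps streaming lines to it.
    ingest(sys.stdin, sqlite3.connect(DB_PATH))
```

A real script would end with `if __name__ == "__main__": main()`. Committing per line is the simplest choice; on a busy server, batching commits would cut the write overhead at the cost of slightly stale data.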

This update had far too many exclamation marks in it and as a result sounds really over-excited!