Friday, September 7, 2007

phx.gbl., et al.

Gazing at my access_logs this week, I learned a couple of interesting things.

Evidently, Microsoft is dabbling in the world of referer_spam and bogus hostnames to clog access_logs with confusing junk. This is apparently being done as some kind of quality test.

I have noticed this for months, and even though I think I understand what might be going on behind the scenes at Microsoft, I am still puzzled by this rather blunt implementation.

Long story short, if you see hostnames like this in your access_log, it is evidently Microsoft running some kind of s00per-s3kr3t QA script comparing its search results with the pages in its index.


(basically, anything appearing to come from the *absolutely meaningless* .gbl top-level domain)
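If you want to pull these hits out of your own logs, something like the following might do as a rough sketch. It assumes your server logs resolved hostnames (not bare IPs) as the first field of each line; the function names are mine, made up for illustration.

```python
def is_gbl_host(host: str) -> bool:
    """True if the hostname sits under the .gbl pseudo-TLD (trailing dot allowed)."""
    return host.lower().rstrip(".").endswith(".gbl")


def gbl_hits(lines):
    """Yield log lines whose first field (assumed to be the client host) is under .gbl."""
    for line in lines:
        fields = line.split(None, 1)
        if fields and is_gbl_host(fields[0]):
            yield line
```

Feeding it an open access_log file object should work, since it only needs an iterable of lines.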

I am happy to know that Microsoft is working to improve its search results, but the vague and clumsily stealthy way of doing it is, well, puzzling.

Another thing I learned this week is that there are a lot more spam drones masquerading as Googlebot than I realized. I usually filter out Googlebot traffic from my access_logs, as I know my sites are well indexed and I do not feel a need to monitor Googlebot traffic to the sites on a daily basis.

I turn that filter off once in a while, though, and I was a little surprised yesterday to see hits coming from a rogue drone at (where else?) claiming to be Googlebot:

 - - [06/Sep/2007:14:55:37 -0400] "GET / HTTP/1.1" 206 14484 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +"
 - - [06/Sep/2007:14:55:39 -0400] "GET / HTTP/1.1" 206 14484 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +"
 - - [06/Sep/2007:14:55:41 -0400] "GET / HTTP/1.1" 206 14484 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +"
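For anyone who wants to list the clients claiming to be Googlebot in their own logs, here is a minimal sketch. It assumes the Apache "combined" log format with the client host as the first field; the regular expression and the function name are my own illustration, not anything standard.

```python
import re

# Apache "combined" format (assumed):
# host ident user [time] "request" status bytes "referer" "agent"
LOG_RE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)


def claimed_googlebot_hits(lines):
    """Yield (host, status) for entries whose user agent mentions Googlebot."""
    for line in lines:
        m = LOG_RE.match(line.strip())
        if m and "googlebot" in m.group("agent").lower():
            yield m.group("host"), int(m.group("status"))
```

This only tells you who claims to be Googlebot; the user-agent string is trivially forged, so the hosts it returns still need checking.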

The hits all return status 206 (Partial Content), which is an unusual response for small HTML pages: a server normally sends it only when the client asks for a byte range. The drone's network is owned by a German hosting company that I know only for the referer_spam and abusive drone traffic it sends my way. It is mostly banned from my sites, but I had not noticed its reappearance as a phony Googlebot until recently.
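One way to tell a phony Googlebot from the real thing is the reverse-then-forward DNS check that Google itself describes: look up the hostname for the IP, confirm it sits under googlebot.com or google.com, then resolve that hostname and make sure it maps back to the same IP. A rough sketch, with the caveats that the function name and the injectable resolver parameters are my own, and that `socket.gethostbyname_ex` handles IPv4 only:

```python
import socket

# Domains the reverse lookup is expected to land in (assumption based on
# Google's published verification guidance).
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def is_real_googlebot(ip, reverse=socket.gethostbyaddr,
                      forward=socket.gethostbyname_ex):
    """Reverse-then-forward DNS check for an IP that claims to be Googlebot."""
    try:
        hostname = reverse(ip)[0]          # PTR lookup: IP -> hostname
    except OSError:
        return False                       # no reverse record at all
    if not hostname.lower().rstrip(".").endswith(GOOGLE_SUFFIXES):
        return False                       # hostname is not under a Google domain
    try:
        addresses = forward(hostname)[2]   # forward lookup: hostname -> IPs
    except OSError:
        return False
    return ip in addresses                 # must round-trip to the same IP
```

The resolvers are passed in as parameters so the check can be exercised with canned answers instead of live DNS. A drone on a hosting company's network would fail at the second step, since its reverse record will not sit under a Google domain.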
