On Bots
June 2, 2006 11:48 PM

On Bots - results of a year long experiment on search engine bot behaviour
posted by MetaMonkey (15 comments total)

 
"PLEASE GOD, WHAT DOES IT ALL MEAN?!?!?!"
posted by IronLizard at 12:06 AM on June 3, 2006


boo to cliffhangers.
posted by Tuwa at 12:14 AM on June 3, 2006


That page almost seems to suggest that, ranked by efficacy, the bots would go MSN, Google, then Yahoo: the Yahoo bot seems especially stupid to not realize the pages it is crawling are valueless, whereas the MSN and Google bots figure this out much sooner, and basically stop crawling. There is a lot of work done on bot technology precisely to avoid getting trapped in autolink farms and spam pages, and to recognize content as valuable. It's not perfect, but those trees seem to suggest MSN and Google are much further along with "smart" bots than Yahoo.
posted by hincandenza at 12:54 AM on June 3, 2006


"PLEASE GOD, WHAT DOES IT ALL MEAN?!?!?!"

It means you are stupid son, but don't worry ! Even teh stupid can become prez !
posted by elpapacito at 3:10 AM on June 3, 2006


Well now, at least I can read. Jackass.
posted by IronLizard at 3:29 AM on June 3, 2006


Yahoo bot seems especially stupid to not realize the pages it is crawling are valueless, whereas the MSN and Google bots figure this out much sooner, and basically stop crawling.

The cumulative number of pageviews by month keeps going up for all of them, so there isn't any 'sooner' about it. MSN and google increase their attention in fits and starts, where yahoo's attention increases consistently, and you could argue that MSN eventually capped its attention level.

MSN and google both avoid the deep nodes, but can we call that spam avoiding behaviour (or link farm avoiding, whatever..)?
posted by Chuckles at 4:32 AM on June 3, 2006


I can't assimilate all the information but it sure draws pretty pictures. I like the Yahoo Slurp Tree most.
posted by tellurian at 4:40 AM on June 3, 2006


IronLizard, I have the feeling that elpapacito's tongue was in his cheek. In any case at least the diagrams of trees are sort of purty, can't you just enjoy those?
posted by econous at 4:42 AM on June 3, 2006


I guess I am misreading that (on second review, in more ways than one. ARGH!). For one, the graphs aren't attention per month, they are cumulative. And I am mixing up which ones have the 'fits and starts' behaviour.

So, MSN really does stop paying attention to the site, but you can't say the same about google. google only goes so deep, but it keeps coming back to view pages.
posted by Chuckles at 4:43 AM on June 3, 2006
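
The correction above turns on the difference between a cumulative graph and a per-month one. A minimal sketch of that difference follows, using made-up numbers rather than anything from the experiment: a cumulative line that keeps climbing means steady attention, while one that flattens means the bot has stopped coming back. The function name `monthly_from_cumulative` is just an illustration.

```python
def monthly_from_cumulative(cumulative):
    """Turn a running total into per-period increments."""
    monthly = []
    previous = 0
    for total in cumulative:
        monthly.append(total - previous)
        previous = total
    return monthly


# Made-up pageview totals, NOT data from the experiment.
steady = [100, 200, 300, 400]   # keeps climbing at a constant rate
capped = [150, 300, 310, 310]   # climbs, then flattens out

print(monthly_from_cumulative(steady))  # [100, 100, 100, 100]
print(monthly_from_cumulative(capped))  # [150, 150, 10, 0]
```

A flat stretch in the cumulative list shows up as zeros in the monthly one, which is the "stops paying attention" pattern described for MSN.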


"PLEASE GOD, WHAT DOES IT ALL MEAN?!?!?!"

"I SEE THEM EVERYWHERE, EVERYWHERE. YELLOW, BLACK AND, ERR, RECTANGULAR. EVERYWHERE! EVERYWHERE! DO YOU HEAR ME?"

"THERE, THERE..."

"EVERYWHERE, EVERYWHERE. YELLOW, BLACK, AND, ERR, RECTANGULAR! WITH... WEDGE SHAPES INSIDE."

"THERE THERE, JUST LIE BACK ON THE COUCH, MRS. ERR... RECTANGULAR."
posted by loquacious at 5:03 AM on June 3, 2006


It's a pretty clever experimental design.
posted by GuyZero at 5:13 AM on June 3, 2006


On 2005-06-30 Googlebot visited node 1, the leftmost node. It did not crawl the path from the root to this node, so how did it find the page? Did it guess the URL or did it follow some external link?
I think the Googlebot is haunted.
posted by sswiller at 5:57 AM on June 3, 2006


it would be illuminating to see the experiment with more, shall we say, interesting valueless content. maybe grab some unique gutenberg-ish text, markov it up a bit, and place something more distinguishing on each page to see if search engine penetration isn't more thorough than it appears, or at least more so than spam bots. if this is an accurate picture of search engine penetration, it sure explains why search results for things i know are around turn up more build logs than anything else. google is reported to be very shallow and this experiment seems to confirm it, but with questionable seeding.

better seeding would probably not be in keeping with the spirit of the zero content symposium, though.
posted by 3.2.3 at 6:41 AM on June 3, 2006
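
For anyone wondering what "markov it up a bit" might look like in practice, here is a minimal sketch of a word-level Markov chain text generator. It is only one plausible reading of the suggestion: the sample sentence, the function names `build_chain` and `generate`, and the fixed random seed are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def build_chain(text):
    """Map each word to the list of words that follow it in the text."""
    words = text.split()
    chain = defaultdict(list)
    for current, following in zip(words, words[1:]):
        chain[current].append(following)
    return chain


def generate(chain, start, length, rng):
    """Walk the chain from `start`, picking a random follower each step."""
    word = start
    output = [word]
    for _ in range(length - 1):
        followers = chain.get(word)
        if not followers:
            break  # dead end: this word never had a successor
        word = rng.choice(followers)
        output.append(word)
    return " ".join(output)


# Stand-in source text; a real run would use a longer public-domain text.
sample = "the bot crawls the page and the bot leaves the page"
chain = build_chain(sample)
print(generate(chain, "the", 8, random.Random(0)))
```

Every pair of adjacent words in the output also appears in the source text, so each page would read as locally plausible but differ from every other page.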


"google is reported to be very shallow"
well really, how often is a page only accessible from a page which is only accessible from a page which is only ... past 12 levels? that just seems like bad organization design.
posted by Tryptophan-5ht at 7:03 AM on June 3, 2006


a) apparently, it's a lot more shallow than 12 levels. from the looks of this experiment, it may even be arbitrary.

b) people performing searches are hardly in a position to rectify the organizational design of the results they are trying to find, especially prior to finding the results.
posted by 3.2.3 at 9:21 PM on June 3, 2006
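
The depth question in the last few comments can be made concrete with a small sketch of a breadth-first crawl that refuses to follow links past a fixed depth. This is not a claim about how any real search engine bot works. The `crawl` function, the `max_depth` cutoff, and the toy site where every page links to two children (`binary_tree_links`) are all assumptions for illustration.

```python
from collections import deque


def crawl(root, get_links, max_depth):
    """Breadth-first crawl that never follows links past max_depth."""
    seen = {root}
    queue = deque([(root, 0)])
    visited = []
    while queue:
        page, depth = queue.popleft()
        visited.append(page)
        if depth == max_depth:
            continue  # page is fetched, but its links are ignored
        for link in get_links(page):
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return visited


def binary_tree_links(path):
    # Hypothetical site: every page links to exactly two child pages.
    return [path + "L", path + "R"]


pages = crawl("/", binary_tree_links, max_depth=3)
print(len(pages))  # 1 + 2 + 4 + 8 = 15
```

With a branching factor of two, each extra level doubles the number of pages, so even a modest depth cutoff leaves most of a deep tree unvisited.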



