I thought Grub was a bootloader?
April 23, 2003 10:32 AM

Is Grub out of control? Barely more than a week old, the distributed search engine is already causing headaches. It does not properly follow the Robot Exclusion Standard and thus spiders sites against their owners' wishes. Because it is a distributed client run by thousands of volunteers (and therefore connects from many different IP addresses), it is non-trivial to block. The Wikipedia project, for example, is experiencing slowdowns because of it. Let's hope they can solve these problems, as the idea seems to be quite cool.
posted by Eloquence (7 comments total)
 
grub asks you to visit this page if you're having trouble with grub visiting your site when it should not. that isn't non-trivial (assuming you know to visit the page), but it's definitely a kludge.

one way to ensure that bugs in grub don't pose a similar threat to websites is to design the program to communicate to a central server for certain directions. perhaps grub could ask a server "what version of myself is the latest," not connecting if the program is not that latest version.
posted by moz at 11:01 AM on April 23, 2003
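
The version check suggested in the comment above could look roughly like the sketch below. It is only an illustration, not how the Grub client actually works: `parse_version`, `may_crawl`, and the `fetch_latest` callable (standing in for a request to a central server) are all hypothetical names. The sketch assumes dotted numeric version strings with the same number of parts, and it refuses to crawl if the server cannot be reached.

```python
from typing import Callable, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a dotted version string such as '1.4.2' into (1, 4, 2)."""
    return tuple(int(part) for part in text.strip().split("."))


def may_crawl(local_version: str, fetch_latest: Callable[[], str]) -> bool:
    """Return True only if this client is at least as new as the latest release.

    fetch_latest is a hypothetical callable that asks the central server
    for the newest client version and returns it as a string.
    """
    try:
        latest = parse_version(fetch_latest())
    except Exception:
        # Fail closed: an unreachable server or a garbled reply means no crawling.
        return False
    return parse_version(local_version) >= latest
```

A client would call `may_crawl` once at startup and exit if it returns False, so that a buggy release stops hitting sites as soon as a fixed version is published.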


I had enough of Grub... enough, in fact, to ban every IP that hits my site running it. It's not a responsible piece of software. If it is to be used as a search bot, then it needs to respect robots.txt.
posted by mkelley at 11:10 AM on April 23, 2003


I still don't understand what Grub's benefit is to the end user. Google I can understand; as I posted in the other article, I can use Google as a tool to help me screen for information I'm interested in.

Grub's data was stale and I don't see that its search functionality was any better. Really this seems more like a marketing-driven design: It MUST be better, it's DISTRIBUTED. It's fully buzzword compliant, but is it a better search engine? I'd vote no.
posted by substrate at 11:20 AM on April 23, 2003


Ouch, kludge indeed... I thought about participating in the project when it was first announced, but not if it's going to rape the web like that. A well-behaved bot should not only obey the Robot Exclusion Standard and not crawl an individual site too fast/frequently, but it should also remove URLs that respond with 403 (Forbidden) from its crawling DB.
posted by rogue at 11:55 AM on April 23, 2003
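
The three rules in the comment above (obey robots.txt, don't hit a site too often, drop URLs that return 403) can be sketched with Python's standard `urllib.robotparser`. This is a minimal illustration under stated assumptions, not Grub's code: the `PoliteFrontier` class, the `grub-client` user-agent string, and the 10-second default delay are made up for the example, and the object covers a single site whose robots.txt lines are passed in already fetched.

```python
import time
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser


class PoliteFrontier:
    """Decides whether a URL on one site may be fetched right now (hypothetical helper)."""

    def __init__(self, user_agent, robots_lines, min_delay=10.0):
        self.user_agent = user_agent
        self.robots = RobotFileParser()
        self.robots.parse(robots_lines)  # lines of the site's robots.txt
        self.min_delay = min_delay       # assumed minimum seconds between hits per host
        self.last_hit = {}               # host -> time of the last request
        self.forbidden = set()           # URLs that answered 403

    def allowed(self, url, now=None):
        now = time.monotonic() if now is None else now
        if url in self.forbidden:
            return False                 # removed from the crawl list after a 403
        if not self.robots.can_fetch(self.user_agent, url):
            return False                 # excluded by robots.txt
        last = self.last_hit.get(urlsplit(url).netloc)
        if last is not None and now - last < self.min_delay:
            return False                 # too soon after the previous request
        return True

    def record(self, url, status, now=None):
        """Note a completed request so later checks see its time and status."""
        now = time.monotonic() if now is None else now
        self.last_hit[urlsplit(url).netloc] = now
        if status == 403:
            self.forbidden.add(url)
```

A crawler loop would call `allowed(url)` before each request and `record(url, status)` after it; the `now` argument exists only so the timing logic can be exercised without sleeping.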


There is no benefit to the end-user; its benefit is for the marketing firm that owns Grub, Looksmart. This way, they can get all of the benefits of directed marketing as already applied by search engines, without the high cost of maintaining the hardware needed for spidering. Many search engines justify charging precisely as a way to offset the costs of the search engine itself. Looksmart is using the distributed computing design pattern to maximize their profits: they still charge for advertising (after all, other search engines charge for it), but outsource the hardware & software overhead to thousands of volunteers willing to work for nothing.

In other words, you volunteer to help out Looksmart by reducing their costs, while submitting yourself to a search engine that is still charging for advertising and screening results according to the pay-per-click idea. This isn't some grass-roots effort like SETI@home to make you part of something larger - this is simply a corporation that's willing to exploit your resources so they can make a buck, without giving you much of anything in return.

This is a sucker's bet - the search engine equivalent of Bonzi Buddy. I know I can't be the only person who realizes this...
posted by FormlessOne at 2:00 PM on April 23, 2003


BTW, am I also the only one amused by the Looksmart pay-per-click advertisements hooked onto this article?
posted by FormlessOne at 2:02 PM on April 23, 2003


I don't know anything about Grub, but I know a lot about tech support, and I am astounded by how dismissive the responses are to what seems like a legitimate question, stated respectfully and backed up by server logs. Do the people who are giving replies on the forum actually represent the Grub project? After insisting that the problem must be a recently changed robots.txt, they then switch the subject and claim that they don't have to respect robots.txt because automatic downloading isn't really spidering. Is it so difficult to say "Hmm, maybe we do have a problem, give me more info and I'll look into it"?
posted by fuzz at 5:05 PM on April 23, 2003



