

grub - distributed search engine
April 21, 2003 5:44 AM

Grub: The seti@home of search engines?
According to the New Scientist: "A distributed computing project called Grub, which harnesses individual users' spare computing power and internet bandwidth, began cataloguing millions of web pages this week."
Grub has thus launched before HyperBee, a similar distributed search project.
This link was previously posted on MeFi when it was still in the conceptual stage.
The project is being run by LookSmart (along with its own open directory project called zeal) but as the New Scientist article notes: "Website information collected by Grub is already being fed into one of LookSmart's search services, called WiseNut. But the collected data are also freely accessible to the public, so they can be incorporated into any web site or desktop application."
Possible Google competition or doomed from the start?
posted by talos (10 comments total)

 
I'm into it! I have it running on my machine without problems, as do maybe 4,000 other users, according to the control panel. Neat screensaver shows you what the thing is doing while you're brushing your teeth. And since Google isn't really primarily about public search anymore, or so some folks speculate, this makes for a good open back-up.
posted by hairyeyeball at 6:25 AM on April 21, 2003


I've played with it for a bit this morning. The method used to index the internet really doesn't matter to the end user, it's the quality of indexing that matters. What might matter is the amount of coordination between the various distributed spiders. If you're hit by a lot of grub clients indexing your site then a lot of your bandwidth goes to helping out grub but is taken away (or you're faced with larger bills) from your end users.
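
As a rough illustration of the coordination concern above, here is a minimal sketch of one way a distributed crawler could keep many clients from hitting the same site at once: hash each URL's hostname so that every page on a given host is handed to a single client. This is not Grub's actual scheme (the thread doesn't describe it); `assign_client` and `partition` are hypothetical names made up for the example.

```python
import hashlib
from urllib.parse import urlsplit


def assign_client(url: str, num_clients: int) -> int:
    """Return the index of the client responsible for this URL's host.

    All URLs on the same host map to the same client, so a site is
    never crawled by several clients simultaneously.
    """
    host = urlsplit(url).hostname or ""
    digest = hashlib.sha256(host.encode("utf-8")).digest()
    # Use the first 8 bytes of the hash as a stable integer.
    return int.from_bytes(digest[:8], "big") % num_clients


def partition(urls, num_clients):
    """Group URLs into one work list per client (hypothetical helper)."""
    buckets = {i: [] for i in range(num_clients)}
    for url in urls:
        buckets[assign_client(url, num_clients)].append(url)
    return buckets


if __name__ == "__main__":
    urls = [
        "http://example.com/a",
        "http://example.com/b",
        "http://example.org/index.html",
    ]
    for client, work in partition(urls, 4).items():
        print(client, work)
```

With host-based assignment, a per-client delay between requests is enough to cap the load on any one site; the trade-off is that a very large site is crawled only as fast as its single assigned client allows.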

For what it's worth the search seems to work alright, though it is noticeably slower than google.com. The entries on my various web sites are out of date by several months as well.

I don't really search the web in the same way that I did originally, so for this to even approach google's usability for me it needs a lot of improvements.

For instance I use it to search for technical information and some of google's search modifiers are terribly useful for that.

Search all *.edu sites for PDF files containing information on Adiabatic CMOS
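
For anyone who wants to build that kind of query programmatically, here is a small sketch that assembles a search string using Google's `site:` and `filetype:` operators and turns it into a search URL. The helper names `build_query` and `search_url` are made up for illustration, and the exact query behind the link above is an assumption.

```python
from urllib.parse import urlencode


def build_query(terms, site=None, filetype=None):
    """Combine a quoted phrase with optional site: and filetype: modifiers."""
    parts = [f'"{terms}"']
    if site:
        parts.append(f"site:{site}")
    if filetype:
        parts.append(f"filetype:{filetype}")
    return " ".join(parts)


def search_url(query):
    """URL-encode the query into a Google search URL."""
    return "https://www.google.com/search?" + urlencode({"q": query})


if __name__ == "__main__":
    q = build_query("adiabatic CMOS", site="edu", filetype="pdf")
    print(q)
    print(search_url(q))
```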

Man, I love that feature. It's a whole lot better than most sites' built-in search engines.
posted by substrate at 6:31 AM on April 21, 2003


Does anyone else not notice the fact that LookSmart, Ltd. is a marketing firm? Why would I allow another company to make money off my machine by using targeted marketing in a distributed environment?

A quote from the company's site:
"LookSmart is a leader in Search Targeted Marketing. Through its innovative LookListingsTM suite of commercial search listings products and graphical advertising products, LookSmart enables large and small businesses alike to expose their products and services to customers at the precise moment they're searching for that very thing. The result is a better search experience for the user, as well as highly qualified leads and lower customer acquisition costs for the business. The LookSmart network reaches 77%* of Internet users, and includes Microsoft's MSN, Excite@Home, AltaVista, Netscape Netcenter, Inktomi, Prodigy, Juno, CNN.com, Road Runner, Cox Interactive Media, InfoSpace (Go2Net, Dogpile, MetaCrawler) and Ask Jeeves."

Or how about:
"LookSmart is committed to editorial integrity and does not accept porn, hate or spam in its directories."

I don't want hits organized according to whether or not they've been paid for, and I don't want censored material; that's what this company does.
posted by FormlessOne at 7:09 AM on April 21, 2003


WiseNut seems to charge you to submit a site, per click.
posted by rushmc at 8:04 AM on April 21, 2003


Hmmm...I see AskJeeves charges now too. I guess this is a new standard practice of which I was unaware. Never mind.
posted by rushmc at 8:54 AM on April 21, 2003


Yes, rush, it pretty much is standard practice now. There are only about three ways to make money with search engines: advertising, charging for submissions, and licensing results. Sometimes one or two of these are semi-combined such as in pay-per-click targeted results. It's never quite been clear what the advantage of any of these might be to end-users, but some of them have clear advantages for commercial exposure of websites. Google seems to have a lock on end-user mindshare, of course, so most of the competition is happening in other areas.

As for LookSmart, it may be a "marketing" company, but it's hard to argue with the assertion that all search engines are marketing companies. And LookSmart's directory has been around longer than Google's, and I don't believe that there is any directory as such -- Yahoo, DMOZ -- that permits porn, hate, or spam. (Spidering search engines are another matter entirely.) Even Google delivers targeted "results" on a pay-per-click basis not only to its pages but to the right side of this very page you're reading.

The advantage of Grub, perhaps, is that Google is probably running up against some basic technical limitations on how many pages it can reliably and repeatedly spider, even with massive Linux server farms. (Every once in a while somebody posts to MeTa about how they got different results on the same search within N hours of each other. This is just falling between Google's technical cracks as the farm updates.) Grub might be able to improve on that performance, thus indexing more pages more recently and achieving greater search accuracy thereby. For sure, though, even if it's borrowing computing resources, Grub can't run for free, and LookSmart/Zeal seem to have managed to lock up a swathe of second-tier partners for licensing their results.
posted by dhartung at 9:51 AM on April 21, 2003


From an end-user perspective, the criterion most likely used for evaluating a search engine is the engine's precision. Google has incredibly high precision--you will almost always find a relevant document within the top ten ranked results. From a document base of 3+ billion, that approaches insane levels of precision.
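
One common way to put a number on this is "precision at k": the fraction of the top k ranked results that are relevant. The sketch below assumes you already have a ranked list of document IDs and a hand-labelled set of relevant ones; the function name and the sample data are illustrative only, not measurements of any real engine.

```python
def precision_at_k(ranked_ids, relevant_ids, k=10):
    """Fraction of the top-k ranked results that are in the relevant set."""
    if k <= 0:
        raise ValueError("k must be positive")
    top = ranked_ids[:k]
    hits = sum(1 for doc_id in top if doc_id in relevant_ids)
    # Dividing by k (not len(top)) penalises engines that return too few results.
    return hits / k


if __name__ == "__main__":
    ranked = ["d1", "d2", "d3", "d4"]   # made-up ranking
    relevant = {"d1", "d4"}             # made-up relevance labels
    print(precision_at_k(ranked, relevant, k=4))
```

Note that "at least one relevant document in the top ten" is a looser test than a high precision at ten; it only requires a single hit among the first ten results.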

It's not the number of documents spidered, or even--for the most part--the currency of the documents. Google gets around quickly enough that I don't see distributed spidering being all that more timely. As they cannot really improve on Google's precision, what will Grub offer that Google does not? A fancy screen saver? (though that may be enough...)

~~[[[8^)
posted by Fezboy! at 10:36 AM on April 21, 2003


My point is that, while Google and other search engines also offer pay-per-click targeted marketing, they also spider using their own hardware - in other words, they use part of the money they receive from their for-profit business to power their spidering.
Grub, just as its name implies, is not only performing pay-per-click targeted marketing, they're also grubbing for resources by convincing users to also pay for the cost of spidering by offering their machines to offset their costs.
Therefore, more money flows into the LookSmart coffers (because they're paying less for hardware and charging the same for targeted marketing) than the other search engines, at the expense of exploiting its users.
What I want to know is, why is it that these distributed computing efforts always involve people volunteering their machines for the profit/benefit of another company? If it's for a non-profit effort, that's one thing - but LookSmart and other companies like them are for-profit!

You want to use my machine? Pay me.
posted by FormlessOne at 12:22 PM on April 21, 2003


Yes, rush, it pretty much is standard practice now. There are only about three ways to make money with search engines: advertising, charging for submissions, and licensing results.

I guess it just seems counterintuitive that a "search engine" would erect any barriers to improving their product and its results (and therefore, its value).
posted by rushmc at 12:56 PM on April 21, 2003


If this little experiment succeeds, it is only a matter of time before a non-profit, open-source alternative pops up. Then it's buh-bye Grub, marketing and generally making money out of search engines.

Is this a good thing? Discuss.
posted by spazzm at 6:53 PM on April 21, 2003



