To be clear, before the test began, these queries found either nothing or a few poor quality results on Google or Bing. Then Google made a manual change, so that a specific page would appear at the top of these searches, even though the site had nothing to do with the search. Two weeks after that, some of these pages began to appear on Bing for these searches.
This is pretty damning IMO.
Put another way, some Bing results increasingly look like an incomplete, stale version of Google results--a cheap imitation. ... We look forward to competing with genuinely new search algorithms out there--algorithms built on core innovation, and not on recycled search results from a competitor.
posted by Nelson at 3:33 PM on February 1, 2011 [1 favorite]
Here's something I bet you didn't know: If you have Internet Explorer on your computer, it keeps a record of every website you visit, and every file you open, even if you never use IE. There is supposed to be a way to disable this by turning off Autocomplete, but the two places I've found in IE that supposedly do that had no effect on the tracking. So go ahead and laugh about senior citizens. If MS decides to have its spy on your computer tell all, it may not seem so funny.
uh... citation?
how many searches/clicks going through those internal google-located IE browsers (with clickstream enabled); had to have been made by google for MS' algorithm to pick them up.
Realistically, figuring that out is a task for 4chan.
And as I said myself, it would be more justifiable if Bing was actually learning from this data, designing improved correction algorithms based on ideas gleaned from Google's work. But it seems more like they peeked at the search URLs and said "Google returned site X for query Y, so we will too" without understanding how or why that correction was made. It's not improvement, it's piggy-backing on Google's language parsing work.
Will, an interesting link came up on Hacker News tonight:
http://news.ycombinator.com/item?id=2168332
This Microsoft paper seems to confirm that rather than just doing generalized learning on clickstream data, Microsoft has deliberately reverse engineered url parameters on Google to learn when Google is doing a spell correction. The paper seems to say that Microsoft was looking for specific Google url parameters like “&spell=1” that indicate a spell correction. Here’s how the paper puts it:
"In our experiments, we "reverse-engineer" the parameters from the URLs of these sessions, and deduce how each search engine encodes both a query and the fact that a user arrived at a URL by clicking on the spelling suggestion of the query – an important indication that the spelling suggestion is desired. From these three months of query reformulation sessions, we extracted about 3 million query-correction pairs."
Together with the experiment we recently ran, can you see where engineers at Google would be concerned? And would want more clarity from Microsoft about how clicks on Google and Google urls are used at Microsoft?
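For anyone wondering what "reverse-engineering the parameters" could look like in practice, here is a minimal sketch. It is not the paper's code. It assumes the corrected query sits in a `q` parameter and that `spell=1` marks a click on a spelling suggestion (the only parameter the quote actually names). The function name, the example host, and the example queries are all made up for illustration.

```python
from urllib.parse import urlparse, parse_qs


def extract_correction(typed_query, clicked_url):
    """Return (typed_query, corrected_query) if clicked_url looks like a
    click on a spelling suggestion, otherwise None.

    Assumptions (not confirmed by the thread): the corrected query is in
    the "q" parameter and "spell=1" flags the spelling-suggestion click.
    """
    params = parse_qs(urlparse(clicked_url).query)
    if params.get("spell") == ["1"] and "q" in params:
        # parse_qs already decodes "+" and %-escapes in the query text.
        return typed_query, params["q"][0]
    return None


# Hypothetical session: the user typed a misspelling, then clicked the
# suggested correction on the results page.
pair = extract_correction(
    "metafiltr",
    "http://search.example/search?q=metafilter&spell=1",
)
print(pair)
```

Run over months of logged sessions, a filter like this would yield the kind of query-correction pairs the paper describes, without anyone needing to know why the other engine suggested each correction.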
It doesn’t look like Bing ever understood how or why Google returned the corrections it did, or developed ways to make similarly advanced logical leaps based on that data. It was more a matter of “Google returned site X for query Y, so we will too.” That’s not competition, that’s simple mimicry.
If Bing collects clickstream data that allows it to make associations between components of a URL and links clicked on the page, the point is probably not for Bing engineers to examine the data and figure out why, when the parameter q=still+safe+to+eat is present in the URL, the link http://ask.metafilter.com/54402/Still-safe-to-eat is often clicked on. The point is to collect so much data that a useful number of such associations can be made and the machine learning algorithm can generalize a little, so that if it sees q=safe+to+eat it may still say "well, we'll count that as a partial (0.75) vote for the result http://ask.metafilter.com/54402/Still-safe-to-eat."
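The partial-vote idea can be sketched in a few lines. This is only an illustration, not a claim about how Bing works: it assumes similarity is measured as the Jaccard overlap between query terms, which happens to give exactly 0.75 for "safe to eat" versus "still safe to eat". The function names and the search host are hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlparse, parse_qs


def query_terms(search_url):
    """Set of lowercase terms found in the URL's "q" parameter."""
    params = parse_qs(urlparse(search_url).query)
    return frozenset(params.get("q", [""])[0].lower().split())


def partial_votes(new_search_url, observed_clicks):
    """Give each previously clicked link a vote weighted by how similar
    its search query was to the new query (Jaccard overlap of terms).

    observed_clicks is a list of (search_url, clicked_url) pairs.
    """
    new_terms = query_terms(new_search_url)
    votes = defaultdict(float)
    for search_url, clicked_url in observed_clicks:
        seen_terms = query_terms(search_url)
        union = new_terms | seen_terms
        overlap = len(new_terms & seen_terms)
        if union and overlap:
            votes[clicked_url] += overlap / len(union)
    return dict(votes)


ASKME = "http://ask.metafilter.com/54402/Still-safe-to-eat"
observed = [("http://search.example/search?q=still+safe+to+eat", ASKME)]

votes = partial_votes("http://search.example/search?q=safe+to+eat", observed)
print(votes[ASKME])  # 0.75: three shared terms out of four distinct terms
```

Nothing in this needs to understand why the link was relevant. It only needs enough observed (query, click) pairs for the overlaps to add up.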
If Google had set up these honeypots out of sheer paranoia, then it would be natural for Bing to pick up on some of them since there were literally no other signals to use.
and
Bing didn’t fall back on Google’s results for the honeypots alone as a last resort when all other search signals failed. Bing borrowed from them on a wide scale, even when they were flawed, to the point that Google noticed and decided to test for it.
In my example above, the 0.75 vote for a particular URL gets mixed in with "1000" other votes for other URLs provided by other parts of the system. WillF didn't say that the clickstream data is only used when there is no other information; we can assume it is used for every query. But it's probably true that for lots of queries there are other types of information being taken into account (e.g. "how many times does the term appear on this page?") that are given greater weight, so that the clickstream data has little effect, but not no effect at all.
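Here is a toy version of that mixing, to show why the same 0.75 vote can be invisible on an ordinary query and decisive on a honeypot query. The signal names, the weights, and the example URLs are all invented for illustration. A real ranker would be far more involved.

```python
def combine(signal_votes, weights):
    """Sum weighted votes per URL across signals and return
    (url, score) pairs sorted from highest to lowest score.

    signal_votes maps a signal name to {url: votes}.
    weights maps a signal name to its weight (default 1.0).
    """
    totals = {}
    for signal, votes in signal_votes.items():
        weight = weights.get(signal, 1.0)
        for url, count in votes.items():
            totals[url] = totals.get(url, 0.0) + weight * count
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


weights = {"other_signals": 1.0, "clickstream": 1.0}

# Ordinary query: other signals dominate, the clickstream vote barely counts.
ordinary = combine(
    {
        "other_signals": {"http://example.com/popular-page": 1000.0},
        "clickstream": {"http://example.com/clicked-page": 0.75},
    },
    weights,
)

# Honeypot query: nothing else matches, so the clickstream vote wins outright.
honeypot = combine(
    {
        "other_signals": {},
        "clickstream": {"http://example.com/planted-page": 0.75},
    },
    weights,
)

print(ordinary[0][0])  # http://example.com/popular-page
print(honeypot[0][0])  # http://example.com/planted-page
```

The clickstream signal is used in both cases. It only surfaces at the top when nothing else has anything to say.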
Is it wrong for Microsoft to use the data provided by users who have opted-in to the terms of the toolbar? Is it wrong for Microsoft to use clickstream data to create associations between form inputs and the next link a user clicks?
As lupus_yonderboy asserts, I don't think that there's anything fundamentally wrong about what they're doing. Frankly, I think it's a great idea and the concept of a 'meta-search-engine' that learns from other search engines is one that shows promise, at least IMO.
posted by zeoslap at 10:16 AM on February 1, 2011 [1 favorite]