Bing Sting.
February 1, 2011 10:15 AM

Google set up a sting operation to prove that rival Microsoft search engine Bing is cheating, using Internet Explorer to track users' Google search results and mining that data to improve Bing. Here's the proof.
posted by 2bucksplus (160 comments total) 16 users marked this as a favorite
 
This is a problem for users how? Go BING!
posted by zeoslap at 10:16 AM on February 1, 2011 [1 favorite]


So can we call this stealing?
posted by mullingitover at 10:17 AM on February 1, 2011 [2 favorites]


“I’ve got no problem with a competitor developing an innovative algorithm. But copying is not innovation, in my book.”

Careful, that kind of language would implicate your own Android team, as well.
posted by Blazecock Pileon at 10:19 AM on February 1, 2011 [7 favorites]


Dude, I totally found this sweet deal on hiybbprqag over at mbzrxpgjys and jumped all over it like a koala in indoswiftjobinproduction.

Potrzebie!

posted by Dr-Baa at 10:20 AM on February 1, 2011 [7 favorites]


Microsoft responds: "We do not copy Google's results." So, uh, there.
posted by The Winsome Parker Lewis at 10:22 AM on February 1, 2011


IWQAFASE (I Was QA For A Search Engine)
Search engines often run qualitative rounds against other engines for QA purposes. Search engines will also try to "learn" from Google because their customers have been trained up on google and cite it as the "right" result set that they are expecting to see. All I see here is Bing applying qualitative lessons from Google's spell-correction search result method.
posted by L'Estrange Fruit at 10:22 AM on February 1, 2011


So can we call this stealing?

Inasmuch as Carlos Mencia went to jail for stealing jokes...yes.
posted by T.D. Strange at 10:25 AM on February 1, 2011 [4 favorites]


Yeah they are using the bing bar to "learn" what users are actually looking for when they search.

This also affects DDG and any other site that uses bing results.
posted by Ad hominem at 10:27 AM on February 1, 2011


This is a problem for users how? Go BING!

Off the top of my head this is a problem because:
- Microsoft engages in heavy datamining of users, even those using third party platforms like Google;
- Microsoft's engineers are either unwilling or unable to design a viable competitor for Google;
- Internet search just became more of a monopoly;
- The browser most people in the world use ultimately works for Microsoft, not for you as a consumer;
- Search results are not the result of some immutable law or theorem, companies can and do alter results for reasons not necessarily in the consumer's best interest;
posted by 2bucksplus at 10:27 AM on February 1, 2011 [23 favorites]


All I see here is Bing applying qualitative lessons from Google's spell-correction search result method.
To be clear, before the test began, these queries found either nothing or a few poor quality results on Google or Bing. Then Google made a manual change, so that a specific page would appear at the top of these searches, even though the site had nothing to do with the search. Two weeks after that, some of these pages began to appear on Bing for these searches.
This is pretty damning IMO.
posted by muddgirl at 10:28 AM on February 1, 2011 [13 favorites]


Hm.. I concede that the hiybbprqag stuff is weird.

The thing is, to implement "theft" of this nature, they'd have to be using the Google API to suck up enough search result data to "learn" these correlations, and that's technically not viable as Google is very protective of it. Use of the Google API is limited and monitored - in my QA days, we had Google API feeds cut off after a certain number of queries/results quite often (these would be used for the comparatives mentioned above).

Also, what they are supposedly copying are nonsense queries - not a query they themselves would generate by any human or mechanical means. The only way to "learn" about that kind of nonsense query would be by tracking google's query history somehow. This is not impossible - Google themselves display the query feeds in various ways and places.
It gets a bit more grey at this point - *if* a competitor is tapping into Google's query feeds in order to get useful quality data about popular queries, query handling and query results - again because Google is commonly used as a gold standard for query results - and then use this information to improve their own algorithms to be more like Google and therefore the results their user want to see - is it theft or natural marketplace evolution toward a common standard?
posted by L'Estrange Fruit at 10:32 AM on February 1, 2011


Somehow, IE users might have been sending back data of what they were doing on Google to Bing.

Missed that. Sneaky.
posted by L'Estrange Fruit at 10:33 AM on February 1, 2011


I agree that Google's sting is pretty damning, and if it wasn't clear, I think Microsoft's single-sentence denial comes across as pretty weak and unconvincing. Google's honeypot reminds me of how some dictionaries will include fake words to see if their competition is plagiarizing them. (The article linked in that FPP is no longer online, but is archived here.)
posted by The Winsome Parker Lewis at 10:35 AM on February 1, 2011 [4 favorites]


Microsoft responds: "We do not copy Google's results." So, uh, there.

Microsoft responds: "We do not copy Google's results." So, uh, there!


(see, I didn't copy your post - mine has an exclamation point at the end where yours has a period.)
posted by cashman at 10:39 AM on February 1, 2011 [2 favorites]


Imagine you walked into a coffeehouse that just offered "free coffee" on the menu. You look behind the counter and see cans of Folgers coffee sitting there.

"Hey," you say. "You're stealing."
"Nope, we acquired this coffee by perfectly legal means."
"Then you're misrepresenting your product."
"Nope. We didn't say this coffee was a special artisanal roast. We said it was just coffee. And Folgers is coffee."
"Then there's something screwy going on here."
"We're providing you with free coffee, right?"
"Yeah."
"Is it hot? Does it taste good?"
"Yes."
"Am I stopping you from getting the same thing somewhere else?"
"No."
"Then what are you hassling me for?"
"It just feels wrong."
"Whatever. Refill?"
"Yeah."
"While you're here, may I interest you in some cheap plane tickets?"
posted by Cool Papa Bell at 10:41 AM on February 1, 2011 [8 favorites]


It is pretty damning but it probably isn't illegal. I'm more concerned that they're leveraging regular, day-to-day spying on users to steal from Google. That's a whole new level of weird and, if the laws were up to speed, would probably count as some sort of anticompetitive practice.

I'm also confused why Google didn't run WireShark on their test search machines to actually catch the spying in progress. That seems like the obvious thing to do and they make no mention of it.
posted by chairface at 10:41 AM on February 1, 2011 [2 favorites]


Yeah in the article it states this only happened for 7-9 out of 100 tests. I think how this happened is that the testers themselves used ie suggested sites or bing toolbar and inadvertently sent results back to bing. It is pretty clear bing is not siphoning google results wholesale.
posted by Ad hominem at 10:42 AM on February 1, 2011


Shit like this is why I localhosted all microsoft-related domains long ago. Have to disable it from time to time to get updates, but it's worth it.
posted by aerotive at 10:42 AM on February 1, 2011 [2 favorites]


I did learn about another search engine in one of those linked articles - blekko - that I found a lot more useful than Google, with all its FREE ESSAYS listings in any remotely academic search. I'm sure others have their favorites. If Bing is doing what Google claims, no wonder Bing=Google. (Yeah, Google Scholar is another alternative...)
posted by kozad at 10:46 AM on February 1, 2011


I'm also confused why Google didn't run WireShark on their test search machines to actually catch the spying in progress

says right in the TOS that "suggested sites" and the bing toolbar send search terms back to bing.
posted by Ad hominem at 10:47 AM on February 1, 2011 [1 favorite]


The Winsome Parker Lewis: Microsoft responds: "We do not copy Google's results." So, uh, there

See, they just Embrace and Extend them.
posted by mkultra at 10:48 AM on February 1, 2011 [2 favorites]


I'm more concerned that they're leveraging regular, day-to-day spying on users to steal from Google. That's a whole new level of weird and, if the laws were up to speed, would probably count as some sort of anticompetitive practice.

That was my thought, once I actually read the whole thing and stopped kneejerking (lalala). Having been in the position of a lowly competitor and knowing how hard it is to get hold of data for comparison and learning (in fair, legal ways), if they're using IE to scrape search results for this purpose they have a huge unique advantage that no one else can ever have.
posted by L'Estrange Fruit at 10:50 AM on February 1, 2011 [1 favorite]


I did learn about another search engine in one of those linked articles - blekko

Blekko is cool but it uses bing for straight up searching. Search for the google honey trap words on blekko and you get the same results as bing (well now this article is the top result)
posted by Ad hominem at 10:55 AM on February 1, 2011 [1 favorite]


This makes me wonder how far any of these companies with huge monopolies on services, which tend to be 'free', with very large user bases, go in scraping, sorting and selling data. I can think of a few things that wouldn't break the privacy barrier per se (no source credits for market trend or zeitgeist results for example).

And this just made me giggle,

See, they just Embrace and Extend them.
posted by infini at 11:14 AM on February 1, 2011


Careful, that kind of language would implicate your own Android team, as well.

Uh, so what exactly did the Android team 'copy'?
posted by SweetJesus at 11:14 AM on February 1, 2011


Cool Papa Bell: "Imagine you walked into a coffeehouse that just offered "free coffee" on the menu. ..."


Imagine you walked into a coffeehouse that offered you a free rechargeable self-warming coffee mug with your coffee.

"Thanks," you say. "Is there a catch?"
"Oh, just that we really want you to enjoy your coffee!! So we might collect your data on your coffee experience! Oh, and it also has a built in GPS unit, and can phone home if you use it at another coffee shop, and it can also chemically analyze the contents and report back to us how our competitors are making their coffee, intercept your bank transactions to see how much they're charging, maybe some other stuff too... Because we want you to really enjoy our coffee! Pretty great, huh?"
"Sweet, I love coffee!"
posted by Reverend John at 11:16 AM on February 1, 2011 [19 favorites]


This makes me wonder how far any of these companies with huge monopolies on services, which tend to be 'free', with very large user bases, go in scraping, sorting and selling data.

If you're not paying for the product, you are the product. The notion that Google isn't similarly tracking their users' habits is laughable.
posted by one_bean at 11:17 AM on February 1, 2011 [6 favorites]


To reply to the quick snark at the top of the post, btw, this is absolutely legal and it is absolutely NOT stealing. It's theoretically possible that there's a TOS violation tucked away in the fine print somewhere, but that's not really the issue.

The key point here, IMO at least, is that Microsoft has been making a big deal about the quality of Bing's search results, and playing up the engine's ability to take on Google. While it's likely that they have built their own algorithms and are simply augmenting using the data, it's a real blow to the idea that they went off and developed a serious competitor to Google.

If the information in the article is true, they're deliberately piggybacking on Google's algorithm without owning up to it.
posted by verb at 11:20 AM on February 1, 2011 [1 favorite]


This is interesting and all, but that article is way too fucking long. Bloggers.
posted by mr.marx at 11:20 AM on February 1, 2011


If Sergey & Larry feel like taking "Don't be Evil" more aggressively, they'll start selling placement in Bing search results, while still only selling sidebar placement in their own. lol
posted by jeffburdges at 11:27 AM on February 1, 2011 [7 favorites]


Everything Verb said, plus a lot of what appears to be the case is Google and Bing trying to undercut each other with regards to press events about search that were happening this month.
posted by jscott at 11:31 AM on February 1, 2011


I'm really on the fence about where this falls in the brilliant/sleazy spectrum. It's possible that the Bing software doesn't target Google at all. You watch the user type a search term, then see what link they land on, and then learn an association of search -> link from it. The fact it was a Google search is secondary (albeit common). Learning from user behaviour is generally a good way to improve a search engine, and Google does similar things. Clearly inserting the search result wholesale was a mistake, but the general approach is not bad.

What worries me about this story is it compromises Microsoft's position as an independent search engine. Blekko aside, there are only two major English search engines left: Google and Bing. If Bing is becoming more like Google, its value as a competitor goes down.
posted by Nelson at 11:31 AM on February 1, 2011 [1 favorite]
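For anyone who wants the "search -> link" idea from Nelson's comment spelled out, here is a minimal sketch in Python. It is a toy under stated assumptions, not Bing's actual implementation: the class name, method names, and example.com URLs are all made up for illustration. The point it shows is that a learner like this never needs to know which site the query was typed into.

```python
from collections import Counter, defaultdict


class ClickAssociations:
    """Toy model (hypothetical): count which URL users land on after a query."""

    def __init__(self):
        # query -> Counter of URLs clicked after that query
        self._clicks = defaultdict(Counter)

    def observe(self, query, clicked_url):
        # Only the query text and the landing URL are stored.
        # Where the query was typed (Google, Bing, a site search box)
        # is not recorded at all.
        self._clicks[query.strip().lower()][clicked_url] += 1

    def top_urls(self, query, n=3):
        # Return the most frequently clicked URLs for this query, best first.
        counts = self._clicks.get(query.strip().lower(), Counter())
        return [url for url, _ in counts.most_common(n)]


model = ClickAssociations()
model.observe("hiybbprqag", "http://example.com/honeypot")
model.observe("coffee", "http://example.com/folgers")
model.observe("coffee", "http://example.com/artisanal")
model.observe("coffee", "http://example.com/folgers")

print(model.top_urls("hiybbprqag"))
print(model.top_urls("coffee"))
```

A nonsense query that has only ever been seen once ends up with exactly one associated URL, which is the situation the sting created.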


New "screwing with MS" code is hopefully on the way to poison this well by sending strange junk to the toolbar.
posted by a robot made out of meat at 11:35 AM on February 1, 2011 [2 favorites]


This is a marketing failure more than a technical failure. For every ad that Microsoft runs comparing Bing to google, the obvious counterpoint is "why not just search google, since that's where Microsoft is lifting the results from anyway".

It's also a marketing failure to reveal to the world that the TOS is essentially authorizing Microsoft to install spyware within their browser. Yes, you've agreed to it, but it doesn't mean Microsoft wants consumers to think through all that it entails. Since Microsoft is not only scraping your keywords, but pretty obviously scraping your results (or the tests wouldn't have worked), then there's a whole level of intrusion- and even if a user has clicked "I agree" to the shrinkwrap license, it doesn't mean that the marketing folks want you to understand things that are implied in their TOS.

This "scandal" makes it explicit that Microsoft is willing to spy on users- not to improve their product, but to leverage their monopoly in desktop OS to gain unfair advantage over a competitor. It's not far off from what got them in trouble with Netscape.
posted by jenkinsEar at 11:38 AM on February 1, 2011 [3 favorites]


Uh, so what exactly did the Android team 'copy'?

There's the Java source code that was copied over and relicensed, which is part of the ongoing litigation between Oracle and Google. But Android was initially going to be a feature phone OS at the tail end of 2007, and then was retooled to a haptic, iOS-like interface in September 2008, a little more than one year after the successful release of the first-generation iPhone in June 2007.

I think it is common knowledge that a lot of technology companies copy and borrow good ideas. For example, Apple made Xerox PARC's mouse a part of everyone's desktop computing experience. They then started moving to touch interfaces, and now all the others are starting to follow with their own touch devices. So it's a bit silly for a Google representative to claim with a straight face that copying is wrong, when they do it as much as other companies, if not more so.

The two real issues from this, it seems to me, are that Google is nervous about its only real, remaining competitor and trying to stir up fake outrage, and, perhaps more important, that Microsoft is leveraging its still-sizable web browser market share to do things that its userbase probably didn't expect, or necessarily want, from a privacy standpoint.
posted by Blazecock Pileon at 11:39 AM on February 1, 2011 [2 favorites]


Uh, so what exactly did the Android team 'copy'?

Why would you fall for this?
posted by yerfatma at 11:44 AM on February 1, 2011 [15 favorites]


This makes me wonder how far any of these companies with huge monopolies on services, which tend to be 'free', with very large user bases, go in scraping, sorting and selling data.

If you're not paying for the product, you are the product. The notion that Google isn't similarly tracking their users' habits is laughable.


Habits is one thing, deriving patterns of content by analyzing keywords used for emails within gmail is another.

Not to mention that I find it creepy when gmail asks me if I've forgotten to attach something because it read my words "you wrote attached" - go away and let me make my own errors.

Is yahoo mail an alternative? Not really tbh from the bandwidth deprived regions of the world.
posted by infini at 11:44 AM on February 1, 2011


Habits is one thing, deriving patterns of content by analyzing keywords used for advertising next to emails within gmail is another.

ftfm
posted by infini at 11:45 AM on February 1, 2011


Suddenly, Chrome makes more sense.
posted by esprit de l'escalier at 11:49 AM on February 1, 2011


Not to mention that I find it creepy when gmail asks me if I've forgotten to attach something because it read my words "you wrote attached" - go away and let me make my own errors.

I liked that scheme so much, I set up Thunderbird to do the same thing. No more embarrassing, "Whoops! See the attachment on this email" blunders.
posted by muddgirl at 11:52 AM on February 1, 2011 [4 favorites]


I hate to say it because I don't use or particularly like Facebook, but watch for Facebook. As more and more content is added there, both individual and corporate, it will be as much of a relied-upon portal for searching as Google.

Maybe once the Facebook search function isn't a piece of shit.
posted by Pope Guilty at 11:54 AM on February 1, 2011 [4 favorites]


"Don't be Evil-Be Fair"

get em Larry.
posted by clavdivs at 11:56 AM on February 1, 2011


Ugh. Why can't Microsoft just die. I mean, Apple and Google are far from perfect, but at least they are born out of aesthetics and ethics. Microsoft have neither. Their business practices have always, it seems, favoured aggressively dominating markets over giving the consumers decent product. No one should be surprised they do stuff like this. I'd be in favour of competition to Google in the search business, but not from Microsoft. No way. Please let them fade out and go away already.
New "screwing with MS" code is hopefully on the way to poison this well by sending strange junk to the toolbar.
posted by a robot made out of meat


On preview, this is a great idea.
posted by iotic at 11:57 AM on February 1, 2011 [1 favorite]


There are kids getting their degrees in Social Media Administration now (or MIS or whatever) that are probably trying to conjoin the two. Be the product!

Where's the opt-out radio button on the interwebz? and get off my lawn while you stick it on
posted by infini at 11:57 AM on February 1, 2011


I do that too, muddgirl. Very useful Thunderbird add-on (here if anyone's interested).
posted by The Winsome Parker Lewis at 11:58 AM on February 1, 2011


Pope Guilty: Maybe once the Facebook search function isn't a piece of shit.

You know they use Bing once you get past the Social Graph, right?
(Which I guess means they're actually using Google)
posted by mkultra at 12:02 PM on February 1, 2011


I feel like I should get riled up about this, but I just can't muster the energy. All it really does for me is make Bing seem even less relevant.

The only use I find in them is alternative aerial photos if Google Maps isn't showing me something at a good time of year.

That's about it.
posted by quin at 12:03 PM on February 1, 2011


New "screwing with MS" code is hopefully on the way to poison this well by sending strange junk to the toolbar.
Even better - SEO folks selling Bing search results by way of a toolbar botnet.
posted by NMcCoy at 12:09 PM on February 1, 2011 [2 favorites]


The consensus on hacker news (that hotbed of MS fandom) is that bing toolbar establishes relationships between any page and links clicked on that page. It sees a new word with no relationships and ranks it fairly highly. It is most likely not targeting google, but establishing a connection between a seemingly random word on the search result page and the links on that page.

I'm not going to talk down Google or Apple but that they are "born out of ethics and aesthetics" is a fantasy. And from a practical standpoint you can't run a company on 3000 people with awesome MacBooks.
posted by Ad hominem at 12:13 PM on February 1, 2011 [1 favorite]
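To illustrate the "new word with no relationships ranks fairly highly" point from the comment above, here is a small Python sketch. It assumes, purely for illustration, that the signal is the share of observed clicks going to a URL; nothing in the thread says how Bing actually weights clickstream data, and the function name and example.com URLs are hypothetical.

```python
from collections import Counter


def click_share(clicks: Counter, url: str) -> float:
    """Fraction of all observed clicks for a query that went to `url`.

    Hypothetical signal for illustration only.
    """
    total = sum(clicks.values())
    return clicks[url] / total if total else 0.0


# A nonsense query seen exactly once: the single click is all the evidence there is.
nonsense = Counter({"http://example.com/honeypot": 1})

# A popular query: many clicks spread over several URLs.
popular = Counter({
    "http://example.com/a": 400,
    "http://example.com/b": 350,
    "http://example.com/c": 250,
})

print(click_share(nonsense, "http://example.com/honeypot"))
print(click_share(popular, "http://example.com/a"))
```

Under this assumption a single click on a never-before-seen term looks like unanimous agreement, while the best result for a common query only gets a fraction. If the engine has no other signals for the term, that one association is the only thing it has to rank on.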


Habits is one thing, deriving patterns of content by analyzing keywords used for advertising next to emails within gmail is another.

Again, if you think this is all that Google does to track your online activity, you are absolutely nuts. It's right in their privacy policy.
posted by one_bean at 12:28 PM on February 1, 2011


Not to mention that I find it creepy when gmail asks me if I've forgotten to attach something because it read my words "you wrote attached" - go away and let me make my own errors.

What's creepy about that? Seems in the same category as highlighting misspelled words with red underlines.
posted by kafziel at 12:32 PM on February 1, 2011


free rechargeable self-warming coffee mug with your coffee.

THAT STILL CAN'T RUN CSS 2.1.


FUCK.
posted by thsmchnekllsfascists at 12:33 PM on February 1, 2011 [4 favorites]


RUN should be RENDER
posted by thsmchnekllsfascists at 12:35 PM on February 1, 2011 [1 favorite]


using Internet Explorer to track users' Google search results

So they targeted like what, 60 or 70 senior citizens?
posted by sourwookie at 12:35 PM on February 1, 2011 [3 favorites]


If you can't beat em, cheat em.
posted by Catblack at 12:35 PM on February 1, 2011


Microsoft is the only company who can afford to lose billions with their online operations and not even notice it. They have literally nothing (more) to lose with Bing, so I can't understand why they just don't go all out in developing something entirely different from Google. I mean, they're already losing, would a few billion more really matter all that much? They used to babble about "Freedom to Innovate", well why don't they use that freedom?
posted by tommasz at 12:44 PM on February 1, 2011


using Internet Explorer to track users' Google search results

So they targeted like what, 60 or 70 senior citizens?


Here's something I bet you didn't know: If you have Internet Explorer on your computer, it keeps a record of every website you visit, and every file you open, even if you never use IE. There is supposed to be a way to disable this by turning off Autocomplete, but the two places I've found in IE that supposedly do that had no effect on the tracking. So go ahead and laugh about senior citizens. If MS decides to have its spy on your computer tell all, it may not seem so funny.
posted by Kirth Gerson at 12:47 PM on February 1, 2011 [4 favorites]


Microsoft is the only company who can afford to lose billions with their online operations and not even notice it

Well they've been using that strategy for a while now and they just keep spending those shekels.
posted by device55 at 12:51 PM on February 1, 2011


There's the Java source code that was copied over and relicensed, which is part of the ongoing litigation between Oracle and Google.

This is FUD. There isn't any proof that copyrighted Java source code is to be found anywhere in the Android stack deployed to hardware. I assume the 'code' you're speaking of is that which was 'discovered' by Florian Mueller, which has since been debunked. A larger portion of the lawsuit has to do with Google producing a JVM (Dalvik) that isn't compatible across the board with Oracle's standard java library, and thus is a violation of the OpenJDK license agreement. There is a world of difference between 'copying source code' and 'patent infringement', and I'm sure you know that.

But Android was initially going to be a feature phone OS at the tail end of 2007, and then was retooled to a haptic, iOS-like interface in September 2008, a little more than one year after the successful release of the first-generation iPhone in June 2007.

I think it is common knowledge that a lot of technology companies copy and borrow good ideas. For example, Apple made Xerox PARC's mouse a part of everyone's desktop computing experience. They then started moving to touch interfaces, and now all the others are starting to follow with their own touch devices.

Firstly, what evidence do you have that Android was going to be a 'feature phone OS at the tail end of 2007, and was then retooled to a haptic, iOS-like interface in September 2008'? There was a preview release of the Android SDK in November 2007, and the .9 SDK was released in August 2008. What exactly was 're-tooled' in that time frame? It's my feeling that this is your opinion (which is fine) but you don't have any evidence to prove your assertion.

Secondly, Apple does not have a monopoly on the concept of touch sensitive or gesture based interfaces, as much as you would like to believe otherwise. The first patent for handwriting recognition using a stylus was granted in 1915, and the first 'Multi-touch' interface came out of the University of Toronto in 1982. So Apple, like all good technology companies, is building upon years and years of collective research and development. This isn't 'copying', it's innovation.

I don't fault Apple for being the first to market with a successful mobile gesture-based interface, they were very successful in that. But for years it's been obvious that the next leap in computing was going to be a) network-connected mobile devices, and b) touch screen interfaces. To say that everyone else is just 'copying Apple' is to dismiss all the 'copying' Apple did from earlier field research.

The two real issues from this, it seems to me, are that Google is nervous about its only real, remaining competitor and trying to stir up fake outrage, and, perhaps more important, that Microsoft is leveraging its still-sizable web browser market share to do things that its userbase probably didn't expect, or necessarily want, from a privacy standpoint.

It shows to me Microsoft's algorithm is still years behind Google's, and they have to resort to legal, yet morally-questionable actions from unsuspecting users in order to prop up their search results. Time will tell if this is a legal violation of Google's TOS, but it's certainly akin to 'cheating on a test'.

Why would you fall for this?

Like a moth to the flame, I can't resist a good tech-related troll.
posted by SweetJesus at 12:58 PM on February 1, 2011 [7 favorites]


well why don't they use that freedom?

Because copying the people who've done a better job is easier than starting from scratch.
posted by Blazecock Pileon at 12:59 PM on February 1, 2011


And to think, people were worried that Microsoft would change its ways after Bill Gates left.
posted by i_have_a_computer at 1:00 PM on February 1, 2011


Google actively support open-source software. Microsoft actively oppose it. There is a difference in ethos.
posted by iotic at 1:02 PM on February 1, 2011 [2 favorites]


Like a moth to the flame, I can't resist a good tech-related troll.

Oh good, now BP is going to respond paragraph by paragraph, and the rest of the comments are going to be about the fucking Android vs. iPhone bullshit.
posted by smackfu at 1:02 PM on February 1, 2011 [1 favorite]




Thanks, Kirth, I didn't know that and I find it frightening. I haven't used windows in ten years and that is another reason right there I don't intend to go back.

Oh and my "senior citizen" joke? Of course I know better. It was intended as nothing more than late-night talk show type humor.
posted by sourwookie at 1:05 PM on February 1, 2011


Google actively support open-source software.

Google actively piggybacks on OS, because it helps them get more ad revenue without exposing them (too much) to litigation. Microsoft doesn't, because its business model hinges around people actually paying for software, not ads.
posted by Skeptic at 1:12 PM on February 1, 2011


well why don't they use that freedom?

Because copying the people who've done a better job is easier than starting from scratch.


Also, as I mentioned earlier, Google is used as a gold standard whether we like it or not. Nothing more frustrating than working like hell on your algorithms, getting your results to pretty-good - and having a customer say, "well yeah, but Google gives me these pages..."
posted by L'Estrange Fruit at 1:12 PM on February 1, 2011


Here's something I bet you didn't know: If you have Internet Explorer on your computer, it keeps a record of every website you visit

I think you are talking about the IAutoComplete shell interface. It is available to all edit controls. The data is stored encrypted in some obscure place, I guess they could have it phone home.
posted by Ad hominem at 1:15 PM on February 1, 2011


If you have Internet Explorer on your computer, it keeps a record of every website you visit, and every file you open, even if you never use IE.

Cite?
posted by kmz at 1:16 PM on February 1, 2011


Here's something I bet you didn't know: If you have Internet Explorer on your computer, it keeps a record of every website you visit, and every file you open, even if you never use IE.

Absolutely true. If you surf with Firefox, and then run CCleaner at the end of your session, you'll see a ton of IE cache files to be deleted.
posted by Benny Andajetz at 1:17 PM on February 1, 2011


Because copying the people who've done a better job is easier than starting from scratch.

So is it your opinion that Apple 'copied' BSD? I mean, they obviously didn't build their kernel from scratch, did they?
posted by SweetJesus at 1:20 PM on February 1, 2011


So is it your opinion that Apple 'copied' BSD? I mean, they obviously didn't build their kernel from scratch, did they?

If I remember correctly, they acquired the Mach kernel from their NeXT acquisition, as well as using parts of BSD. Didn't some of Mach end up making it into BSD?
posted by Blazecock Pileon at 1:31 PM on February 1, 2011


With our fast-paced Brave New World, do they have therapists specializing in folks whose self-worth is inherently linked to the public perception of a corporation that cares nothing about them?

It must be a fucking frustrating job.
posted by yerfatma at 1:37 PM on February 1, 2011 [8 favorites]


Mach is from CMU. NEXTSTEP and subsequently Darwin were "based" on it.
posted by Ad hominem at 1:37 PM on February 1, 2011


I hate to say it because I don't use or particularly like Facebook, but watch for Facebook. As more and more content is added there, both individual and corporate, it will be as much of a relied-upon portal for searching as Google.

I already use Twitter for this. I don't even tweet, but when I hear a new (to me) coinage of a term or a new name of a band, celebrity, product, whatever, or I'm curious about something in the news, I'll search Twitter to figure out what it is -- people link to stuff on there all the time.
posted by bluefly at 1:37 PM on February 1, 2011


If I remember correctly, they acquired the Mach kernel from their NeXT acquisition, as well as using parts of BSD. Didn't some of Mach end up making it into BSD?

MACH is the microkernel, but the microkernel isn't the whole kernel. Much of Darwin is derived from BSD and GNU sources.
posted by SweetJesus at 1:38 PM on February 1, 2011


If this is what the debunker says, Google seems to have relicensed and used Sun's source code — even if it was by mistake or if it didn't end up in a compiled build.

What the debunker is talking about is the uploading of MMAPI (MultiMedia API for J2ME) as part of a regression testing suite to Google's source code repository. Since Android doesn't run J2ME applications, and the code was related to unit testing for a particular type of Sonos device, I think it's pretty safe to say this code wasn't included in any deployed version of Android. Some third-party's mistaken upload is not Oracle's smoking gun, and still doesn't prove your assertion that Google 'copied Java's source code'.
posted by SweetJesus at 1:56 PM on February 1, 2011


Mod note: Apple vs Google Same Old Argument needs to go to metatalk or email from this point forward, thanks.
posted by jessamyn (staff) at 2:05 PM on February 1, 2011 [9 favorites]


I remember an interview with Vanilla Ice just after Ice Ice Baby came out. The interviewer asked him about stealing the opening riff from Queen's Under Pressure. His response was this:
"No, their song goes:
Da da da da da da da. Da da da da da da da.
Ours goes:
Da da da da da da da. Da-da da da da da da da da."

Microsoft's answer is kind of like that.
posted by 8dot3 at 2:11 PM on February 1, 2011 [5 favorites]


8dot3 - the 'Dude and I quote that interview to each other all the time. Such a classic.
posted by muddgirl at 2:18 PM on February 1, 2011


So I've read a bit about this and I've been wondering whether instead of Google's explanation, it could have all gone down like this.

First, the setup, just as Google described:
* Google publishes pages on the web ("honeypots") with no links to them
* They hack their search results to return those honeypot pages in response to certain nonsense queries that are essentially never performed until this sting.
* Googlers go home, use IE with the Bing toolbar and suggestions turned on, run google searches for the nonsense queries, and click on the results.

Now here's what Google says happened:
* IE or the toolbar phoned home to Microsoft and reported those pages appearing in the Google search results for the nonsense query
* Microsoft used that information as a signal for Bing to incorporate information about Google's results into their search engine

But there's another explanation:
* IE or the toolbar phoned home to Microsoft and reported that a user loaded the honeypot page with a referrer containing the search string. If you don't know how HTTP works, you might not know that a referrer header is sent by most web browsers to a website whenever you click a link. That header includes any search terms that were part of the URL. When the Google testers followed the link from the search results page to the honeypot, Microsoft also got the search terms as part of the referrer data.
* Microsoft associated the nonsense search terms with the honeypot links based on the referrer, rather than the Google search results.
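
For anyone who wants to see the referrer mechanism concretely, here's a minimal Python sketch of pulling search terms out of a referrer URL. The example URL and the `q` parameter name are assumptions for illustration; nothing here is known to be what IE or the toolbar actually does.

```python
from urllib.parse import urlsplit, parse_qs

def query_from_referrer(referrer, param="q"):
    """Return the search terms carried in a referrer URL, or None.

    `param` is the name of the URL parameter assumed to hold the query.
    """
    values = parse_qs(urlsplit(referrer).query).get(param)
    return values[0] if values else None

# Hypothetical referrer sent when a tester clicks through to a honeypot page.
referrer = "http://www.google.com/search?q=hiybbprqag"
print(query_from_referrer(referrer))
```

Whoever receives that referrer (the destination site, or anything watching the browser) can read the query straight out of it.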

The distinction here is that the "copying" wouldn't be specific to Google. Rather, the same behavior would result if you replaced "Google search" with "Metafilter search" above. Microsoft would simply be using user-entered query terms as a signal in their search algorithm. There's a big difference between using user behavior and inputs to improve Bing's search results and using Google's search results to improve Bing's search results.

Now it's entirely possible that Google has controlled for this by testing a variant of their sting that doesn't rely on a Google search and that they have determined that Bing only acts in this way based on Google's data, but I don't see any evidence that this is the case. Given Microsoft's unequivocal denial, it seems very likely that they are taking a more general approach to using query strings in this way rather than targeting Google specifically.

(Of course, it's also possible that they aren't targeting Google specifically, but the algorithm also cares about the reputation of the sending site. As such, links from Google results would be well trusted and have more influence in Bing searches, while links from a honeypot site would have little to no influence. One could potentially control for this by repeating the experiment with other high-PageRank non-Google sites with built in search functions, such as nytimes.com. This is left as an exercise to the reader, though YMMV now that Microsoft is on notice about this behavior.)
posted by zachlipton at 3:31 PM on February 1, 2011 [1 favorite]


Google's posted Microsoft’s Bing uses Google search results—and denies it to their blog. It's uncharacteristically strong language for Google:
Put another way, some Bing results increasingly look like an incomplete, stale version of Google results—a cheap imitation. ... We look forward to competing with genuinely new search algorithms out there—algorithms built on core innovation, and not on recycled search results from a competitor.
posted by Nelson at 3:33 PM on February 1, 2011 [1 favorite]


Who the hell uses Bing?
posted by His thoughts were red thoughts at 3:47 PM on February 1, 2011 [1 favorite]


Search engine uses effective methods to improve search results. Film at eleven.msnbc.com/video.

Not quite.

Google's search APIs have been out there for a long time for other people to build on top of. Other companies use Google's search engine to power their portals' search functionality, and they differentiate based on features. For example, one search engine awarded iTunes gift cards to its users in daily contests -- they were using Google behind the scenes but marketed a "make searching fun" experience. Others have worked on wrapping the basic search data in other contextual stuff.

Bing has sold its search on the value of "beyond-just-search-results" data. Its early ad campaigns hyped the idea that for things like airfares and travel, it was able to give you useful information like predictions on when cheap fares would be coming down the line. But unlike those other white label products, it didn't use Google's APIs and it set itself apart as something bigger and better than Google's old-and-busted.

If this were simply about "using effective methods to improve search," and it became clear that Google's results were better, they could have used Google's own search API like those other companies. Like Yahoo did for years. They could have even used it as an adjunct to their own algorithm's results if they felt they were better in certain contexts but weaker in others. But openly admitting that they used it would have been an admission that they were building on Google rather than replacing it, so they used the toolbar behavioral data instead.

Again, this isn't illegal, it isn't "immoral," but it is pretty damning when it comes to Microsoft's own message about Bing's superiority. It's not a promising search engine; it's a search engine that's good at some things and powered by Google for other things. Unless they're open and honest about what those things are, it's hard to take their claims seriously.
posted by verb at 3:48 PM on February 1, 2011 [1 favorite]


Also? You haven't lived until you've had meetings with MS reps who awkwardly try to work "Bing" into every conversation as a verb.

"Yeah, I think we're about finished here. You guys want to grab lunch?"

"Definitely! There's a good sandwich place a few blocks from here, I'll Bing for it."

"..."

posted by verb at 3:50 PM on February 1, 2011 [4 favorites]


The only time I use Bing is when I need to get more info about moms who wear jeans to match their teen's jeans.
posted by jeremy b at 4:12 PM on February 1, 2011 [1 favorite]


I saw http://duckduckgo.com over on hacker news and found it to be pretty refreshing when searching tech-oriented queries. I don't know who is behind them, but they've got a friendly mix of old school (pre seo spam) google results and sort of a wolfram alpha like summation of the subject.

I think there's room for other players to compete fairly in the search engine market. (Though I remember the day I gave up altavista for google waaaay back when...)
posted by Catblack at 4:44 PM on February 1, 2011 [1 favorite]


I've been using duckduckgo a lot since I saw it mentioned on the blue a few weeks ago. I've been very satisfied with it.
posted by Benny Andajetz at 5:08 PM on February 1, 2011


I like ddg and blekko as well, and they both use Bing for "long tail" searches. That is the only skin I have in this game, that as google gains a rep for presenting scraped content before organic content, they start an offensive against bing.
posted by Ad hominem at 5:40 PM on February 1, 2011


Because I believe in healthy market competition (Google is only going to get better if they're not lazy), am wary of throwing all my privacy eggs in one basket, and got fed up with both content farms and those @#$^ enraging Experts Exchange pages, I set bing as my default engine (in Chrome). I mean if Google is the ONLY game in town (yahoo... hah!), how long is it before they start acting like a complete monopolist?

Despite this I always have to manually go to google for hard searches because bing is just not as good.

Also as long as MS is throwing away billions, can they PLEASE buy delicious from yahoo... sigh.
posted by stratastar at 6:14 PM on February 1, 2011 [1 favorite]


Also does anyone else love/hate how the whole content farm / false information / real information idea that Stephenson wove into Anathem is actually coming true?
posted by stratastar at 6:21 PM on February 1, 2011 [1 favorite]


I couldn't write a short response, so I wrote a long one instead:

http://willwhim.wordpress.com/2011/02/02/is-bing-cheating-at-search/

I'm a Bing search engineer, and this is my (non-official) take on it.
posted by willF at 7:15 PM on February 1, 2011 [13 favorites]


>>I did learn about another search engine in one of those linked articles - blekko

>Blekko is cool but it uses bing for straight up searching. Search for the google honey trap words on blekko and you get the same results as bing (well now this article is the top result).

Turbo10. Word to your momma.
posted by uncanny hengeman at 7:19 PM on February 1, 2011


willF,

But how would Bing grab those honeypot search terms from user data? How would enough in-public users search for the random strings that Google set up for the algorithms to grab those pages, unless those searches were done in-house at google?
posted by stratastar at 8:34 PM on February 1, 2011


stratastar, unless I'm misreading willF's blog post, it sounds like he's saying that Bing uses tons and tons of different inputs to determine where people end up when they search for certain things.

Internally, Google uses this approach to weight its own results. If people ALWAYS click a particular link after searching for a subject, that link can get more weight for that search term even if its raw scoring wouldn't. Google tracks that because they see what links on the google search page people click on. Bing, via the Bing toolbar and its user data collection, could do the same thing: it could look for sites that people search for on Google, and see what links they click on, and weight the destination pages in Bing's own index higher based on the successful search.

At least in theory, for search terms with no good matches (like the gibberish text Google was using), there would be no other good ranking data and any information gathered via this 'what people click on when they search for a phrase' mechanism would become the hottest source of ranking data for a particular gibberish phrase.
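
A toy Python sketch of that idea, assuming nothing about how either engine really does it: count which page people click after each query, and let the counts decide the ranking. The function names and example URLs are made up for illustration.

```python
from collections import Counter, defaultdict

# query -> Counter of destination URLs clicked after that query
clicks = defaultdict(Counter)

def record_click(query, url):
    """Record that a user searched for `query` and then clicked `url`."""
    clicks[query][url] += 1

def top_results(query, n=3):
    """Rank destinations for `query` purely by observed click counts."""
    return [url for url, _ in clicks[query].most_common(n)]

# With a gibberish query there is no other signal, so a handful of
# clicks is enough to decide the ordering.
record_click("hiybbprqag", "http://example.com/honeypot")
record_click("hiybbprqag", "http://example.com/honeypot")
record_click("hiybbprqag", "http://example.com/other")
print(top_results("hiybbprqag"))
```

For a popular query the same few clicks would be drowned out by every other signal, which is why the gibberish terms make the effect visible.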

Collapsing down a lot of the details from the Google post and correlating it with willF's, I think that zachlipton's reading of the whole mess is probably correct. Google notices that Bing "trails" Google search engine tweaks by a couple of weeks. Google sets up the honeytraps by inserting false search results for a handful of gibberish phrases. Google engineers then search on them and click through to the bad-match pages on machines running IE. By doing so, "user data" is gathered by Bing, associating the gibberish words with the badly matched results pages. Because those gibberish phrases have no good matches on the Internet already, the incoming 'user behavior' data weights high in Bing's algorithm, and after a couple of weeks those gibberish phrases start generating the same matches on Bing that they do on Google.

Now, I'm not sure that's what willF is saying, and I don't want to put words in his mouth. But it sounds like he's describing in abstract terms precisely what Google is claiming Bing does: "It strongly suggests that Bing was copying Google’s results, by watching what some people do at Google via Internet Explorer."

zachlipton also nails the most interesting aspect of the whole affair. If this reading of it is true, Microsoft isn't "spying on Google." They're piggybacking on the same clickstream data that Google does, with the permission of the users doing the searches. They can piggyback on the same clickstream data, in fact, for any search engine that someone with IE uses. It suggests that any search engine trying to beat Google is feeding data to Bing as well. Everyone is. In a way, they are building an algorithm: an algorithm that pits search engines against each other and watches what results get the most clickthroughs.

Again, if this is true it suggests that Google's real anger is that there is no way to avoid simply becoming input for Bing's meta-algorithm. There's no way for anyone to.

And that's when 4chan realizes that they can do exactly what Google did, by conducting the same tests with googlewhack phrases and their own proxies that insert the proper links into Google searches on their own machines.
posted by verb at 9:30 PM on February 1, 2011 [2 favorites]


Here's something I bet you didn't know: If you have Internet Explorer on your computer, it keeps a record of every website you visit, and every file you open, even if you never use IE. There is supposed to be a way to disable this by turning off Autocomplete, but the two places I've found in IE that supposedly do that had no effect on the tracking. So go ahead and laugh about senior citizens. If MS decides to have its spy on your computer tell all, it may not seem so funny.
uh... citation?

Anyway, who exactly cares? I don't know why anyone would want to run the bing toolbar, but if it's clear that the toolbar is using the data to improve search results, who cares? It does make microsoft look like tools, which is I guess what google wanted.
posted by delmoi at 10:19 PM on February 1, 2011


verb, right I get what's going on, but I guess my question is, how many searches/clicks going through those internal google-located IE browsers (with clickstream enabled) had to have been made by google for MS' algorithm to pick them up?
posted by stratastar at 11:25 PM on February 1, 2011


Even if Microsoft returned exactly the same results as Google, they still wouldn't have Google's ad placement algorithms or massively cost-optimized compute infrastructure. Those are the things that let Google rake in the cash, not weird misspelled words. It's not as if Bing is unprofitable because it doesn't have enough users.
posted by ryanrs at 11:30 PM on February 1, 2011


how many searches/clicks going through those internal google-located IE browsers (with clickstream enabled) had to have been made by google for MS' algorithm to pick them up?
Realistically, figuring out that is a task for 4chan.
posted by verb at 11:44 PM on February 1, 2011


It would be interesting to know if Bing tracked responses from any other, independent, search engines in this way. They'd have to code it specifically for each case, presumably, as the search URL would have different GET parameters in each case. In any case, it seems clear that clickstream data using Google must have been specifically targeted, for this to work.
posted by iotic at 11:58 PM on February 1, 2011


Yeah, picking up on what iotic said, assuming that the Bing toolbar essentially acts as an HTTP packet-sniffer that streams data back to microsoft, it seems like they would need to add in Google-specific algorithms into their search-engine code, to separate out Google result pages from the rest of the random clickstream data, parse their URLs to get the query string out, and follow the user to the resulting page, subsequently associating the query term with the page. This seems consistent with WillF's post above, and it also seems like exactly the copying that Google is objecting to.
posted by whir at 12:51 AM on February 2, 2011


assuming that the Bing toolbar essentially acts as an HTTP packet-sniffer that streams data back to microsoft

You sure? I'm pretty sure toolbars can grab whatever they want out of the browser.
posted by Ad hominem at 2:19 AM on February 2, 2011


What I mean to say is that there is a huge difference between the toolbar being a packet sniffer and the toolbar being able to use the documented methods on the IE COM object. Any loaded toolbar including Google's can inspect the currently loaded doc, subscribe to events and do various other things with IE. Installing a packet sniffer is pretty nefarious even for Microsoft.
posted by Ad hominem at 2:42 AM on February 2, 2011


willF: "I couldn't write a short response, so I wrote a long one instead:

http://willwhim.wordpress.com/2011/02/02/is-bing-cheating-at-search/

I'm a Bing search engineer, and this is my (non-official) take on it.
"

Shorter version of my comment awaiting moderation there: Bing's copying would be more forgivable if it only happened with the honeypot sting, which could be justified as an isolated, artificial situation in which Bing had zero "search signals" to go on other than the nonsense queries. But Google set up the sting because they'd noticed that Bing's top results were becoming increasingly similar to Google's, from popular queries to obscure ones to misspellings and even to what they saw as flawed results. This wasn't a fluke that only manifested in a synthetic query designed to catch it; it was a habitual thing significant enough for Google engineers to notice.

Microsoft has a history of aggressively leveraging dominance in one area for unfair advantages elsewhere, from using Windows to make IE the default dominant browser to paying NewsCorp to give search exclusivity to Bing. This is just another sad example, in this case leveraging toolbars and browser features to crib notes from user interactions with Google. It needs to end, not only for the sake of fair play but also to ensure Google has a viable competitor, rather than one that's desperately trying to ape it.
posted by Rhaomi at 4:48 AM on February 2, 2011 [1 favorite]


A few key questions.
  • Did Bing get data from all searches done in IE or just from users who had installed the Bing Toolbar?
  • Does the Google Toolbar / Chrome extract clickstream data in the same way as the Bing Toolbar does?
  • Is there any evidence that Bing was specifically targeting Google searches as a form of clickstream data above any other $keyword -> $site relationships?
My first reaction when reading about this was 'Microsoft did something bad', but the more I read, the more I tend towards 'Google is misrepresenting common practice'.
posted by Busy Old Fool at 5:46 AM on February 2, 2011 [1 favorite]


Here's something I bet you didn't know: If you have Internet Explorer on your computer, it keeps a record of every website you visit, and every file you open, even if you never use IE.

uh... citation?


See for yourself. If you don't use IE, try launching it, then opening its History file. It will list all the websites you've visited and all the files you've opened. If you do use IE, clear the History file, then close IE, open some other files or use a different browser to visit some websites. Then go back and check the IE history. It will list all those files and websites, even ones you visited in Firefox with Private Browsing on.
posted by Kirth Gerson at 8:06 AM on February 2, 2011 [1 favorite]


See for yourself.

I'm currently browsing in Firefox. I just switched to IE (which has actually been running since yesterday morning), opened the History tab (Ctrl+H), and checked the entries for today and yesterday. There's nothing under today (even though I've been browsing around in Firefox) and only work-related stuff under yesterday (I tend to use IE for work stuff, Firefox for news and blogs etc).

<disclaim>I am a Microsoft employee, so I'm probably trying to steal your internets or something.</disclaim>
posted by The Tensor at 9:08 AM on February 2, 2011


I have IE8 and my history only shows sites I visited with IE8. (I don't work for microsoft and I only use IE8 for visiting our poorly-designed intranet which doesn't support Firefox).
posted by muddgirl at 9:29 AM on February 2, 2011


Same as muddgirl. I use IE8 from time to time for testing purposes and its history only shows the sites I have visited with it.
posted by Busy Old Fool at 9:41 AM on February 2, 2011


Yeah, me too, IE8 history is clean. There's plenty of stuff to blame Microsoft for, please don't confuse things with unfounded accusations.

Note to self: when Some Random Guy on the Internet refuses to provide a citation, but instead says "see for yourself", it's generally because they're both full of shit and too lazy to look it up themselves.
posted by Nelson at 10:04 AM on February 2, 2011 [2 favorites]


Before we start making accusations, perhaps Kirth Gerson has some Firefox plug-in that synchronises data between browsers?
posted by muddgirl at 10:11 AM on February 2, 2011 [1 favorite]


Note to self: when Some Random Guy on the Internet refuses to provide a citation, but instead says "see for yourself", it's generally because they're both full of shit and too lazy to look it up themselves.

Or they have something installed/enabled on their system that does cause this effect, and assume it is a default instead of an anomaly. As we say in QA, an anecdote is not data.
posted by L'Estrange Fruit at 10:14 AM on February 2, 2011


I'm inclined to agree with the folks noting that this is a PR home goal for Microsoft, regardless of the legal or ethical implications. There's a market for alternative search engines, mostly from people uncomfortable with Google's data harvesting, I would suspect, but if you're just cribbing Google's results, and doing the same data harvesting, well... what's the point?
posted by rodgerd at 10:22 AM on February 2, 2011


@stratastar The clickstream data is coming from "our customers" (in Harry Shum's terms) who have opted in to having data sent (anonymously) to Microsoft. The opt-in proviso is important. The anonymizing is important. My blog post says more about this--and, again, this is all my personal take on it.
posted by willF at 11:32 AM on February 2, 2011


Google's Matt Cutts (the guy who made the accusation at that search event) posted an interesting comment to that blog post:
Will, an interesting link came up on Hacker News tonight:

http://news.ycombinator.com/item?id=2168332

This Microsoft paper seems to confirm that rather than just doing generalized learning on clickstream data, Microsoft has deliberately reverse engineered url parameters on Google to learn when Google is doing a spell correction. The paper seems to say that Microsoft was looking for specific Google url parameters like “&spell=1” that indicate a spell correction. Here’s how the paper puts it:
"In our experiments, we "reverse-engineer" the parameters from the URLs of these sessions, and deduce how each search engine encodes both a query and the fact that a user arrived at a URL by clicking on the spelling suggestion of the query – an important indication that the spelling suggestion is desired. From these three months of query reformulation sessions, we extracted about 3 million query-correction pairs."
Together with the experiment we recently ran, can you see where engineers at Google would be concerned? And would want more clarity from Microsoft about how clicks on Google and Google urls are used at Microsoft?
And as I said myself, it would be more justifiable if Bing was actually learning from this data, designing improved correction algorithms based on ideas gleaned from Google's work. But it seems more like they peeked at the search URLs and said "Google returned site X for query Y, so we will too" without understanding how or why that correction was made. It's not improvement, it's piggy-backing on Google's language parsing work.
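
To make the "query-correction pairs" idea concrete, here's a rough Python sketch of what mining them from a session's URLs might look like. It assumes the query lives in a `q` parameter and that `spell=1` marks a click on a spelling suggestion, as the quoted comment describes; the session URLs are invented, and this is not the paper's actual method.

```python
from urllib.parse import urlsplit, parse_qs

def correction_pairs(session_urls):
    """Return (original_query, corrected_query) pairs from one session's URLs."""
    pairs = []
    previous_query = None
    for url in session_urls:
        params = parse_qs(urlsplit(url).query)
        query = params.get("q", [None])[0]
        if query is None:
            continue  # not a search URL under our assumption
        # spell=1 is taken to mean the user clicked the spelling suggestion
        if params.get("spell") == ["1"] and previous_query and previous_query != query:
            pairs.append((previous_query, query))
        previous_query = query
    return pairs

# Hypothetical session: a misspelled search, then a click on the suggestion.
session = [
    "http://www.google.com/search?q=recieve",
    "http://www.google.com/search?q=receive&spell=1",
]
print(correction_pairs(session))
```

Note that nothing in this says why "receive" is the right correction; it only records that the engine offered it and the user accepted it.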
posted by Rhaomi at 1:12 PM on February 2, 2011 [4 favorites]


@Rhaomi ... machine learning is "learning." It's not about algorithms so much (see Norvig's great posts about the "unreasonable effectiveness of data" and chapter in Beautiful Data) as what people do, which is the core idea behind using the clickstream data.
posted by willF at 1:33 PM on February 2, 2011 [1 favorite]


Again, the thing that is specific to Google is specifically parsing the Google URLs and picking out the URL parameters which are keywords. This is essential to make the $keyword -> $url association, since from looking at a URL such as http://www.metafilter.com/contribute/search.mefi?site=mefi&q=beans it is not obvious whether $keyword should be "mefi" or "beans". It's not as though the Google parameters are hard to deduce, but they are not universal standards, and they'd need to be accounted for specifically by Microsoft engineers. (Bumping up the level of complexity some, the toolbar could also look for form fields on the submitting page, but there you would run into problems with, for instance, user addresses being misidentified as search parameters.)
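
A small Python sketch of why this ends up site-specific: some table has to say which parameter holds the keywords for each host. The table below is hypothetical (the hosts and parameter names are filled in by hand for illustration), not anything known about Microsoft's code.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical per-site table: which URL parameter holds the search keywords.
QUERY_PARAM_BY_HOST = {
    "www.google.com": "q",
    "www.metafilter.com": "q",
}

def extract_keyword(url):
    """Return the search keyword for a known site's URL, or None if unknown."""
    parts = urlsplit(url)
    param = QUERY_PARAM_BY_HOST.get(parts.hostname)
    if param is None:
        return None  # no entry for this site, so we can't tell which parameter matters
    values = parse_qs(parts.query).get(param)
    return values[0] if values else None

url = "http://www.metafilter.com/contribute/search.mefi?site=mefi&q=beans"
print(extract_keyword(url))
```

Without the table entry, "mefi" and "beans" are equally plausible keywords, which is the point: somebody has to add each site deliberately.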
posted by whir at 3:35 PM on February 2, 2011 [1 favorite]


Again, the thing that is specific to Google is specifically parsing the Google URLs and picking out the URL parameters which are keywords.

I don't see why URL parsing is necessary. Couldn't the Bing bar watch for clicks on hyperlinks, and when it detects them simply record the value of any text field on the page with a NAME or TITLE of "search"? (Disclaimer: I know nothing about how the Bing bar works.)
posted by The Tensor at 4:04 PM on February 2, 2011


OK, so some of you don't have the IE History experience, but I am not the only one who does. Also, none of the conditions named either here or on that site as possible causes apply to my computer.
posted by Kirth Gerson at 5:56 PM on February 2, 2011


Kirth - that link does not seem to be talking about what you are describing. The link is describing the fact that IE8 seems to store some file names in its history - my home computer lists several files in the IE8 history (yes, I was doing taxes)- but it does not talk about webpages viewed with Firefox being listed in the IE8 history. Can you upload a screenshot?
posted by muddgirl at 6:52 PM on February 2, 2011


I don't know who it is
but it probably is hiybbprqag
I asked my friend Google
I asked my friend Bing
They said it was hiybbprqag!
posted by Anything at 12:04 AM on February 3, 2011 [1 favorite]


Kirth - I believe that you are seeing what you are reporting, and I am interested in finding out what is happening in your case, it just doesn't seem to be a standard behavior of IE. Who knows - maybe it's even stealthy IE behavior that only activates itself under certain conditions, or at random so that it's less detectable - but that's purely speculative on my part and I just would want to be certain before I personally took any measures against it.
posted by XMLicious at 2:27 AM on February 3, 2011


> I'm a Bing search engineer, and this is my (non-official) take on it.

I just read this... a shame this thread is stale but MAN, what a lameness that article of yours is!

(Disclaimer: I worked for Google for five years, though not directly in search... however, I am quite skeptical in general and specifically of Google.)

The fact is that this is completely and totally different from "using clickstream data" - and to claim that is disingenuous to the max. On your own blog, jordan117 makes very clear, polite and cogent arguments about that - arguments that you basically don't answer at all.

This isn't at all stealing like "robbing my wallet" or even stealing as in "downloading a pirate copy of my album" - but this is absolutely stealing as in "stealing my jokes"; it's stealing like "looking over my shoulder during the test."

Trying to reverse engineer your competitor's program is absolutely legit. Being inspired by features your competitor has to make similar - or even identical - features is also totally fine - software engineering wouldn't progress without it.

But taking someone else's search results and presenting them as your own with no other verification or other reason for doing so (as the honeypot totally proved) is embarrassing. You're basically saying that you simply aren't able to do the job yourself.


Let me tell you what it was like inside Google as Microsoft has attempted to mount search. This was years ago, my non-disclosures aren't valid, and I don't perceive this as particularly secret...

Not long after I started, there was a meeting where Schmidt told us that Microsoft had anointed Google as their "prime competitor to beat". He told us that he was confident in us and our products; he also pointed out that this had happened to him twice before, and he'd completely lost the other two times, so no guarantees.

As you recall, MS's first big try at search was a complete flop. By Google's metrics, it started off pretty good - but then it was clear that they'd tweaked their initial release to have very good results on the day they launched it for common queries, and their quality went down steadily.

Years passed. Then Bing.

Google again took Bing very seriously. Google has "Code Reds" (very rare) and "Code Yellows" (not as rare) for emergency situations - for this, they created a "Code Bing". Again, Bing started off strongly and then flagged somewhat, but kept up decent quality - Google also admired many of the UI features and looked to come up with better ideas to compete with them; also, Bing doesn't use "blended" ads (where the ads appear to be search results) and if I recall correctly, the previous engine did, so that was morally better.

Again, I'm not a Google fanboy. I worked there for five years and I feel that much of my time was wasted there due to being a small cog in huge projects with intrinsic issues that guaranteed that they wouldn't succeed. I stood up in a meeting and asked Larry politely but very clearly why we were persisting with this huge project (a public-facing one you have almost certainly never heard of due to its flopness) when it had been a failure with the public but also an even worse failure as an engineering platform - he said he'd get back to me, someone contacted me to say they were looking at it but nothing happened. Some months later my boss politely asked me, out of personal curiosity only, if I were the person who'd written in a public, anonymous survey a desperate message about the same project (basically, "We're all dying out here"). I said, "Anonymous?" and raised my eyebrows and she laughed, "Of course not, what a dumb idea!" (I'm hardly one to beat around the bush!)

As a consolation prize I got to work on a really prime project - but two+ years of this had left me a burnout and I quit within six months - it took me six more months to get my mojo back (though I'm feeling really hot these days :-D). I really believe that they damaged me as a precision tool this way.

But still, Google would disband rather than use Microsoft's results in that way. They take pride in the originality of their work.
posted by lupus_yonderboy at 3:01 AM on February 3, 2011 [3 favorites]


urg, internet troubles in my hotel room made this thread appear inactive.

Sorry, the machine learning defense is also bogus. The series of clicks you make when using a search engine is very, very different from the series of clicks you use to navigate in a regular website, and the size of the honeypot is very small. Perhaps MS has some massive breakthrough that has changed machine learning entirely, but a key point in machine learning is that the data you are looking at has to be very regular - all the standard mechanisms work badly if you have two different datasets glued together.

I can't believe that you can blend click records from searches and click records from regular website browsing within the same model effectively - particularly, this would mean that common navigational actions in form-based websites would end up appearing as search results which has to result in completely incorrect results unless there's something I'm missing.

Now, I'm sure you're using machine learning, particularly if you say so - but it is almost certain in my mind that you are using machine learning only on series of clicks that you believe to be searches - in other words, you are running machine learning on search engine results only to clone other search engines' results and incorporate them with yours without actually figuring out how they do it.
posted by lupus_yonderboy at 3:22 AM on February 3, 2011


(I'd also add that if your machine learning technique really applied to sites other than Google, it'd be pretty trivial to game (i.e. secretly influence) Bing's search results in exactly the same way that Google did to generate hits on fake pages.)
posted by lupus_yonderboy at 3:24 AM on February 3, 2011


IE History seems to record Firefox activity if you have a plug-in for Firefox (or Chrome) that uses a Microsoft program to open files, like using Media Player to view .avi videos. So, even if this is not just IE being nosy, it certainly is an annoying breach of privacy. Again, using Private Browsing in Firefox does not block IE from recording at least some Firefox activity.

I'm not convinced that that is the only way that activity in other areas is recorded by IE. I use Registry Mechanic periodically to clean the clutter from my hard drive, and it usually shows that it's deleted hundreds of files from IE. Again, I never use IE, and I don't watch that many AVIs or WMFs, nor do I use other MS programs on that computer. Here is a screen capture of Registry Mechanic cleaning a hard disk on a computer that did not use IE since the last cleaning three days ago. It says 162 items have been removed from IE. I did not look at any AVIs or WMFs, or open any Office docs in that period.

This seems to be something that doesn't happen to everyone, but does happen to a lot of people. The response by MS employees is typically not much help.
posted by Kirth Gerson at 8:34 AM on February 3, 2011


it certainly is an annoying breach of privacy

I guess I'm a bit confused as to how this is a "breach of privacy" - is there any evidence that my browser history is accessible to Microsoft?

If it is a concern about other people using the same machine, then obviously there is no real way to ensure privacy - for all you know someone has installed a stealth logger and is recording your every move, whether or not you are using Private Browsing.
posted by muddgirl at 8:39 AM on February 3, 2011


I don't think this is a breach of privacy. If you install the Bing toolbar, you're explicitly consenting to them using your browsing data. If I recall correctly, the privacy conditions you're agreeing to are fairly reasonable.
posted by lupus_yonderboy at 12:21 PM on February 3, 2011


I don't think Gerson is talking about the Bing toolbar. He's talking about the fact that IE will sometimes display non-html files in its browser history, even if those files were not accessed with IE.
posted by muddgirl at 12:29 PM on February 3, 2011


> He's talking about the fact that IE will sometimes display non-html files in its browser history, even if those files were not accessed with IE.

I read that, but couldn't see how that was a privacy issue, unless there's evidence that IE is sending this to MS without your consent...?
posted by lupus_yonderboy at 1:23 PM on February 3, 2011


Response from MS:

Setting the record straight

I was unable to attend the Farsight conference yesterday but watched events unfold online and wanted to take a moment to share some thoughts and make sure everyone is clear about a few things.

It was interesting to watch the level of protest and feigned outrage from Google. One wonders what brought them to a place where they would level these kinds of accusations.

Before we explore that, let me clear up a few things once and for all.

We do not copy results from any of our competitors. Period. Full stop. We have some of the best minds in the world at work on search quality and relevance, and for a competitor to accuse any one of these people of such activity is just insulting.

We do look at anonymous click stream data as one of more than a thousand inputs into our ranking algorithm. We learn from our customers as they traverse the web, a common practice in helping to improve a wide array of online services. We have been clear about this for a couple of years (see Directions on Microsoft report, June 15, 2009).

Google engaged in a “honeypot” attack to trick Bing. In simple terms, Google’s “experiment” was rigged to manipulate Bing search results through a type of attack also known as “click fraud.” That’s right, the same type of attack employed by spammers on the web to trick consumers and produce bogus search results. What does all this cloak and dagger click fraud prove? Nothing anyone in the industry doesn’t already know. As we have said before and again in this post, we use click stream optionally provided by consumers in an anonymous fashion as one of 1,000 signals to try and determine whether a site might make sense to be in our index.

Now let’s move the conversation to what might really be going on behind the scenes.

Bing was launched nearly two years ago to break new ground and help move the search industry in new directions. We have brought a number of things to market that we are very proud of -- our daily home page photos, infinite scroll in image search, great travel and shopping experiences, a new and more useful visual approach to search, and partnerships with key leaders like Facebook and Twitter. If you are keeping tabs, you will notice Google has “copied” a few of these. Whether they have done it well we leave to customers. But more importantly, we take no issue and are glad we could help move the industry to adopt some good ideas.

At the same time, we have been making steady, quiet progress on core search relevance. In October 2010 we released a series of big, noticeable improvements to Bing’s relevance. So big and noticeable that we are told Google took notice and began to worry. Then a short time later, here come the honeypot attacks. Is the timing purely coincidence? Are industry discussions about search quality to be ignored? Is this simply a response to the fact that some people in the industry are beginning to ask whether Bing is as good or in some cases better than Google on core web relevance?

Clearly that’s a question that will continue in heated debate as long as there is a search industry. Here at Bing we will continue to focus on our customers, and try to provide some great innovation for consumers and the industry.

Yusuf Mehdi, Senior Vice President, Online Services Division

posted by Artw at 3:51 PM on February 3, 2011


That response is a very long and complicated way of saying, "Well, yes, exactly what they said, but it's not really copying in a bad way."

Which, you know, is a valid position to argue, but they're not arguing it. They're asserting it and then huffing and puffing about how insulted they are. Which just kind of reinforces the whole narrative up to this point.
posted by verb at 3:55 PM on February 3, 2011 [1 favorite]


Matt Cutts (a Google search engineer) has some more thoughts. He sheds more light on Google's position and does a nice job avoiding too much heat.
posted by Nelson at 4:50 PM on February 3, 2011


That response is a very long and complicated way of saying, "Well, yes, exactly what they said, but it's not really copying in a bad way."

Well, no. Google finding a way of injecting uncommon search results into Bing by enabling tracking in the Bing Toolbar is pretty damn different from Microsoft "copying" Google.
posted by Artw at 5:02 PM on February 3, 2011


It was interesting to watch the level of protest and feigned outrage from Google. One wonders what brought them to a place where they would level these kinds of accusations.

Before we explore that, let me clear up a few things once and for all.


That's his opening volley? What a little bitch.
posted by uncanny hengeman at 5:04 PM on February 3, 2011


Artw: "Google finding a way of injecting uncommon search results into Bing by enabled tracking in the Bing Toolbar is pretty damn different from Microsoft "copying" Google."

I'd describe it as "Bing finding a way to track user interaction with Google in order to acquire spelling corrections and top results for difficult queries without having to do the hard work of parsing the query themselves." Since, you know, that's what happened. And it happened on a wide scale before the "sting" ever happened, so it's not like it's an artificial thing that only happened in the context of the honeypots.

As for the response, I find Microsoft's tactic of attacking Google for "click fraud" pretty odd. It's like they think if they accuse them of something that sounds really nasty, it will distract from their own bad behavior, or at least make them seem equal.
posted by Rhaomi at 5:24 PM on February 3, 2011


"Bing finding a way to track user interaction with Google in order to acquire spelling corrections and top results for difficult queries without having to do the hard work of parsing the query themselves.

Well, sort of. Except really all Google have proved is that it tracks user interaction, including interaction with Google, and that affects page search results... which is exactly what it says it does if you sign up for it. If Google wanted to get upset about that they could have done so when they clicked the checkbox without bothering with any honeypots.

Also there's no indication of how much it affects search results, no indication that this is exclusive to Google... What would be really damning is if they could show a difference in result according to Google page rank or something else not in the clickstream or url - but no, there is no evidence of that whatsoever. They don't seem to have bothered checking these things.

Now, if they did and found it was only Google or there was actual scraping of pages then they might have a case, but in the absence of that I'd be tempted to say the only thing that's proved here is you can cause internet grar amongst the credulous by using the word "Microsoft".
posted by Artw at 5:38 PM on February 3, 2011


Actually the thing I'd be most surprised by is if enough people click that checkbox to cause the convergence in search results Google is claiming - because who the hell does that?
posted by Artw at 5:41 PM on February 3, 2011


Artw: "Well, sort of. Except really all Google have proved is that it tracks user interaction, including interaction with Google, and that affects page search results... which is exactly what it says it does if you sign up for it. "

The opt-in window for Suggested Sites is not clear on this point at all, actually. If the effects really were personalized, as the box suggests, then this would be a dumb thing to complain about -- you opted in to personalized results based on what you click, then clicked the Google honeypot! Of course it will show up on your version of Bing! But the mirroring of Google corrections and results was occurring for all users; when the story first broke, you could search for the nonsense terms on Bing and still get the fake pages.

"Actually the thing I'd be most surprised by is if enough people click that checkbox to cause the convergence in search results Google is claiming - because who the hell does that?"

Have you seen the average IE user in action? In my experience they've rarely met an "Accept/OK/Whatever" button they didn't like. Those "toolbar hell" jokes don't come out of nowhere.
posted by Rhaomi at 6:02 PM on February 3, 2011


Artw: I haven't checked, but I'm guessing Suggested Sites is defaulted to opt-in? So a user would have to actively click a checkbox to turn it off. And as with all things like this, I'm guessing 95%+ never change the default setting.

(I'm reminded of the original Google Toolbar, which had a similar feature complete with no default choice and a really clear, simple explanation of the choice you were making. That's the ethical way to do things like this. I wonder if Google is still so careful to get informed consent?)
posted by Nelson at 6:21 PM on February 3, 2011


One thing this issue seems to have demonstrated is that many people don't have a good grasp on what machine learning is, how it works, or how it is used by both Google and Bing.

For example, on WillF's blog post, jordan117 says
It doesn’t look like Bing ever understood how or why Google returned the corrections it did, or developed ways to make similarly advanced logical leaps based on that data. It was more a matter of “Google returned site X for query Y, so we will too.” That’s not competition, that’s simple mimicry.
If Bing collects clickstream data that allows it to make associations between components of a URL and links clicked on the page, the point is probably not for Bing engineers to examine the data and figure out why, when the parameter q=still+safe+to+eat is present in the URL, the link http://ask.metafilter.com/54402/Still-safe-to-eat is often clicked on. The point is to collect so much data that a useful number of such associations can be made and the machine learning algorithm can generalize a little so that if it sees q=safe+to+eat it may still say "well, we'll count that as a partial (0.75) vote for the result http://ask.metafilter.com/54402/Still-safe-to-eat."

That's not simple mimicry, and it's not engineers looking at the data to make logical leaps. It's one component of a learning system figuring out a positive correlation between some pieces of a URL and the next link clicked by a user. It's unlikely that anyone ever even looks at the actual associations.
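To make the "partial vote" idea concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the `ClickAssociations` class, the `search.example` source URL, and the choice of term overlap (Jaccard similarity) as the way to generalize. Nothing here is known to be how Bing or Google actually does it.

```python
from collections import defaultdict
from urllib.parse import urlparse, parse_qs


def query_terms(url):
    """Return the set of terms in the q= parameter of a URL (empty if absent)."""
    params = parse_qs(urlparse(url).query)
    return frozenset(params.get("q", [""])[0].lower().split())


class ClickAssociations:
    """Toy store of (terms seen in a URL -> next link clicked) observations."""

    def __init__(self):
        # terms in the source URL -> {clicked link: number of observations}
        self.counts = defaultdict(lambda: defaultdict(int))

    def observe(self, source_url, clicked_url):
        terms = query_terms(source_url)
        if terms:
            self.counts[terms][clicked_url] += 1

    def votes(self, query):
        """Fractional votes per link, scaled by term overlap with past queries."""
        wanted = frozenset(query.lower().split())
        result = defaultdict(float)
        for terms, links in self.counts.items():
            overlap = len(wanted & terms) / len(wanted | terms)
            if overlap == 0:
                continue  # no shared terms, so no vote at all
            for link, n in links.items():
                result[link] += overlap * n
        return dict(result)


# Hypothetical observation: one user searched, then clicked the AskMe thread.
assoc = ClickAssociations()
assoc.observe(
    "http://search.example/search?q=still+safe+to+eat",
    "http://ask.metafilter.com/54402/Still-safe-to-eat",
)
partial = assoc.votes("safe to eat")
```

With that single observation, the query "safe to eat" shares three of four terms with the recorded one, so the thread gets 0.75 of a vote. No one had to look at the association or understand why it holds.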

Google and Bing both feed lots of different kinds of data, calling each one a "signal", into similar learning mechanisms in order to rank possible results to your queries.

jordan117 also says
If Google had set up these honeypots out of sheer paranoia, then it would be natural for Bing to pick up on some of them since there were literally no other signals to use.
and
Bing didn’t fall back on Google’s results for the honeypots alone as a last resort when all other search signals failed. Bing borrowed from them on a wide scale, even when they were flawed, to the point that Google noticed and decided to test for it.
In my example above, the 0.75 votes for a particular URL gets mixed in with "1000" other votes for other URLs provided by other parts of the system. WillF didn't say that the clickstream data is only used when there is no other information--we can assume it is used for every query. But it's probably true that for lots of queries there are other types of information being taken into account (e.g. "how many times does the term appear on this page?") that are given greater weight, so that the clickstream data has little effect, but not no effect at all.

The reason Google did the kind of experiment they did, where they tried to create an association inside Bing's brain between a nonsense word and a page in which the word never appears, was to decrease the possibility of any other signals affecting the outcome.
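A rough sketch of that mixing, and of why the nonsense-word experiment isolates one signal. The signal names, weights, and page labels below are invented for illustration; real rankers use far more signals and learned weights.

```python
def rank(candidates, weights):
    """Order candidate pages by a weighted sum of their signal values."""
    def score(page):
        return sum(weights.get(name, 0.0) * value
                   for name, value in candidates[page].items())
    return sorted(candidates, key=score, reverse=True)


# Assumed weights: clickstream counts for little next to the other signals.
weights = {"term_frequency": 1.0, "link_authority": 1.0, "clickstream": 0.05}

# Ordinary query: page_b has the clickstream vote but loses on everything else.
ordinary = {
    "page_a": {"term_frequency": 0.9, "link_authority": 0.8, "clickstream": 0.0},
    "page_b": {"term_frequency": 0.2, "link_authority": 0.3, "clickstream": 0.75},
}

# Nonsense query: the word appears on no page, so the only candidate is the
# one the clickstream associated with it, and even a small weight decides.
honeypot = {
    "fake_page": {"term_frequency": 0.0, "link_authority": 0.0, "clickstream": 0.75},
}

ordinary_order = rank(ordinary, weights)
honeypot_order = rank(honeypot, weights)
```

In the ordinary case the clickstream vote barely moves anything; in the honeypot case it is the only thing there is, which is the point of using a word that appears nowhere.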
posted by jjwiseman at 7:00 PM on February 3, 2011 [3 favorites]


BTW, I am a Googler who is not speaking for Google.
posted by jjwiseman at 7:01 PM on February 3, 2011


Once you understand what Bing is doing (and this whole topic seems to be a subtle and difficult one for people to grasp, and that's understandable, though I'm surprised that even on Hacker News so many people are making fundamental errors in analyzing it), the question is whether or not it is wrong. Unfortunately I haven't seen a lot of informed argument over this, mostly just assertions one way or another.

Is it wrong for Microsoft to use the data provided by users who have opted-in to the terms of the toolbar? Is it wrong for Microsoft to use clickstream data to create associations between form inputs and the next link a user clicks?

Is it wrong for Microsoft to create a general clickstream learning system, and add what is probably a relatively tiny amount of code specifically for parsing another search engine's form inputs and links? If so, why? Is it illegal or is it unethical? Does it hurt their brand? Does it hurt users?

That's what I would like to see specific discussion of.
posted by jjwiseman at 7:15 PM on February 3, 2011 [2 favorites]


(I'd also add that if your machine learning technique really applied to sites other than Google, it'd be pretty trivial to game (i.e. secretly influence) Bing's search results in exactly the same way that Google did to generate hits on fake pages.)

I've seen other people state this, and I don't think it's true.

For one thing, the clickstream data is one signal among many. I think it's unlikely that it is given much weight compared to other signals like whatever Bing's version of PageRank is, or term frequency. Google showed that you can get Bing to link a nonsense word that appears nowhere on the internet to a page that also doesn't contain that word, not that you can get Bing to return fake results for words that appear on other pages.

For another thing, it would be very naive for any search engine to use user-supplied data without considering spam and at least attempting to filter it out. This doesn't mean that it wouldn't be possible to intentionally influence search results (even Google was susceptible to Google bombing), but I would be surprised if it were trivial.

(Again, I'm not speaking for Google.)
posted by jjwiseman at 7:24 PM on February 3, 2011


I haven't checked, but I'm guessing Suggested Sites is defaulted to opt-in?

Not on any install I've seen. You have to go to tools and select it from a menu, and then you get the pop up that Rhaomi linked, which states "suggested Sites is an online service that uses your browsing history to make personalized website suggestions", and then if you click through on the privacy statement (ha! No, let's pretend that people do that for a second) it states "The information we collect from you will be used by Microsoft and its controlled subsidiaries and affiliates to enable the features you are using and provide the service(s) or carry out the transaction(s) you have requested or authorized. It may also be used to analyze and improve Microsoft products and services." - which, well, it does.

So, anyway, if you still click "Yes" after that it turns on. And then you can go back to Tools and turn it off again with a single click.

I like to think anyone with half a braincell probably concluded that Microsoft was going to use your browsing history to conduct satanic rituals when confronted with the first pop up, not because they are Microsoft but because they are a corporation and all corporations are just like that, but sadly you are probably right.

(Man, you should never actually read these Privacy Statement things... here's a bit for if you really want to get outraged: "Microsoft may access or disclose information about you, including the content of your communications, in order to: (a) comply with the law or respond to lawful requests or legal process; (b) protect the rights or property of Microsoft or our customers, including the enforcement of our agreements or policies governing your use of the services; or (c) act on a good faith belief that such access or disclosure is necessary to protect the personal safety of Microsoft employees, customers, or the public.")

Is it wrong for Microsoft to use the data provided by users who have opted-in to the terms of the toolbar? Is it wrong for Microsoft to use clickstream data to create associations between form inputs and the next link a user clicks?

Have you seen the average IE user in action? In my experience they've rarely met an "Accept/OK/Whatever" button they didn't like. Those "toolbar hell" jokes don't come out of nowhere.

More likely the URL than the form field, but I'd say arguing back and forth over whether MS should be capturing this user data is totally legit, it's just a different conversation entirely from the "M$ is stealing the Googlez!" one I've seen here and elsewhere.
posted by Artw at 7:48 PM on February 3, 2011


MS has every right to be capturing user data (assuming they opt-in etc) and to be using it in any way they please.

Even if MS is doing exactly what the detractors are claiming they're doing, they aren't actually doing anything wrong legally - they're simply being losers, but that's not illegal.

MS's response is particularly embarrassing... It was interesting to watch the level of protest and feigned outrage from Google. One wonders what brought them to a place where they would level these kinds of accusations.

Perhaps this spokesman is unaware of Google's continuing stream of record earnings? Or the fact that despite MS's much greater size and many years of attempting to compete, they've been unable to create a competitive offering in this area? Have they not noticed that people use Google as a synonym for search?

To call it "feigned" is particularly offensive - it seems to me that Google has pretty accurately proved its case and if so, has every right to be outraged.

(Tiny aside - I remember when I was there one engineer pointing out that "Don't be evil" is less strong than "Don't do evil things" because you could justifiably do a few evil things and not actually be evil.... heheheheh)
posted by lupus_yonderboy at 8:16 PM on February 3, 2011


Is it wrong for Microsoft to use the data provided by users who have opted-in to the terms of the toolbar? Is it wrong for Microsoft to use clickstream data to create associations between form inputs and the next link a user clicks?
As lupus_yonderboy asserts, I don't think that there's anything fundamentally wrong about what they're doing. Frankly, I think it's a great idea and the concept of a 'meta-search-engine' that learns from other search engines is one that shows promise, at least IMO.

What's really distasteful is the squirmy dodgy responses that Microsoft is offering. Accusing Google of clickfraud is the sort of self-immolating response that damages a company's credibility. Far better to say that, yes, of course we are for the following reasons, and all of the users sending that data to us are doing so voluntarily. Point out that Google was once accused of piggybacking on the work of people who built manual lists of useful links -- using other people's manual curation work to improve their own proprietary PageRank algorithm. Ask why Google is complaining that other people have figured out how to leverage crowd knowledge in new ways.

But no. That's not what they said. What they said was, "We're not copying! Google is committing clickfraud! They're paranoid!" while admitting, in the fine print, that they are in fact doing just what Google says they're doing.
posted by verb at 9:16 PM on February 3, 2011 [2 favorites]


Also, on an unrelated sidenote, I curiously followed lupus_yonderboy's profile links and now I desperately want him to optimize all of my code.
posted by verb at 9:36 PM on February 3, 2011


I am flattered! Send it to me!
posted by lupus_yonderboy at 11:06 PM on February 3, 2011


And let me add one more thing - despite my grumping above, Google was an amazing place to work, a place where you can bring up ethical issues in a meeting and have those trump anything else; and I learned about as much about programming computers in those five years at Google as I had in the previous 25 years of programming.

I feel that corporatism is destroying the world - and Google, for all its virtues, is a large corporation and has many of those problems - but I still have very fond feelings for them.
posted by lupus_yonderboy at 11:10 PM on February 3, 2011


OK, one more thing.... people slag MS overall, and I have some personal bitches with them from projects in the past, but they have still achieved great things.

I don't really think that MS has been that good for many of the things I hold dear, like open source software, but I personally feel that Mr. Gates' philanthropic activities have had a far greater positive effect than the negative effects of their somewhat shoddy operating system. Oops, I'm doing it too. :-D

As an aside, I have often wondered what happened to Mr. Gates... when he first started with his charitable activities, they seemed ludicrous, like bringing computer literacy to the third world(*) - but then he went to Africa and then came back and had a clear and very aggressive plan to do really useful, effective things like eradicate specific, targetable diseases. As far as I know, he hasn't talked about it, but in my heart I feel he met one specific person who clued him in to the right ideas and had an epiphany... if I ever met him, that'd be the first question I asked him, and if anyone knows about this, please let me know.

(* - this phrase is a running joke with me - when I was working for the notorious Drexel Burnham Lambert in 87 as a young person, before the ethical issues quickly drove me away, I met two con artists in a bar who first tried to pick up two young girls with stories of their sexual conquests and then tried to work on me by buying me drinks and getting me high on their excellent hash to get my money - money I didn't have, Wall St associates didn't get paid that much then, but more, I didn't really care about making money that much - astonishingly funny schemes except they were quite serious, one was using the base of the Brooklyn Bridge to store wine, yes, they were trying to sell me the Brooklyn Bridge! and when they realized I didn't really care about money, one of them said to me, "You know, I'm on the board of directors of a company bringing computer literacy to the third world." But Gates did, in fact, talk about that initially - before his conversion experience...)
posted by lupus_yonderboy at 11:24 PM on February 3, 2011


I've written a little more about this at http://t.co/RsNA8xE, especially trying to address some of the things that Matt Cutts discusses.

Regarding lupus_yonderboy's comment about being "trivial to game:" I think this is what Harry Shum (head of Bing) meant when he talked about click spam. I even read a tweet tonight suggesting this could be the basis for a startup company.
posted by willF at 11:33 PM on February 3, 2011 [1 favorite]


Having seen the "opt-in" form as displayed in the Matt Cutts response above, it is highly deceptive. Less "opt-in" than "here's a vaguely-named feature you probably want, handily ticked for you". PS (small print) it "uses your browser's history". No mention of sending data to MS.

I haven't used the Google toolbar for a while, but last time I did it was very clear about what features you'd gain, and what your data would be used for if you actually opted in (by checking, not unchecking, a box)

Likewise when you install Windows 7 / IE, the choice of search providers mentions Google a ways down the list, called something unusual ("Google Personal Search" or something equally confusing). Bing is top of the list, if you even make it as far as the list, and is installed by default.

Again, typical MS behaviour. Aggressive leveraging and treating their customers like fools.


This is not to downplay the expertise and passion of Bing engineers, who I'm sure are great, but the business practices of a company who have been doing stuff like this for decades, to the detriment of consumers.
posted by iotic at 12:07 AM on February 4, 2011


willF: thanks for the writeup, but I still don't buy the explanation at all - and I think you have a completely wrong definition of clickspam.

Let's start with the second part - clickspam comes in two varieties.

The first is where you get people with no intention of buying to click on ads that appear on your site. The second is where you click on your competitor's ads with no intention of buying.

To the best of my knowledge, no one has ever tried to clickspam to fool the quality enhancement programs of search engines to boost their sites, though, hmm, it wouldn't be outrageously hard.

Now, let me explain how clicktracking works to improve your search engine. The basic idea is simple - the engine displays a results page and then watches what you do. If you click on a result and never come back, this is the best outcome. If you click on a result, and then come back and click on another, clearly the first result was not so good.

This works fairly well because there's a very clear connection between your actions and your satisfaction with the results - but even with that, there's an amazing amount of noise that needs filtering out.
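As a toy version of that click-and-never-come-back measure, here is a short Python sketch. The session format and the function name are assumptions made up for this example, and it skips all of the noise filtering just mentioned.

```python
from collections import defaultdict


def satisfaction_rates(sessions):
    """Fraction of clicks on each result that the user never came back from.

    Each session is the ordered list of results one user clicked from a single
    results page. Every click except the last implies the user returned to the
    results page; the last click is treated as the one that satisfied them.
    """
    clicks = defaultdict(int)
    satisfied = defaultdict(int)
    for clicked in sessions:
        for position, url in enumerate(clicked):
            clicks[url] += 1
            if position == len(clicked) - 1:
                satisfied[url] += 1
    return {url: satisfied[url] / clicks[url] for url in clicks}


# Hypothetical log for the query "wombat": four users, two candidate results.
sessions = [
    ["wombat-facts"],                 # clicked once, never came back
    ["wombat-shop", "wombat-facts"],  # first result disappointed, second did not
    ["wombat-facts"],
    ["wombat-shop"],
]
rates = satisfaction_rates(sessions)
```

Here "wombat-facts" satisfies all three of its clicks while "wombat-shop" satisfies one of two, so a ranker trying to maximize this rate would prefer the former. The same bookkeeping would work on any page of search results that you can observe, whoever served it.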

Now, let's suppose you want to use other navigation, navigation NOT from your search page, to improve your results. How, exactly, would you do that?

Let's take the easiest case - where you are on a page that you got to right from either GOOG or Bing. For example, suppose I search for "wombat" and I click to a page on wombats and never come back.

Clearly that page is a good result for wombats. But what does it mean if I click on some link on that page? Is that link likely to be a better result for my original search query? Or a worse response?

My claim is that the best you can say is that that other link is probably also about wombats, but that even that is pretty dodgy.

Machine learning isn't some magic bullet - you need to have some function that you're actually trying to optimize. For a search results page you're presenting to the user, you can easily think of various functions - for example, you might be trying to increase the number of times the user clicks on the very first result and never comes back.

But if you're starting on some page that isn't on your site, and looking at other clicks that also don't go to your site, then what, exactly, are you trying to optimize?

There is, however, one possible case you can use, and that's when the first page is someone else's search engine. You can watch people use Google, see how often they click on the first result and never come back (in the real world, the metrics are far more complex but much like that in their roots), and then decide which Google results are good - in exactly the same way you'd do for your own search engine.

This appears to be exactly what Bing is doing - but this means that they're writing a machine learning program to discover what the best Google search results are and to add those results to their own search engine. Which is exactly what people aren't happy about.

I'm very open to any other explanation that makes sense - for example, a good explanation of how you can use clicks on non-search engine pages to improve search results.
posted by lupus_yonderboy at 12:42 AM on February 4, 2011


lupus_yonderboy, I've explained in general terms how the data in the clickstream could be used to build machine learning models; in particular how what is clicked on (or not) could be used as signal. You're right, I didn't talk about what the model is trying to learn (what error it is trying to minimize), but that wasn't important in what I was trying to say -- I was just trying to show where the signal was. No one has denied that Bing is using clickstream data--i.e. which links on the SERP (the '10 blue links', or search engine results page for the non-cognoscenti) are being clicked, and which are not. I said as much in my article.
posted by willF at 5:45 AM on February 4, 2011


A final comment from me, I think: Danny Sullivan weighs in after several days: Why Google's wrong in its accusations. It contains more details from Bing, many of which accord with what I wrote previously.
posted by willF at 11:04 AM on February 4, 2011 [2 favorites]


So wait.. I'm confused. Is Google evil or not? Or is it M$ that's now evil...but they were always evil. "Evilornot.tumblr.com"
posted by infini at 11:22 AM on February 4, 2011


A final comment from me, I think: Danny Sullivan weighs in after several days: Why Google's wrong in its accusations. It contains more details from Bing, many of which accord with what I wrote previously.

Really good article there.
posted by Artw at 11:25 AM on February 4, 2011


> lupus_yonderboy, I've explained in general terms how the data in the clickstream could be used to build machine learning models; in particular how what is clicked on (or not) could be used as signal.

Perhaps I'm missing the exact one, but I felt my rebuttal above did take those previous writings into account - and is not refuted by the Danny Sullivan article.

We all agree that tracking clicks from a search page can be used to improve the results of a search engine. No problems there!

My main issue is that tracking clicks from a search page is completely different from tracking clicks from a non-search page - your interaction with search pages is completely different from your interaction with non-search pages. As I mentioned above, machine learning isn't a panacea; you need to be using data that's at least somewhat comparable to itself. You need some function you're trying to optimize. It's fairly clear what that would be for a search page, but not at all clear what it would be for a non-search page, and it would certainly be very different.

I also claimed that clicks from non-search pages are less useful in determining relevance and correctness of a page to a query.

So it seems to me that the only way around it is to have a specific learning model that targets search result pages only. Which means building a machine learning model specifically to use the results of other search engines to improve yours.

Now, perhaps there's some way out of this apples-and-oranges issue that I'm not smart enough to see, but I don't think it's been suggested above at least.
posted by lupus_yonderboy at 3:01 PM on February 4, 2011 [2 favorites]


The Dirty Little Secrets of Search
posted by Artw at 9:38 AM on February 13, 2011




This thread has been archived and is closed to new comments