The Next Big Breakout
December 14, 2009 6:16 AM

An Omnivorous Google Is Coming. "Imagine what it would be like if there was a tool built into the search engine which translated my search query into every language and then searched the entire world’s websites," she says. "And then invoked the translation software a second and third time – to not only then present the results in your native language, but then translated those sites in full when you clicked through.” Marissa Mayer, Google's vice president for search products and user experience, shares her unparalleled insights into the future of internet search engines.

"She divides the focus areas for Google into three parts: modes, media and personalisation. Modes refers to the ways we can access search – the latest addition to which has been Google Goggles – an Android mobile tool which enables people to search using pictures instead of words. Users focus their phone's camera on an object, and Google compares elements of that picture against its database of images. When it finds a match, Google will tell you the name of what you're looking at, and provide a list of results linking through to the relevant web pages and news stories."
posted by netbros (65 comments total) 7 users marked this as a favorite
 
**Nothing which is not eats Google to approach. " Imagines anything is the elephant, if has the tool to hit the system to enter translates my search to inquire that Cheng Mei plants the language then search world the website search engine, " She said. " On the other hand not only then cites the translation software second and third regular then proposes in yours mother tongue's result, but translator full these stands, when you through have clicked”. Marissa Mayer, Google' s the search product and the user experience vice-President, will share her unprecedentedly to see clearly into the future Internet search engine.**

English to Chinese to English.
posted by billysumday at 6:19 AM on December 14, 2009 [22 favorites]


"Eat not the offerings from the Non-Google, for they are an abomination unto the Google."

-The Orange Bible (Revised)
posted by The Whelk at 6:22 AM on December 14, 2009 [2 favorites]


Classic "and then some magic occurs" technology plan.

If Google could provide machine translation that worked well enough for this search thing to be useful, search would be the tiniest fraction of their income.
posted by DU at 6:26 AM on December 14, 2009 [9 favorites]


Imagine what it would be like if there was a tool built into the search engine which translated my search query into every language and then searched the entire world’s websites," she says. "And then invoked the translation software a second and third time – to not only then present the results in your native language, but then translated those sites in full when you clicked through.

Translate the inquiry of my investigation in all languages, when “her you say there is equipment which was made in the search engine which searches the web sight of the entire world, like those where is that imagine. “And then the translation software when the 2nd and 3rd time which then does not show the result of your home country language - the dust sounding only it is executed, but with one side the sufficiently those place it is translated
posted by StickyCarpet at 6:32 AM on December 14, 2009


(eng-jap-eng)
posted by StickyCarpet at 6:33 AM on December 14, 2009 [1 favorite]


Also, this is Pepsi Goo.
posted by DU at 6:34 AM on December 14, 2009


Some "easy" languages are improving. Google translations into Spanish have improved considerably over the last 12 months.
posted by Holly at 6:37 AM on December 14, 2009


Awesome, now all my searching can be done in Klingon* and I never have to use English again.

*or Swedish Chef - I don't mind that either.
posted by Nanukthedog at 6:38 AM on December 14, 2009


Clearly full machine translation software is only five years away.
posted by demiurge at 6:39 AM on December 14, 2009 [5 favorites]


Similarly, if you could make computer vision that distinguished an elbow from a nipple you would instantly have an enormous content filtering enterprise.
posted by a robot made out of meat at 6:49 AM on December 14, 2009


" unparalleled insights into the future of internet search engines"

Some members of my family have been working on machine translation -- with occasional forays into OCR -- since the 1970s. I try to keep an eye on it as I grew up around practical machine translation work. The last 5 years have seen some pretty good incremental improvements being brought to market, but... this lady thinks way, way too highly of her translation products or the state of the technology in general.

Universal translation would be great, but we're at least a decade and more likely two or more decades away from anything even loosely fitting the description. At the moment, MT doesn't produce anything I'd generally want to try reading unless I were truly desperate for some nugget of information. It most certainly doesn't produce results that are comprehensible, pleasant and enjoyable to read in the manner that a human translator does.
posted by majick at 6:51 AM on December 14, 2009 [2 favorites]


Marissa Mayer has an MS in Computer Science, with some kind of specialization in AI. So for her to say this implies one of the following:

1) She's forgotten everything she ever learned
2) This is marketing blather (either self-directed or from above)
3) Google is a lot more advanced than we previously thought

Whichever is true, Google is more evil by the day.
posted by DU at 6:54 AM on December 14, 2009 [2 favorites]


MT doesn't produce anything I'd generally want to try reading unless I were truly desperate for some nugget of information. It most certainly doesn't produce results that are comprehensible, pleasant and enjoyable to read in the manner that a human translator does.

Friend of mine did his Ph.D on MT and AI. He had some kind of program that would translate anything into Esperanto and then into the requested language. If you want to hear cursing in Esperanto*, ask him how it went.

* or Russian
posted by The Whelk at 6:57 AM on December 14, 2009 [1 favorite]


It most certainly doesn't produce results that are comprehensible, pleasant and enjoyable to read in the manner that a human translator does.
posted by majick at 9:51 AM on December 14


I think you're failing to see the value of machine translation, which is this: it takes the reader from zero understanding of the source material to a non-trivial non-zero level of understanding instantly and for free.
posted by Pastabagel at 6:57 AM on December 14, 2009 [9 favorites]


Nice, and optimistic, considering how far translation algorithms still have to go.
posted by tybeet at 6:58 AM on December 14, 2009


" a non-trivial non-zero level of understanding instantly and for free."

That's cool, and believe you me, I spent many many many years of my childhood with non-trivial non-zero understanding of source texts while helping to debug chunks of my family's transliteration-to-translation process. But that's not what the Marketing Lady up there says she's selling, and I'm calling her on it.
posted by majick at 7:06 AM on December 14, 2009 [2 favorites]


I'm on my phone, so I can't link to it now—OK, I can, but I'm too lazy—but Google Blogoscoped once had a post about an interesting Google Books related project. Apparently, they're feeding their computers dense book after dense book in every available translation in an attempt to "teach" them to read and translate accordingly. Perhaps they've had a breakthrough and haven't unveiled it yet?
posted by defenestration at 7:11 AM on December 14, 2009


Marissa Mayer, the gorgeously geeky Googler

Based on previous Mayer interviews and profiles I've read, it is so hard to take anything she says seriously. The one where good students are good at all things because someone probably decided that getting an A in art history wasn't worth his time? Whatever.
posted by anniecat at 7:17 AM on December 14, 2009 [1 favorite]


The value of machine translation is vastly over-estimated, in particular by people who only speak one language.

The few times someone has posted a link here on Metafilter, however valuable the content, consisting of a machine translated page, the discussion is all jokes at the expense of the incomprehensible non-English.

Then the "entire world" I am sure means the top 20 or so languages without regionalisms, dialects etc. I mean this makes no sense.
posted by vacapinta at 7:33 AM on December 14, 2009


Don't forget that Google just ingested the world's books. That means they have very high quality translations of many works across many languages to train their engines on. Contemplate for a minute, the raw power of having five to ten copies of each book in a different language-- a modern Rosetta stone.

Recently, I've noticed many comments on the internet saying that Google Translated results are often not immediately apparent as machine translations -- some Swedish and Finnish articles come to mind.

Don't doubt the power of machine learning with access to every word ever written. Given that kind of corpus, and enough computing acreage, all kinds of crazy things can happen.
posted by fake at 7:36 AM on December 14, 2009 [6 favorites]


Machine translators should start giving us multiple options and asking "does this make sense?" so it'll slowly learn from human feedback. That'll take care of some of the random noise and interjections of the word "elephant," especially in commonly translated phrases.
posted by haveanicesummer at 7:40 AM on December 14, 2009 [2 favorites]


1. ????
2. ????
3. PROFIT!
posted by blue_beetle at 7:41 AM on December 14, 2009


    they're feeding their computers dense book after dense book in every available translation in an attempt to "teach" them to read and translate accordingly
Met people who are doing this as well and at least some people seem pretty optimistic about this approach. Problem is you build this huge, interlinked decision tree that can translate texts pretty well and then you run out of the 250 gigabytes of memory you have. The computer having terabytes of memory is too expensive and you're left looking a bit sad faced, waiting for prices to go down.

I thought the approach of using intermediary meta-languages died out years ago when the computer scientists discovered that linguistics and grammar were complex.
posted by uandt at 7:43 AM on December 14, 2009


I completely understand the quibbles--or serious problems depending on the depth of involvement or need for this utility; I just can't help but be thrilled every time I read of advancements in MT.

When I was young and had dropped out of college, I went back & took an evening course in engineering & technology (circa 1973). The assignment from the prof was to suss out an unmet need and project a solution based on as-yet undeveloped technology. I proposed real-time translation of spoken language into the receiver's language of choice. The prof forced me to come up with another project, b/c--in his not so humble opinion--such projects would never be realized. If the difficulty of the multiple simultaneous translations weren't enough of a hurdle, I was proposing a reverse-engineering of the Tower of Babel, and no responsible technology firm would invite such a disaster.

I sometimes wonder where he is buried so I could water his grave.
posted by beelzbubba at 7:43 AM on December 14, 2009 [1 favorite]


All the pissing on AI/machine translation is well deserved (the computer translation joke about the vodka being ready but the meat being rotten dates from my youth and I'm over 50).

However, it is also true that the state of the art has advanced to the point where it is actually useful. In the past, games like billysumday plays above were about the only use for these tools. But I've found modern translators increasingly helpful when planning for trips to Europe. Yes, they don't get everything right, but they do well enough to convey the majority of the information.

Also, the technique that Google (and others) are using is different than what was being done in the "AI is just around the corner" years. Statistical translation is better, but "dumber" than the old AI "the machine actually understands the language" approach.

As a counterpoint, here's the translation from English to French and back:

An omnivorous Google Is Coming. "Imagine what it would be like if there was an integrated tool in the search engine that reflects my search query in any language and then searched the websites of the world," she said. "And then invoked the translation software for both the second and the third - not only then present the results in your mother tongue, but then translates these sites in their entirety when you click through." Marissa Mayer, vice president of Google's search products and user experience shares his unparalleled insights into the future of Internet search engines.
posted by CheeseDigestsAll at 7:45 AM on December 14, 2009 [4 favorites]


uandt: "Met people who are doing this as well and at least some people seem pretty optimistic about this approach. Problem is you build this huge, interlinked decision tree that can translate texts pretty well and then you run out of the 250 gigabytes of memory you have. The computer having terabytes of memory is too expensive and you're left looking a bit sad faced, waiting for prices to go down."

Thankfully, compared to the actual translation, that problem is easy. Just build bigger computers. There are at least three companies (Amazon, Google, Microsoft) who already measure computational power by the hectare, and by the megawatt. Rounding to the nearest few TB of RAM is an everyday thing. In that world, it's true your phone or netbook is unlikely to have its own copy of the world's Rosetta Stone... but why would you bother? Just ask the cloud.

The best part is, statistical translation at that scale can be tuned, actively, by user feedback. The engine has enough samples of folks saying "correct" or "that sucks" to pick up on dialects, colloquial slang, etc. After all, it's only a few more terabytes.
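A rough sketch of that feedback loop, in Python. Everything here is made up for illustration (the `FeedbackRanker` class, its methods, and the sample votes); it only shows the idea of letting "correct" / "that sucks" votes re-rank candidate translations, not how any real engine does it.

```python
from collections import defaultdict


class FeedbackRanker:
    """Hypothetical vote counter: re-ranks candidate translations by user feedback."""

    def __init__(self):
        # votes[source_phrase][candidate_translation] -> net score
        self.votes = defaultdict(lambda: defaultdict(int))

    def record(self, source, candidate, good):
        """Add +1 for a 'correct' vote, -1 for a 'that sucks' vote."""
        self.votes[source][candidate] += 1 if good else -1

    def best(self, source, candidates):
        """Return the candidate with the highest net score (first one wins ties)."""
        return max(candidates, key=lambda c: self.votes[source][c])


ranker = FeedbackRanker()
# Invented sample feedback for the English word "still" going into French.
ranker.record("still", "distillateur", good=False)
ranker.record("still", "encore", good=True)
ranker.record("still", "encore", good=True)

print(ranker.best("still", ["distillateur", "encore", "toujours"]))
```

With no votes at all, `best` just falls back to whatever the engine ranked first, so the feedback only ever nudges the existing ordering.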
posted by zeypher at 8:04 AM on December 14, 2009 [2 favorites]


THE GOGGLES DO NOTHING!
posted by The Whelk at 8:09 AM on December 14, 2009


I feel pedantic but I have to point out that translating from A to B and then back to A is a way of testing that the translation is deterministic and how uniform the mapping is. It doesn't say anything about how comprehensible the text is in language B.

In other words, a "translator" which just translates abcdef into bcdefg and then does the reverse would perform perfectly on the there-and-back-again test without actually doing anything worthwhile.
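A minimal sketch of that shift "translator", in Python. The function names are invented for illustration; the point is only that the round trip is lossless even though the intermediate text is gibberish.

```python
def shift_forward(text: str) -> str:
    """'Translate' A -> B: move each lowercase letter one place forward (z wraps to a)."""
    return "".join(
        chr((ord(c) - ord("a") + 1) % 26 + ord("a")) if "a" <= c <= "z" else c
        for c in text
    )


def shift_back(text: str) -> str:
    """'Translate' B -> A: move each lowercase letter one place back (a wraps to z)."""
    return "".join(
        chr((ord(c) - ord("a") - 1) % 26 + ord("a")) if "a" <= c <= "z" else c
        for c in text
    )


original = "abcdef"
gibberish = shift_forward(original)   # what a reader of "language B" would see
round_trip = shift_back(gibberish)

print(gibberish)                # bcdefg
print(round_trip == original)   # True: a perfect there-and-back score
```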
posted by vacapinta at 8:09 AM on December 14, 2009 [6 favorites]


I came to this thread to post a hardy-har-har English to Chinese to English translation of the front page paragraph, but since I was beaten to the cynical punch, I will instead be anti-cynical:
An omnivorous Google Is Coming. "Imagine what it would be like if there was an integrated tool in the search engine that reflects my search query in any language and then searched the websites of the world," she said. "And then invoked translation software for both the second and the third - not only then present the results in your mother tongue, but then translates these sites in their entirety when you click through. "Marissa Mayer, vice president of Google products search and user experience shares his unparalleled insights into the future of Internet search engines.
That's English to French to English. Things are getting much better in this area than they used to be.
posted by Flunkie at 8:12 AM on December 14, 2009 [1 favorite]


For example, let's take Flunkie's first sentence above and translate it to French and back:

"I came to this thread to post a hardy-har-har English to Chinese translation into English of the clause to cover page"

Pretty good, you say, it translated "hardy-har-har" to French and back!
No, it didn't. It didn't know what to do with it and just kept it as an undefined token. So, the translation back seems much better than it actually was.
posted by vacapinta at 8:18 AM on December 14, 2009 [2 favorites]


An Omnivorous Google Is Coming.

Can't sleep. The Googles will eat me.
posted by empath at 8:29 AM on December 14, 2009 [1 favorite]


To be fair, the French translation of the top FPP paragraph is passable. It wedges inexplicably on "Google is Coming" but apart from that, it provides a somewhat complicated but definitely comprehensible output. "Moteurs" isn't the best contextual translation for 'engine', but that's the kind of bug you want at this point, it shows just how far down the road they've managed to drive the thing now.
posted by kowalski at 8:30 AM on December 14, 2009


"Moteurs" isn't the best contextual translation for 'engine', but that's the kind of bug you want at this point, it shows just how far down the road they've managed to drive the thing now.

I agree with kowalski that this is a signpost, not a "Welcome to PerfectTranslationville" sign. Critics of the imperfect rendering are familiar with neither the two languages in question nor the progress to date of translation algorithms.

To take a single example, one usage in English of "still" (in the sense of "up to this time") is mirrored in French by two different words, encore and toujours. The tricky bit is that encore more commonly means "again" and toujours typically means "always". A few years ago I was writing something in French and wasn't sure which of the two words would suit the sentence better -- the fine points sometimes evading me -- so I fed the English version of the sentence into Google Translate, as well as Babelfish and others. Every translator gave me back a French version which rendered "still" as distillateur (i.e. the thing you use to make moonshine). Helpful, that.
posted by ricochet biscuit at 8:51 AM on December 14, 2009


THE GOGGLES DO NOTHING!

Not quite nothing. I've tried the Google Goggles Android app on about 20 things on my G1, and it did recognize one: the can of coke.

It told me it was a can of coke.
posted by TheophileEscargot at 9:10 AM on December 14, 2009


Critics of the imperfect rendering are familiar with neither the two languages in question nor the progress to date of translation algorithms.

I am familiar with the former, thanks. I speak French. I admit to being unfamiliar with the latter. Yes, the translation is more than passable for this bit of text.

My larger point above is that you test a translation engine by translating say English to French and then presenting the results to a French speaker. Not by machine-translating it back to English as some people in this thread have proposed.

There's yet another problem here. The discussion seems to be how to translate N languages to English. Not how to translate X language to Y language.

Google translates "Il est toujours gros." into "Siempre es bueno"

What's going on here?
Sounds to me like Google is doing the following: French -> English -> Spanish. That is, gros-> great -> bueno.
Which makes absolutely no sense for two otherwise closely-related languages.
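To make the guess concrete, here is a toy Python sketch of pivoting through English versus going direct. The three word lists are invented for illustration and say nothing about what Google actually stores; they just show how a wrong sense picked at the pivot step carries through to the final language.

```python
# Hypothetical one-word lexicons: each maps a word to the single gloss
# the engine happened to pick. Invented for illustration only.
FR_TO_EN = {"gros": "great", "toujours": "always"}
EN_TO_ES = {"great": "bueno", "fat": "gordo", "always": "siempre", "still": "todavía"}
FR_TO_ES = {"gros": "gordo", "toujours": "todavía"}


def pivot(word: str) -> str:
    """French -> English -> Spanish: the English sense choice is baked in."""
    return EN_TO_ES[FR_TO_EN[word]]


def direct(word: str) -> str:
    """French -> Spanish with no intermediate language."""
    return FR_TO_ES[word]


for w in ("toujours", "gros"):
    print(w, "| pivot:", pivot(w), "| direct:", direct(w))
```

Once the pivot step has turned "gros" into "great", nothing downstream can recover "fat"; the Spanish side only ever sees the English word.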
posted by vacapinta at 9:18 AM on December 14, 2009


Marissa Mayer is the Vice President of Search Product and User Experience, and I imagine a good part of her job is hyping the Next Big Thing with Google. How is it evil for a search company to attempt to broaden their search results, even when the results aren't 100% spot on? Are they perfect as it stands? The Google translations today are generally good enough to get the idea across, and that's usually enough to make the search result worthwhile.

As for the Language A to B back to A translations: why not go all the way and use translation party as the example for failures in translation? If the translations changed nothing, people all over the world would be out of translating jobs. Feeding the already broken language back through is only amplifying the errors.

The trick will be to translate phrases, not just words. I look forward to the day that "Buffalo buffalo Buffalo buffalo buffalo buffalo Buffalo buffalo" is translated well by machine.
posted by filthy light thief at 9:26 AM on December 14, 2009


THE GOGGLES DO NOTHING!

Not quite nothing. I've tried the Google Goggles Android app on about 20 things on my G1, and it did recognize one: the can of coke.

It told me it was a can of coke.


Think about the utility this presents for the blind.
posted by fake at 9:27 AM on December 14, 2009 [2 favorites]


Google translates "Il est toujours gros." into "Siempre es bueno"

For English speakers: Google translated the French sentence "He is still fat." to the Spanish sentence "It is always good."
posted by vacapinta at 9:31 AM on December 14, 2009


fake: "THE GOGGLES DO NOTHING!

Not quite nothing. I've tried the Google Goggles Android app on about 20 things on my G1, and it did recognize one: the can of coke.

It told me it was a can of coke.


Think about the utility this presents for the blind.
"

It's a nice thought, but unless you've made the effort to isolate the subject from its background a bit, Goggles will throw up a "dunno, lol" sort of error where it tells you it's good at scanning labels and barcodes, not clothing and hamsters.
posted by boo_radley at 9:36 AM on December 14, 2009


Aren't all the important websites in English anyway?
posted by horsemuth at 9:36 AM on December 14, 2009


I kid, I kid....
posted by horsemuth at 9:37 AM on December 14, 2009


I've tried the Google Goggles Android app on about 20 things on my G1, and it did recognize one: the can of coke.

It figured out beer logos on coasters, despite mediocre lighting, glare, and cockeyed angles. Impressive in some aspects, not so much in others. To me it's more about what the future holds for robotics and real-world visual search than a terribly useful app in its present form.

(Then again... say it was a bottle of something tasty labelled in Korean, which I don't read... if Goggles found it in a search, that would actually be helpful.)
posted by Foosnark at 9:45 AM on December 14, 2009


Statistical translation does better than naïve lex and yacc-based list grammars, but lexical ambiguity (e.g., whether to translate "bitch" as "pute" or "chienne") will never be solved without a machine 'understanding' what is being said -- some form of semantic analysis is imperative for context.
posted by jock@law at 9:49 AM on December 14, 2009


I don't know, man. She's announced this huge idea as a kind of "wouldn't it be cool if we did this? well we're totally doing it" kind of thing. except they can't do the first part of it yet, much less the following 3. she might as well have announced "hey, imagine a car that knows when you want to drive it, pulls out of its parking spot, drives to the front door of your building so you can get in without walking to where you parked it."

which would make me think "you figure out how to let my car know when I want it, first, ok? much smaller scale, and you still can't do it. then we'll deal with the rest of that nonsense and the million other problems you'd encounter for that."
posted by shmegegge at 9:59 AM on December 14, 2009


vacapinta, your point is obviously true in a theoretical sense and to some degree in a practical sense. But you're making much more of it than it warrants.

Let's take English to French (and not to English):
Un omnivore Google Is Coming. «Imaginez ce que ce serait comme s'il y avait un outil intégré dans le moteur de recherche qui traduit ma requête de recherche dans toutes les langues et ont ensuite fouillé les sites web du monde entier, dit-elle. "Et alors invoqué le logiciel de traduction pour la seconde fois et le troisième -, non seulement présentent ensuite les résultats dans votre langue maternelle, mais ensuite traduit ces sites dans leur intégralité lorsque vous avez cliqué à travers." Marissa Mayer, vice-président de Google pour les produits de recherche et de l'expérience utilisateur , partage ses perspectives inégalées dans l'avenir des moteurs de recherche Internet.
Other than things like "Google" and "Marissa Mayer", the only untranslated thing is "Google Is Coming", which is because "Is" and "Coming" are capitalized in the original.

If you instead entered "An omnivorous Google is coming", it would translate that as "Google est un omnivore qui viennent", i.e. "Google is an omnivore which comes", which is not exactly right, but it's not remotely "Nothing which is not eats Google to approach".

I am not a native French speaker, and in fact I am barely able to struggle through French, so I can't comment on how good that translation (overall) is, but I can tell you this:

I use Google Translate to help read news articles in French fairly often (whenever I decide "OK, now I'm really going to start learning French"). It generally does well. Of course there is strangeness, and there's an occasional thing that's outright bizarre, but it's actually extremely helpful, and it's nowhere near the poor level demonstrated by the English-to-Chinese-to-English "Imagines anything is the elephant" stuff.
posted by Flunkie at 10:07 AM on December 14, 2009


And it's far better than the equivalent services were a decade ago or whatever.
posted by Flunkie at 10:13 AM on December 14, 2009


> The best part is, statistical translation at that scale can be tuned, actively, by user feedback. The engine has enough samples of folks saying "correct" or "that sucks" to pick up on dialects, colloquial slang, etc. After all, it's only a few more terabytes.

Click on a chunk of text in a Google Translated document and it prompts you for a better version of the translation. On the surface this effort is of limited value -- if I knew what a non-English document said, I wouldn't have relied on Google in the first place -- but as the machine translations improve enough that the failures are mostly awkwardly handled nuance, idiom or jargon, easily comprehendible in context, I'll be able to help Google by editing the translation appropriately, and my contribution becomes another drop in the ocean of statistical data. It's similar to the Wikipedia model of gradual progress in the knowledgebase.
posted by ardgedee at 10:15 AM on December 14, 2009


I was thinking about Goggles earlier in the weekend, and at first I couldn't work out how it would be useful. After all, the purpose of a logo is to tell you the company's name - and text search is always going to be more useful than trying to frame a picture just-so. And, aside from that, there are properties of a searchable object that aren't visible from just its picture, and might be better found with a series of text searches.

Upon further reflection, however, I thought of one instance where it would be very handy (apart from barcode UPC searches): if you were traveling in a foreign country and the only thing you had was the image of the item, and not the language to describe it to a search engine.

This strikes me as serving a similar market. It may not be incredibly useful to an English-speaking person, since so much of the web is already in English, but it might be a sign that Google is anticipating a huge surge in the non-English web.
posted by codacorolla at 11:16 AM on December 14, 2009


The term “unparalleled insights,” while admittedly inside a quotation from the original, is a bit rich.

Additionally: “Every language”? Even, say, Mongolian, Yi, and Ojibwa?
posted by joeclark at 11:50 AM on December 14, 2009


The term “unparalleled insights,” while admittedly inside a quotation from the original, is a bit rich.

I'd say quite a bit more than a bit rich.

Google are working on it.

Is this subject-verb agreement correct in British English? I have a hard time believing that.
posted by mrgrimm at 12:11 PM on December 14, 2009


1. ????
2. ????
3. PROFIT!


In fairness, this has been Google's entire business plan from day one and it's worked for them so far.
posted by mightygodking at 12:33 PM on December 14, 2009 [1 favorite]


Is this subject-verb agreement correct in British English? I have a hard time believing that.

Are you picking up that it should be "Google is working on it"? ;)
posted by Lleyam at 12:42 PM on December 14, 2009


With good statistical algorithms, and lots of feedback channels from users to tweak things, I shouldn't be a bit surprised if the Goggles process ends up telling us about those aspects of our visual experience that are shared, and those that are contingent. In other words, we will learn from the evolving use of the algorithm: what do people use it for?

Same thing with the language/translation issue. We have moved to data driven models and have also had huge advances in machine learning techniques, including building in rich feedback, and as a result, we are learning from user behavior. It doesn't have to be "good", it just has to be useful enough and it can become really useful for seeing our commonalities.
posted by fcummins at 1:26 PM on December 14, 2009


Additionally: “Every language”? Even, say, Mongolian, Yi, and Ojibwa?

If enough of those languages make it onto the web, maybe.
posted by Blazecock Pileon at 1:42 PM on December 14, 2009


Every time I see this kind of thing, my heart skips a beat, because I'm a translator. Then I realize that during the time when MT is just good enough to let people find stuff in other languages, but not yet good enough to provide a translation that sounds professional, I am going to get Utterly. Stinking. Rich.

Incidentally, people still think they can cut out the translator, run their stuff through a machine translation tool, then only pay editing rates to have it fixed up. It doesn't work. Post-editing machine translated stuff is miserable, horrible work that is honestly often more work than just translating the damn stuff clean in the first place.

I haven't done anything with these more recent translation tools, though, so things might be getting better. I oughta try some stuff out - but I've been too busy with old-fashioned human translation (which, by the way, isn't what it used to be, either; computer support is absolutely a given nowadays - by which I mean not just that everything's in documents on the computer, but that once you've translated a sentence, your database remembers it for later use).
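For anyone wondering what "your database remembers it" looks like, here is a very rough Python sketch of a translation memory. The class, its methods, and the sample sentence are all hypothetical; real tools are far more elaborate. It uses the standard library's `difflib.get_close_matches` to reuse a stored translation when a new sentence is nearly identical to one already done.

```python
import difflib


class TranslationMemory:
    """Hypothetical sketch: remembers translated sentences for later reuse."""

    def __init__(self):
        self.segments = {}  # source sentence -> human translation

    def store(self, source, target):
        self.segments[source] = target

    def lookup(self, source, cutoff=0.8):
        """Return a stored translation for an exact or near match, else None."""
        if source in self.segments:
            return self.segments[source]
        close = difflib.get_close_matches(
            source, list(self.segments), n=1, cutoff=cutoff
        )
        return self.segments[close[0]] if close else None


tm = TranslationMemory()
tm.store("Press the red button.", "Appuyez sur le bouton rouge.")

print(tm.lookup("Press the red button."))      # exact match
print(tm.lookup("Press the red button"))       # near match (missing full stop)
print(tm.lookup("Completely unrelated text"))  # nothing close enough
```

A near match is only a suggestion; the translator still has to check it before it goes out.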
posted by Michael Roberts at 2:40 PM on December 14, 2009 [1 favorite]


For English speakers: Google translated the French sentence "He is still fat." to the Spanish sentence "It is always good."

The machine still has difficulty differentiating between 'fat' and 'phat'. This is what's known as the Flava Flav Bug.
posted by mannequito at 2:47 PM on December 14, 2009


Don't forget that Google just ingested the world's books. That means they have very high quality translations of many works across many languages to train their engines on. Contemplate for a minute, the raw power of having five to ten copies of each book in a different language-- a modern Rosetta stone.

The single most insightful thing I've seen on MeTa. Not bad, fake.

Statistical correlation is going to work, because it's what kids (with a universal grammar framework) appear to do. That it's yielding decent results now, with our horribly naive technology, is already impressive. This is a CPU bound problem and we have lots of CPU.
posted by effugas at 3:10 PM on December 14, 2009 [1 favorite]


If Google could provide machine translation that worked well enough for this search thing to be useful, search would be the tiniest fraction of their income.

Would it? Presumably the technology itself wouldn't be that hard to replicate, if not immediately then within a few years.

Also, the fact that people have had trouble building machine translators using tiny corpuses on desktop PCs from 10 years ago is hardly relevant to what Google is doing today, based simply on the resource level. Far more data and FAR more computer power.

For example, here's the opening paragraph translated from English to Dutch and back:
An Omnivorous Google Is Coming. "Imagine what it would be if a tool is built into the engine that my search is translated into any language and then searched the websites of the world," she says. "And then called on the translation software half and third time- not only results in your own language, but the translated sites in full when you clicked." Marissa Mayer, Google's Vice President for search products and user experience, shares her unparalleled insights into the future of Internet search engines.
The only mistake is converting 'second' to 'half'. Everything else is perfectly sensible.

Here it is from English -> Icelandic -> English
An Omnivorous Google comes. "Imagine what it would be like if it was a tool built search engine that mean my every language, and then search websites around the world," she says. "And then invoked the translation software and other third time - to not only present the results of your language, but then translate the pages in full when you click through." Marissa Mayer, Vice President for Google search products and user experience, shares her óviðjafnanlega insight into the future Internet search engines.
Not too bad. Here it is with Hindi:
Google is coming to a carnivore. Imagine "what it would be like if there is a search engine in every language translation of your search query and then search the websites of the world was built equipment," he says. "And not only results in his own native language translation software currently one second and the third time - is applicable, but in full when you click through the translation sites." Marissa Mayer, Google's vice-president for search products and user experience, shares her unique insight into the future of Internet search engines.
Makes less sense, but you still get most of the gist.
posted by delmoi at 3:10 PM on December 14, 2009


using tiny corpuses

Or is that 'corpii'?
posted by delmoi at 3:12 PM on December 14, 2009


Oh, and speaking of crazy-ass search technology, check out Google Goggles.
posted by delmoi at 3:13 PM on December 14, 2009


“Every language”? Even, say, Mongolian, Yi, and Ojibwa?
If enough of those languages make it onto the web, maybe.
So, Klingon first, then?
posted by Flunkie at 4:28 PM on December 14, 2009


I think blacklite has it.
posted by flabdablet at 6:03 PM on December 14, 2009


There's a flaw in these English -> language x -> English tests that people are running, which (as someone thought it might) shows up when you try to read the text in language x.

Several people have spotted the strange non-translation of Google Is Coming: perhaps the programme assumes that the capitals mean it's a title or a proper noun. It did what vacapinta pointed out, with hardy-har-har: left it untranslated, as an undefined token (or whatever the term is).

What hasn't been pointed out--on a slightly skimmed reading of this thread, anyway--is that it's doing something similar with the English -> language x -> English translations people are running. Flunkie: the French text you get is packed full of basic grammatical mistakes, and is also very awkward in style. To take one example: in the English, the word 'invoked' is a simple past tense verb form ("and then [the search engine] invoked"), albeit one being used subjunctively (grammarians, correct my terminology please). The correct translation in this context would be "et ensuite invoquait". But Google gives "invoqué", which is the past participle ("I have invoked"). It does this because the subject of the verb is lost in an earlier segment of recorded speech: the translatorbot can't be sure which preceding noun or pronoun is the subject, so it is translating "invoked" as a single word and guessing at "invoqué". But this error disappears when you translate the French text back into English, because "invoqué" goes back to "invoked"--which happens to look right in English, because the simple past and the past participle are the same. But until the translatorbot can get it right in French ("invoquait") and still give it back in English as "invoked" (and not the typical French-speaker error "was invoking"--also a correct translation for "invoquait", but not here), then we're way off having something that can translate correctly and fluently. ("Way off", when translated into Googlish, meaning "Eighteen months from now".)

The same point applies to many different parts of this text, where there are slips of tense, number, gender. They go into bad French but then come back into okay English. The verb "present" bizarrely gets put into the plural ("présentent"), because the "to" marking it as an infinitive is split off by a couple of other words ("to not only then present"). This makes it confusing to say the least in French, but it comes back into English as correct, because the verb form happens to be the same between the third person plural of the present tense ("they present") and the infinitive ("to present"). The translatorbot doesn't know that Marissa is a girl, either, though humans often make that mistake when translating from a language where nouns and verbs have no gender, like English, to one where they do.

So--the English -> language x -> English test doesn't work. It puts an English text into bad French then back into less bad English, because all the things it's mistranslated into French happen to fall back in roughly the right order, so to speak, in English. Working between two languages with gender in nouns and verbs, or between two languages that differed more as to syntax, would highlight this problem.
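
A toy sketch of that argument, using made-up word tables (nothing here reflects how Google's system actually works): the forward table contains the "invoqué" mistake, yet the round trip still reproduces the English exactly, so only a comparison against a correct French reference catches the error.

```python
# Hypothetical word-for-word tables, for illustration only.
EN_TO_FR = {
    "and": "et",
    "then": "ensuite",
    "invoked": "invoqué",   # deliberate error: this context needs "invoquait"
}
FR_TO_EN = {
    "et": "and",
    "ensuite": "then",
    "invoqué": "invoked",    # both French forms collapse to the same
    "invoquait": "invoked",  # English word on the way back
}


def translate(words, table):
    # Look up each word, leaving unknown words untouched.
    return [table.get(word, word) for word in words]


def round_trip_matches(words):
    # The English -> French -> English test discussed in this thread.
    french = translate(words, EN_TO_FR)
    return translate(french, FR_TO_EN) == words


source = ["and", "then", "invoked"]
reference_fr = ["et", "ensuite", "invoquait"]  # what a human would write

print(round_trip_matches(source))                    # the round trip passes
print(translate(source, EN_TO_FR) == reference_fr)   # the French is still wrong
```

Because the back-translation is many-to-one, the round trip cannot tell the right French from the wrong French.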

This is not to say that you can't already make some headway with the gist of the text--it's better than nothing, and many of its advantages have been pointed out already. But they won't improve their product by sharpening their algorithms with this particular test.
posted by lapsangsouchong at 8:03 PM on December 14, 2009 [2 favorites]


This is what's known as the Flava Flav Bug.

Don't you mean the Phlava Phlav Bug?
posted by readyfreddy at 11:08 PM on December 14, 2009


just out of interest, following the good point well made about round trip translation not being a good indicator of quality, are there any fluent Japanese or Chinese speakers who can tell us if those translations are as bad as they appear in this thread?
posted by Wrinkled Stumpskin at 1:48 AM on December 15, 2009



