Google Translate Toolkit
June 9, 2009 12:22 PM

Google Translate Toolkit is a new webapp from Google to help translate webpages. Video demonstration (1:30). It has built-in support for Wikipedia and Jay Walsh thinks it "may change the way Wikipedia grows in other languages".
posted by stbalbach (29 comments total) 6 users marked this as a favorite
 
I've always thought that Google was going to be first with the game-changing translation engine (with MS Research having an outside chance of beating them).

It's a hard problem of course but the payoff is arguably a more significant tech advance than the web itself since a web where everyone can read everything is an order of magnitude more useful.
posted by @troy at 12:30 PM on June 9, 2009


Somewhere, an aging Esperantist holds his copy of "Kastelo de Prolongo," and weeps softly.
posted by The Whelk at 12:54 PM on June 9, 2009 [5 favorites]


As a translator, I still laugh at the Google engine's output quality. (While planning carefully for the day my entire industry is obsolete.)

Actually, as things stand now, the existence of the Google engine (and others) is a great help for the translation industry. They're good enough to foster contact between language regions, while not good enough to do the heavy lifting of translating any text you really need to be halfway good. The end result is more demand for human translation.

As to translation of Wikipedia -- what I wish existed were some principled way to jump between Wikipedia pages in different languages which address the same topic. If, in addition, there were some principled effort to reconcile the articles in different languages, it would be nice. To me, that has always seemed a no-brainer, but to Wikipedia, I guess not.
posted by Michael Roberts at 1:11 PM on June 9, 2009 [4 favorites]


Is their machine translation any better than normal? I'm not good enough in any other languages to even judge the quality. I tried using it to translate something in German back to English but it doesn't seem to go that way.
posted by smackfu at 1:11 PM on June 9, 2009


I've always thought that Google was going to be first with the game-changing translation engine (with MS Research having an outside chance of beating them).

Google has had a translation system out there for a long time. And it's always been able to translate whole web pages. So I don't really see how this qualifies as a piercing insight. This is just a more convenient way to use it.
posted by delmoi at 1:18 PM on June 9, 2009


what I wish existed were some principled way to jump between Wikipedia pages in different languages which address the same topic

You mean, besides the toolbar at the lower left that lists all the other-language versions of the current article?
posted by The Tensor at 1:19 PM on June 9, 2009


This is more of a collaboration tool for human translators, right? At least that's how I interpreted it.
posted by roll truck roll at 1:20 PM on June 9, 2009


As a collaboration tool for human translators, it could be really nifty.

Its machine translations leave a lot to be desired -- I tried it on Wikipedia's article on the Hudson river.
It begins in the Adirondack Mountains, flows past the Capital District, and then forms the border between New York City and New Jersey at its mouth before emptying into the Upper New York Bay.
Turned into:
これは、 アディロンダック山地では、開始の流れは、 資本金区過去、その口の中で、 ニューヨークとニュージャージーの境界の書式はアッパーニューヨーク湾に排出する前に。[ 9 ] in Japanese.
This doesn't make a whole lot of sense, but more than that -- "Capital" in "Capital District" comes out as "capital" as in "capitalism"; "forms" turns into "forms" as in documents that you fill out. I didn't expect anything more than that, but I guess some tiny part of me held out hope that Our Google Overlords had somehow managed to conquer the problems of machine translation.
posted by Jeanne at 1:40 PM on June 9, 2009


This is their attempt to get humans to do it. They want you to put in the correct translation, and then they can store that in their database as a sentence level translation, and in theory, learn from it.
posted by smackfu at 1:47 PM on June 9, 2009 [1 favorite]


The automatic translation quality isn't wonderful, at least into Russian. The usual suspects - tripping up with word order versus grammatical functions of words, and translating a word wrongly due to lack of context awareness: it doesn't know if "display" is a noun or verb. Also plenty of straight-up grammatical mistakes.

As a tool for translators, it seems similar to other CAT tools, with two exceptions: the shared translation memory and the ease of collaboration. The power of harnessing thousands of self-interested people together could make this awesome. But I doubt this particular path is going to lead to fully-automated good translation.
posted by Wrinkled Stumpskin at 1:53 PM on June 9, 2009


To me, that has always seemed a no-brainer, but to Wikipedia, I guess not.

I think it would actually be pretty difficult, especially when you consider how wikis get created in the first place. Originally, new articles would appear on a whim whenever anyone felt like creating one. Keeping the article topics the same across languages would have been really difficult during the 'big bang' when the site first started. And it would have been an administrative nightmare, since you would have to collaborate with people in other languages to decide what topics should be in the database.

I don't really think that qualifies as a 'no-brainer'. Maybe now it would be more useful, but at this point, from a technical and 'political' standpoint it would be a huge pain in the ass.
posted by delmoi at 2:04 PM on June 9, 2009


As far as I can make out you can only use the toolkit to translate from English into other languages for the moment. Would be great if it was multidirectional.
posted by Lezzles at 2:26 PM on June 9, 2009


The power of harnessing thousands of self-interested people together could make this awesome. But I doubt this particular path is going to lead to fully-automated good translation.

If it's sufficiently useful as a collaboration tool, then they'll get a lot of raw data. I've been looking for a collaboration tool, so this post really caught my attention.

My uneducated hunch is that lack of raw data isn't the problem so much as developing useful algorithms for analyzing that data and incorporating it into a mechanical translation system. That's where I think the big money will be over the next several years.

Which is all to say, hey linguists! Learn to code!
posted by roll truck roll at 2:31 PM on June 9, 2009


I've used Google Translate for simple email correspondence in Russian. I find that when I go through a few iterations from English to Russian to English, I can usually converge pretty quickly on a simple Russian sentence that means what I intend it to mean.
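
For anyone who wants to try the round-trip check described above, here is a minimal sketch. It assumes you supply your own translation functions; the two word-for-word lookup tables below are hypothetical stand-ins for a real engine, not Google's API, and they ignore grammar entirely.

```python
def make_word_translator(table):
    """Build a toy word-for-word translator from a lookup table (stand-in for a real engine)."""
    def translate(text):
        # Unknown words pass through unchanged.
        return " ".join(table.get(word, word) for word in text.split())
    return translate

def round_trip(text, forward, backward):
    """Translate text forward, then back again; return (translated, back_translated)."""
    translated = forward(text)
    return translated, backward(translated)

def survives_round_trip(text, forward, backward):
    """True if the back-translation matches the original wording exactly."""
    return round_trip(text, forward, backward)[1] == text

# Hypothetical toy vocabularies: two English words collapse onto one Russian word.
en_to_ru = make_word_translator({"big": "большой", "large": "большой", "house": "дом"})
ru_to_en = make_word_translator({"большой": "big", "дом": "house"})

print(round_trip("large house", en_to_ru, ru_to_en))
print(survives_round_trip("big house", en_to_ru, ru_to_en))
```

A phrasing that does not come back unchanged is the one to reword and try again; exact string equality is a deliberately strict test, and a human reading the back-translation is the better judge.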

But, no, it won't do a perfect translation in one shot; you have to work with it and lower your expectations.
posted by ZenMasterThis at 2:38 PM on June 9, 2009


I don't really think that qualifies as a 'no-brainer'. Maybe now it would be more useful, but at this point, from a technical and 'political' standpoint it would be a huge pain in the ass.

It's also no fun.
posted by smackfu at 2:39 PM on June 9, 2009


The recent Google Wave video suggests that Google's translation system depends on a statistical model generated from a large body of example information. By providing something like this, Google get an excellent way of training their statistical model with high-quality examples before people start truly stress-testing it (i.e., in unscripted real-time conversations).
posted by acb at 2:53 PM on June 9, 2009


Michael Roberts: If I understand you correctly, there is a way to pivot between Wikipedia articles on the same topic in different languages — see the "languages" box in the bottom of the left sidebar on most articles (such as Harp). These interwiki links are maintained by humans and their bots.

There are also efforts to reconcile articles in different languages. Translation isn't the most organized Wikipedia project; it could probably use some more volunteers.
posted by dreamyshade at 2:58 PM on June 9, 2009


I strongly suspect that the Google approach can't possibly work, and that real machine translation is just as hard as real AI. It's the homonyms.

I can imagine language A with SOV (subject object verb) word order and language B with SVO. And completely different vocabularies with 1-1 correspondence. And those are the only differences. Mechanical translation would be trivial.
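
To make the "trivial" case concrete, here is a toy sketch in Python. The three-word vocabulary is invented for illustration, and the code assumes every sentence is exactly one subject, one object, and one verb, which is the idealized situation described above and nothing like a real language pair.

```python
# Hypothetical 1-1 lexicon from made-up language A (SOV) to language B (SVO).
LEXICON = {"hundo": "dog", "osto": "bone", "mangas": "eats"}

def translate_sov_to_svo(sentence: str) -> str:
    """Translate a three-word subject-object-verb clause into subject-verb-object order."""
    subject, obj, verb = sentence.split()
    # Reorder to S-V-O, then substitute each word using the 1-1 lexicon.
    return " ".join(LEXICON[word] for word in (subject, verb, obj))

print(translate_sov_to_svo("hundo osto mangas"))  # dog eats bone
```

The whole translator is one reordering plus one dictionary lookup per word; it breaks as soon as a word has more than one possible translation.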

In the real world, let's take the word great, which can mean "good" or "large" in English, and doesn't have an exact equivalent in some other language.

Two examples.

"We're still trying to resolve the differences between quantum mechanics and relativity. It's a great problem!"
"It really tests your mathematical intuition and your understanding of posterior probabilities. It's a great problem!"

Identical sentences (OK, slightly contrived, this is a proof of concept). I can't see how a system could translate that without understanding, really understanding, what you were talking about. It certainly couldn't do it through statistical analysis a la Google.

There are plenty of other difficulties and non-trivial things, but I think that alone is a deal-breaker.
posted by Wrinkled Stumpskin at 3:00 PM on June 9, 2009


Weird, I just started playing with the Google Translation API this afternoon for my Twitcaps application (http://twitcaps.com). Pardon the plug...

I am using it to translate tweets to/from 8 different languages. So far I'm not too happy with the results, but that may say as much about the nature of tweets (structure, grammar, slang) as it does the quality of Google's translation.
posted by BoatMeme at 3:00 PM on June 9, 2009


The recent Google Wave video suggests that Google's translation system depends on a statistical model generated from a large body of example information.

It's funny how you hear a lot of talk about Google Research and their fancy translation systems. But then what they seem to make available is one step up from word-for-word dictionary lookups that are equal to Babelfish of 10 years ago.
posted by smackfu at 3:12 PM on June 9, 2009


Pretty sure this is an urban legend, but there's an old canard about an American diplomat who wanted to say to a Russian "The spirit is willing, but the flesh is weak." After punching this into the translator, he wound up with "the wine is nice, but the meat is awful."
posted by Schlimmbesserung at 4:08 PM on June 9, 2009


Wrinkled Stumpskin: You know, I don't think I'm 100% on which is which either. In the second case, it's fairly obvious that you mean "great" as in "good". In the first case, though, I feel it's kind of a toss-up, especially since "great" seems like a clumsy choice of words if you meant "large" in that case (which I guess you did, since you were going to present examples of both meanings).
posted by Joakim Ziegler at 4:35 PM on June 9, 2009


Bah. Great idea, of course, but sadly another case of hype that belies a poorly executed Google product.

I tried to upload a single document for translation, of less than the 1MB limit, and I got this message "You have exceed your total document space limits for the year." No explanation, nothing in the help files about how one might have "exceed" one's space limits (for the year?) and what to do about it. Lack of proper grammar in the messages of a language app just rubs salt in the wound. Bah.
posted by dacoit at 5:29 PM on June 9, 2009


By the way, Wikipedia articles in different languages are not straight translations of each other. Often they are written from scratch by different contributors. For example, I wrote a totally new Norwegian article about amphibious warfare, since the English one wasn't very good.
posted by Harald74 at 12:05 AM on June 10, 2009


Also, the German wikipedia is almost a completely separate project. I'm sure they would be quite offended if it was implied they were just translations.
posted by smackfu at 6:03 AM on June 10, 2009


Also, the German wikipedia is almost a completely separate project. I'm sure they would be quite offended if it was implied they were just translations.

Yes, the idea that the different Wikipedias are or should be translations of one another is ridiculous, but what do you mean by "the German wikipedia is almost a completely separate project"? I'm a contributor to it in a very small way, and as far as I can see it's just like any other version of Wikipedia.
posted by languagehat at 8:49 AM on June 10, 2009


The Wikipedia page on German Wikipedia covers it. I'm mainly thinking of the things under Characteristics, in particular that the German Wikipedia would rather have no article on a subject than a poor one, and have much harder rules on what is encyclopedia-worthy. So they wouldn't even want "just translations" if they don't meet their standards.

I suppose different rules/standards don't make it a completely separate project, but they certainly do seem to be doing their own thing.
posted by smackfu at 8:58 AM on June 10, 2009


The machine-translation feature of Google Translate is not what's interesting. It's the computer-assisted translation feature. CAT tools have been around for a while, including some web-based ones, but having Google get behind one is interesting. It looks like they'll be using everyone's corpus of human-translated segments to feed into everyone else's translations, which would put it off-limits for any NDA'd work (i.e., all the work I do).
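
For readers who haven't used a CAT tool, a translation memory can be sketched roughly as follows. This is an illustrative toy, not how Google's toolkit is implemented: the class name, the 0.75 threshold, and the use of Python's `difflib` for fuzzy matching are all assumptions.

```python
from difflib import SequenceMatcher

class TranslationMemory:
    """Toy store of human-translated segments with fuzzy lookup (illustrative only)."""

    def __init__(self):
        self._segments = {}  # source segment -> human translation

    def add(self, source, target):
        self._segments[source] = target

    def lookup(self, source, threshold=0.75):
        """Return (stored_source, target, score) for the best match at or above threshold, else None."""
        best = None
        for stored, target in self._segments.items():
            # Character-level similarity in [0, 1]; real tools use more elaborate scoring.
            score = SequenceMatcher(None, source.lower(), stored.lower()).ratio()
            if score >= threshold and (best is None or score > best[2]):
                best = (stored, target, score)
        return best

tm = TranslationMemory()
tm.add("The river begins in the mountains.", "Der Fluss entspringt in den Bergen.")

# A near-identical new segment retrieves the earlier human translation as a suggestion.
print(tm.lookup("The river begins in the hills."))
```

In a shared memory, every contributor's `add` calls feed everyone else's `lookup` results, which is exactly why confidential segments would not belong in it.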

As to Wikipedia: I understand the desire to maintain some kind of parallelism between the different language editions of Wikipedia. At a structural level, this would be a problem, since new articles, categories, etc, are being created all the time in every language. They may overlap somewhat. Should one language edition be taken as the template for all the others? If there's an extensive exploration of Dr Pepper ripoffs on the English wikipedia (there is), does that need to be replicated in the Farsi one? And how would you accommodate Farsi entries about whatever beverages are popular in Iran?

Beyond that, I've come to think of the current adhocracy approach to translating across Wikipedia as a strength. I created an English-language article by translating the Japanese. The English-language article got augmented by others, and some of those changes wound up percolating back into the Japanese version. I translated another article from Japanese to English that wound up being used as a basis for entries in several Romance-language versions of Wikipedia, because the English version was more accessible to the French and Spanish editors than the Japanese version. Conversely, I've seen Wikipedia templates that originated in French adapted into the English version.
posted by adamrice at 1:36 PM on June 10, 2009


The comment about NDA stuff made me look up the terms-of-service:
By submitting your content through the Service, you grant Google the permission to use your content permanently to promote, improve or offer the Services. If Google publicly displays any of the content you submitted through the Service, Google will display only portion(s) and not the entirety of the content at one time.
posted by smackfu at 1:51 PM on June 10, 2009



