Machine Traslation
May 15, 2003 4:52 AM

Is really effective machine traslation just around the corner? Up 'til now computerized language translation has been as amusing as it has been useful. Getting the gist of text composed in a different language has been about the most one can hope for. Will this company's efforts with statistical analysis be the breakthrough? Statistical analysis might be the key to stopping spam too. This is changing the way I think about my own communication.
posted by bendybendy (13 comments total)
Journalist finds a "downright nutty" person without formal training who spews nonsense. The CEO, trying to translate, spews more nonsense. Then she interviews a third person, also with the company, who says that the nutter's approach is refreshing.

Hmm. Sounds like a breakthrough to me.
posted by ptermit at 5:06 AM on May 15, 2003

Is really effective machine traslation just around the corner?

As a computational linguist who has worked on the development of two different MT systems, my answer is 'No'.

With a dozen employees, his New York-based company, Meaningful Machines, is peddling its ideas to the National Security Agency and several big software companies, claiming they can help catch terrorists and open the world's cultures to faster exchanges of ideas.

MT research and technology has been in existence for 30-40 years now, and has received most of its funding from the U.S. government--back in the day, to fight the cold war (one system I worked on did Vietnamese when it was a threat, then Farsi, etc.). Now he's trying to peddle it to fight terrorists. Same old same old.
posted by tippiedog at 5:19 AM on May 15, 2003

machine 'traslation' will never be effective until speel chukking is perfected.
posted by quonsar at 5:32 AM on May 15, 2003

I don't know much about the subject except for the hilarious results I've seen going from English to Korean. Based on that, I would say we have years to go to get out of the embryo stage.
posted by Plunge at 6:07 AM on May 15, 2003

No. While the idea of statistical modeling of a huge corpus of articles in two languages sounds good, doesn't he know that there is no such thing as a perfect translation?
posted by zpousman at 6:38 AM on May 15, 2003

Without more detail, it seems that they're using latent semantic analysis (PDF) on an n-gram level as opposed to the individual term level. Landauer really sells LSA/LSI and it seems to do a pretty good job in information retrieval systems, but it is a big stretch to call what it does akin to understanding. What I want to know is how much storage they need to hold all of their n-gram matrices. My tiny project (one matrix of 240 documents and ~3600 terms) weighed in at around 65MB.
posted by Fezboy! at 6:50 AM on May 15, 2003
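
A rough way to get a feel for the storage question above is to count how many distinct n-grams a text produces and multiply out a dense matrix. This is only a sketch: the sample sentence, the function names, and the 8 bytes per cell are assumptions for illustration, and nothing here reflects how Meaningful Machines or the 65MB project actually store their data.

```python
def ngrams(tokens, n):
    """Return every contiguous n-gram in a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def dense_matrix_bytes(rows, cols, bytes_per_cell=8):
    """Size of a dense rows x cols matrix; 8 bytes per cell is an assumption."""
    return rows * cols * bytes_per_cell


# Hypothetical sample text, only to show the vocabulary growing with n.
tokens = "the cat sat on the mat and the cat slept".split()
unigram_vocab = set(ngrams(tokens, 1))
bigram_vocab = set(ngrams(tokens, 2))

print(len(unigram_vocab), "distinct unigrams")
print(len(bigram_vocab), "distinct bigrams")

# A dense 240-document x 3600-term matrix at 8 bytes per cell.
print(dense_matrix_bytes(240, 3600), "bytes")
```

Even in ten words the bigram vocabulary is already larger than the unigram one, and on a real corpus the gap widens quickly, so a matrix with one row or column per n-gram grows much faster than a term-level one.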

engram matrices! Fezboy!'s a scientologist! get him!
posted by quonsar at 7:01 AM on May 15, 2003

This posting from English to German to French and back to English via the Google Translator:

Effective truth Maschinentraslation * right around the corner? ', until now of the translation automated to be is maintained, like, it be usefully. This gist of the text to receive in another language is existed the majority on those can hopes be. Are efforts of this company with the statistical analysis the opening? The statistical analysis could also have stopped the key with Spam. This one modifies the manner that I my own communication thinks above.

There is a website that does this automatically, but I'm too lazy to find it.
posted by ralawrence at 7:07 AM on May 15, 2003

frantically checking for the next flight to Clearwater

Ron is so going to kick my ass. I worked so hard to become the agent posted to Metafilter too...

posted by Fezboy! at 7:20 AM on May 15, 2003

Dunno. I use Bayesian statistical method filters as a tool on my server, and naive Bayesian for non-server mail. The naive Bayesian filtering system (Plug: POPFile) runs at better than 95% accuracy, and most of the remaining errors come from my difficulty training it after the fact to distinguish adult spam from non-adult. May not work well as a translation tool, but definitely good for us admins.
posted by Samizdata at 7:33 AM on May 15, 2003
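
For anyone wondering what a naive Bayesian filter like the one mentioned above does under the hood, here is a minimal sketch. It is not POPFile's code; the class name, the training messages, and the add-one smoothing are all assumptions chosen to keep the example short.

```python
import math
from collections import Counter


class NaiveBayes:
    """Toy word-count naive Bayes classifier (illustrative only)."""

    def __init__(self):
        self.word_counts = {}        # label -> Counter of words seen
        self.doc_counts = Counter()  # label -> number of training messages

    def train(self, label, text):
        self.doc_counts[label] += 1
        self.word_counts.setdefault(label, Counter()).update(text.lower().split())

    def classify(self, text):
        vocab = set().union(*self.word_counts.values())
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, counts in self.word_counts.items():
            # Log prior plus the sum of log likelihoods, with add-one smoothing.
            score = math.log(self.doc_counts[label] / total_docs)
            total_words = sum(counts.values())
            for word in text.lower().split():
                score += math.log((counts[word] + 1) / (total_words + len(vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical training data.
nb = NaiveBayes()
nb.train("spam", "cheap pills buy now")
nb.train("spam", "buy cheap watches now")
nb.train("ham", "meeting notes attached")
nb.train("ham", "lunch at noon tomorrow")

print(nb.classify("buy cheap pills"))
print(nb.classify("meeting at noon"))
```

The "training after the fact" problem shows up here too: a message is only as classifiable as the words it shares with what the filter has already seen, so two kinds of spam with overlapping vocabulary are hard to pull apart.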

Please god never get rid of google translate, it's the best thing ever.

I recently had to check a French article on sixties yé-yé singer France Gall, and according to the google translation of her biog, "in 1968 it won the eurovision song contest and it became an international high speed motorboat".

This is what the internet was made for.
posted by ciderwoman at 7:45 AM on May 15, 2003

FYI, Abir's patent applications are here. I only skimmed the abstracts, I am not an expert etc., but they look interesting, if they do what they say they do. Most of them seem to describe huge thesauri, with metadata and/or algorithms that describe the probabilities of phrase composition within, and the probabilities of phrase equivalence across, languages (well, storage is cheap). Which is kind of interesting, because instead of reducing everything to a common grammatical structure, converting it, and reassembling it in another language, they appear to be looking at statistically probable equivalents. Guess the trick is then to write the algorithms to derive the statistical tables that describe the relationships, assuming that the relationships exist (Abir seems to think they do). I suppose with small databases it will do worse than existing machine translation methods; but the larger the databases, plus the attendant description of probable relationships between words and phrases within and between languages, the better it will get (assuming they have the relationships right), and at some point it could be better than existing machine translation.

Fezboy!, I've used a form of LSA (not Landauer's) at a fairly superficial level, analysing written texts (reports etc.) and also transcribed natural interaction. What I found it to be good at is tracking the clusters of words (= concepts) that people use consistently throughout their speech; also indicating whether people think connectively (i.e. all the things they talk about are linked with all the other things they talk about), or in silos (i.e. they only talk about one thing at a time); and who is thinking about the same things as other people.
posted by carter at 7:52 AM on May 15, 2003
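
The idea of "statistically probable equivalents" described above can be illustrated with a very small co-occurrence table built from aligned sentence pairs. This is a sketch of the general technique only, using the Dice coefficient as the score; it is not the method in Abir's patent applications, and the sentence pairs and function names are made up for the example.

```python
from collections import Counter
from itertools import product


def dice_table(pairs):
    """Score each (source word, target word) pair by Dice coefficient
    over the aligned sentence pairs in which the words appear."""
    src_count, tgt_count, joint = Counter(), Counter(), Counter()
    for src, tgt in pairs:
        s_words, t_words = set(src.split()), set(tgt.split())
        src_count.update(s_words)
        tgt_count.update(t_words)
        joint.update(product(s_words, t_words))
    return {
        (s, t): 2 * c / (src_count[s] + tgt_count[t])
        for (s, t), c in joint.items()
    }


def best_equivalent(table, word):
    """Return the target word with the highest score for a source word."""
    candidates = {t: score for (s, t), score in table.items() if s == word}
    return max(candidates, key=candidates.get)


# Hypothetical four-sentence parallel corpus (English, German).
corpus = [
    ("the house", "das haus"),
    ("the book", "das buch"),
    ("a house", "ein haus"),
    ("a book", "ein buch"),
]
table = dice_table(corpus)

print(best_equivalent(table, "house"))
print(best_equivalent(table, "the"))
```

With only four sentence pairs the table is tiny and brittle, which fits the point above that this kind of approach should do worse on small databases and improve as the data grows.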

Wittgenstein says no, while Benjamin Lee Whorf says - "Do you even understand the question you're asking?"
posted by troutfishing at 8:57 AM on May 15, 2003

This thread has been archived and is closed to new comments