The Shallowness of Google Translate
February 20, 2018 6:58 PM

 
On the one hand I don't really think I can disagree with any particular thing he's saying. On the other, he seems to be desperately defending the ever-shrinking island labeled "things only humans can do" and resorting to some pretty spectacular goalpost-shifting to do it. Machine translation needs to be profound and rival Pushkin: okay, got it. That would be great! Meanwhile where we're at right now is way beyond the wildest imagination of the earliest days of AI.

I wish I could find/remember the source of a sentiment that came up in another of these threads - it's AI until you succeed; after that it's just another thing computers can do.
posted by range at 7:17 PM on February 20, 2018 [24 favorites]


I think of it less as goalpost shifting, and more as a process of understanding what intelligence really is. I mean, if you've never seen a machine translate literature, you might think that that's a good place to draw the line. Now that we've seen it, and seen how it works (gradient descent, arriving at a huge matrix of random-looking numbers), does it seem like that was the result of intelligence, or statistics?
posted by claudius at 7:33 PM on February 20, 2018 [6 favorites]


I doubt Google Translate will be used to translate literary fiction, poetry, or diplomacy. Especially in settings where the resources exist to pay human translators. But I imagine it will suffice for street signs, technical documentation, and Twitter...
posted by fencerjimmy at 7:36 PM on February 20, 2018 [1 favorite]


I think the essential point that Google Translate isn't really there is quite defensible, but his ultimate conclusion that it will never be there is wishful thinking.

Google Translate was perfectly adequate for my purposes, which was to gain an advantage in a social deduction game where players thought they could escape cheating detection by communicating with each other in German or Russian or Swedish on English language servers. I ran their texts through translate and figured out who the bad guys were on multiple occasions, winning the game for my team. For stuff like that, I don't need le mot juste. I just need the very basic gist.
posted by xyzzy at 7:39 PM on February 20, 2018


He knows better than to Chinese Room himself into a corner.
posted by yesster at 7:42 PM on February 20, 2018 [10 favorites]


does it seem like that was the result of intelligence, or statistics?

Is there a difference? If I write soemthing and misspell a word, you still get the meaning, because the odds are I meant "something" based on your experience, your life's "sampling" of linguistic data. Is your intelligence really separable from the store of knowledge you've developed? Is it possible that what we experience as "thought" is an emergent property of memory?
posted by SPrintF at 7:54 PM on February 20, 2018 [4 favorites]


Is there a difference?

I think you can try to make that case, but it's harder than just pointing to things we thought we couldn't get a machine to do, but actually now we can.
posted by claudius at 7:59 PM on February 20, 2018 [2 favorites]


He does come off as "get off my lawn" here, where in his younger days he would fervently defend the AI researchers.

Nonetheless, I think he's quite right: Google Translate is pretty terrible, especially when you leave the European languages behind.

AI researcher Rodney Brooks warns of the error of overestimating machines based on our experience with humans. A human who can translate from Chinese to English, or play chess at a really high level, is an extremely smart human. So when we see programs doing these things, we assume they must be as smart as a human would have to be to do them. But that doesn't follow. (If you're not sure why, read his other essay.)

This isn't to say that Google Translate is a bad thing. It's pretty neat, and often useful. It turns out that crappy but quick translation is good enough for many purposes. (And probably doesn't put translators out of a job: what it's best at is stuff no one would hire a translator for.)
posted by zompist at 8:02 PM on February 20, 2018 [15 favorites]


It turns out that crappy but quick translation is good enough for many purposes.

Worse is better.
posted by flabdablet at 8:07 PM on February 20, 2018 [6 favorites]


I can say from experience that there's a new AI-based translation engine in Japan that can handle fairly dry stuff (medical, legal, and financial fields) for what is, at current human translator prices, outrageously low prices. The translation agency I work for decided to try it out and compare it to a translation we'd done, and the results were honestly at least as good as what we produced, despite the usual difficulty of Japanese-to-English translation (i.e. Japanese doesn't have pronouns; you simply omit the antecedent when it is understood from context). Like, to the point where there is a temptation to simply outsource our own translation work in those fields to the AI translation engine, and then polish it up a little after it comes back, if necessary.

It's much less clear how soon machine translation will be able to handle less strictly regimented fields, but it's already been obvious for a while now that part of the job of any professional translator will soon also be to make a case for not simply having Google take care of it for free.
posted by DoctorFedora at 8:11 PM on February 20, 2018 [9 favorites]


He does come off as "get off my lawn" here, where in his younger days he would fervently defend the AI researchers.
Although I fully understand the fascination of trying to get machines to translate well, I am not in the least eager to see human translators replaced by inanimate machines. Indeed, the idea frightens and revolts me.
Hofstadter being who he is, I doubt he'd have a problem with human translators working alongside other sentient machines.
posted by flabdablet at 8:12 PM on February 20, 2018 [2 favorites]


He does come off as "get off my lawn" here, where in his younger days he would fervently defend the AI researchers.

Twenty years ago, in Le Ton Beau de Marot, in between arguments for there being no one correct translation for any reasonably complex piece of writing, he was arguing that there are no circumstances under which it is appropriate to play rock and roll at a funeral, so that's hardly surprising. I would think by now he's entitled to a modicum of stodge.
posted by darksasami at 8:14 PM on February 20, 2018 [5 favorites]


Use the software to do a 1st draft but for god's sake have a knowledgeable human reread it and correct it. Nothing says I don't care about you like auto-translation.
posted by WaterAndPixels at 8:34 PM on February 20, 2018 [2 favorites]


I've found Google Translate useful enough on occasion, but the last time I tried to use machine translation for something non-trivial, it was auto-generated youtube subtitles for commentary by Cho Chikun on a game of Go. I can't find that video now, but here is a typical bit of auto-translated commentary from another randomly selected Japanese Go video:
"But I think the influence of what floor of the room is recently that have been struck in this / Hey take a behind Decca black is thought the variable Hey / Underneath I hope we entered x damage and 3-year-old It is here scissors / Although that the lower-left or is a form to say that child and is often some form of you and protect it well little"
You can see the words "black", "influence", and "lower-left" in there; those make sense as things you might mention in talking about the game. And I'm guessing that by "scissors" they mean what we would normally call a pincer. Although it's still completely incomprehensible, that makes this translation quite a lot better than they did for the words of Cho Chikun. Perhaps it's his accent, or something else about the way he speaks, but I don't think the auto-translate in his case managed to even get an above-statistical-background level of Go-related words in. It was surreal widget squicky the in the infundibula with reading monorail us all the way.
posted by sfenders at 8:48 PM on February 20, 2018 [3 favorites]


Having recently used google translate for German paperwork and grocery shopping, I can say it was only slightly helpful. Translating from German to English was much better than the other way around. For example, I needed face wash but none of the bottles used the word google gave me for that. I had to instead translate the words on the bottle to English, which is really annoying and time consuming.

I mean, it’s better than nothing and 15 years ago I would have been in a much worse position because translate apps didn’t exist at all, but it’s nowhere near as good as it would have been to have a German speaker with me instead.
posted by LizBoBiz at 8:57 PM on February 20, 2018


I went into the article expecting to heartily agree with him, but he somehow managed to write about how Google Translate is completely unlike a human mind thinking about language, by anthropomorphizing it into another person who is less skilled at translating than he is and making fun of that person.

But... this is a cognitive scientist who has presumably been thinking about this stuff for his entire career, the Gödel, Escher, Bach guy, not just some professional translator with a chip on his shoulder? I'm kind of surprised that he didn't provide a more enlightening analysis of what Google Translate actually is and how it works.

I've never picked up that book, only heard people talk about it... is it really any good, or should I wait until a machine learning algorithm can write me a bespoke piece on the same subjects?
posted by XMLicious at 8:59 PM on February 20, 2018


I've never picked up that book, only heard people talk about it... is it really any good

It's a romp. Well worth your time.
posted by flabdablet at 9:02 PM on February 20, 2018 [17 favorites]


Yeah, I thought he came across as a little cranky. He says:

'There is even a school of philosophers who claim computers could never “have semantics”...'

I wish he had gone further on this point - the Semantic Web has been around for a while and is an attempt to define and link data at the conceptual level - there would be a defined concept for "South Study special aide" (identified by URI) and links to/from this concept. And then (insert more magic here) one can run deep learning algorithms and sophisticated queries across this net (at a 'level up' conceptually from working with raw text as it is now) - it seems to me that this might help with some of the problems he points out.

The folks at dbpedia are working to convert wikipedia into a semantic knowledge graph - it's some cool stuff.
posted by parki at 9:12 PM on February 20, 2018 [4 favorites]


Seconding GEB.
posted by parki at 9:13 PM on February 20, 2018 [6 favorites]


where we're at right now is way beyond the wildest imagination of the earliest days of AI.

Nah. The very earliest imaginings entertained robotic minds like those of human beings and that is still extremely remote, in some respects as remote as it ever was.

Nothing whatever is beyond the wildest imagination of some enthusiasts. Godhead and immortality are just entry-level stuff to some.
posted by Segundus at 9:15 PM on February 20, 2018 [11 favorites]


> parki:
"Yeah, I thought he came across as a little cranky. He says:

'There is even a school of philosophers who claim computers could never “have semantics”...'

I wish he had gone further on this point - the Semantic Web has been around for a while and is an attempt to define and link data at the conceptual level - there would be a defined concept for "South Study special aide" (identified by URI) and links to/from this concept. And then (insert more magic here) one can run deep learning algorithms and sophisticated queries across this net (at a 'level up' conceptually from working with raw text as it is now) - it seems to me that this might help with some of the problems he points out.

The folks at dbpedia are working to convert wikipedia into a semantic knowledge graph - it's some cool stuff."


You might also want to take a look at NELL at Carnegie Mellon. I have been following the project for a while and it is getting MUCH smarter over that time.
posted by Samizdata at 9:28 PM on February 20, 2018 [1 favorite]


Thirding GEB
posted by saltbush and olive at 9:37 PM on February 20, 2018 [3 favorites]


For example, I needed face wash but none of the bottles used the word google gave me for that. I had to instead translate the words on the bottle to English, which is really annoying and time consuming.


When I was doing this, I used the google translate app, which lets you point the phone camera at printed text and will read and translate it from there.
posted by the agents of KAOS at 9:51 PM on February 20, 2018 [4 favorites]


Fourthing GEB.
posted by flippant at 9:57 PM on February 20, 2018 [4 favorites]


One good question to ask when trying to figure out how "smart" google translate is would be to compare it to the best simple and dumb implementation. For example, how would it fare against an automated dictionary that just replaced words, or common phrases?

I mean, honestly it does a little better. But not fantastically, unsettlingly better. But... it seems like we might be nearing the point where such things could...

I mean the real thing is that with the neural network stuff, we no longer quite understand how the things get the results we are getting. Statistical models may be complicated and you may make a bunch of assumptions you know are wrong to one degree or another to get to a pretty good answer, but it's at least pretty well mapped out what is going on when you use them. We... just now have a very powerful tool and limited understanding of when and where and why it succeeds and how and when it fails.
posted by Zalzidrax at 9:59 PM on February 20, 2018


Without understanding there is no hope for a smooth bottom.
posted by fairmettle at 10:47 PM on February 20, 2018 [3 favorites]


Zalzidrax, have you actually tried this experiment? There are a bunch of language pairs where I'd characterize the difference as "fantastically, unsettlingly better". And I'd say that's been the case since the 1990s! Naive word substitution is very often utter nonsense. As a straw man, it's barely compelling.

I think Doug has absolutely earned the right to be a curmudgeon on this topic and the Google Translate team has absolutely earned the right to ignore him and say he's criticizing them for doing a bad job solving a problem they aren't trying to solve (I'll add that, to my knowledge, Google hasn't commented on this article). I have tremendous respect for all parties here. But I think there's no question they are working at cross purposes.
posted by potrzebie at 11:06 PM on February 20, 2018 [2 favorites]


not to derail too far into "speaking of google translate" but, i was thinking about google translate just the other night and how fundamentally flawed the premise of how it operates is. speaking and writing are inherently rife with ambiguity. how do we clear things up? we ask questions! i feel like google translate still adheres to some deep-seated GOFAI mindset according to which learning, questioning, and ambiguity are bugs that can be eliminated through better Logic and algorithms and not constitutive features of language and "intelligence" themselves. i don't see how it can ever hope to go beyond being "not bad compared to babelfish" if it can't formulate relevant questions to better ascertain necessary contextual information ?
posted by LeviQayin at 11:09 PM on February 20, 2018 [5 favorites]


There is even a school of philosophers who claim computers could never “have semantics” because they’re made of “the wrong stuff” (silicon).

It’s more that you can’t get semantics from syntax. Chalk and cheese. Computation and syntactical manipulation deal only with the formal properties of symbols, never with the meanings. None of the proposals for constructing the latter out of the former look at all hopeful (faking meaning is another matter). Some big projects have been cranking away for decades at the idea that if an encyclopaedia gets big enough it will just naturally begin to understand itself, but it’s no go.

The human mind naturally moves in a sea of meanings without even noticing the fact, so it is hard for many to grasp that computation is, in itself, meaningless. I’ve had so many fatiguing conversations with people from engineering or physics backgrounds who talk enthusiastically about feedback loops and black boxes, while it becomes clear they have no real inkling of the basic concept of intentionality, meaning, semantics, without which you cannot get started on the discussion. Typically they have spent their lives thinking of themselves as the most intelligent people in the room, so this is impossible for them to deal with. There may be brief moments of cognitive discomfort, then they decide it must be me that doesn’t understand, and the irrelevant jabbering about fucking loops resumes.
posted by Segundus at 11:26 PM on February 20, 2018 [13 favorites]


i was thinking about google translate just the other night and how fundamentally flawed the premise of how it operates is. speaking and writing are inherently rife with ambiguity. how do we clear things up? we ask questions!

It isn't even so much we ask questions, since that isn't always an option or, in cases where there isn't shared language, all that helpful. Instead we rely on inference, judging possible meanings from association and cues in the way language is used and our relationships with the person or thing we are interacting with. At a basic level a machine translation can provide real benefit, like in cases where there is no shared language so inference has to rely on nonverbal clues, when the speaker is present, or crude guesses based on the few words we do understand, mostly major nouns.

When I use Google translation for some web page that isn't in English, the gist of the info can be roughly understood because I'll supply the context to that information based on what it is I was looking for or the other clues present from the pictures on the page and like secondary contextual sources. Google improving their translation skills helps tremendously in making further inference easier, but the limits of that kind of exchange are also important to note.

Literature, for example, can require much more complex kinds of association and inference in making its meanings clear. Simplifying that kind of informational exchange removes layers of meaning which can leave what remains misunderstood or even in opposition to what was being communicated. On a simple level the problem some have with sarcasm points to that.

Increasing reliance on translation machines will require humans to simplify their use of language to make it come through in translation more clearly, and that can become part of a process of general "dumbing down" of communication overall as or if machines become more prevalent in our communications networks, either through translating or bot posts and so on. Asking machines to substitute for human exchange can provide useful service across great gulfs in understanding, but can potentially be harmful to communication across narrower divides.

None of this will stop machines from gaining ever more presence in our networks of communicating of course since there's one thing that's clear, given the chance humans are always going to seek ways to use or create intelligent being to do work for them for free regardless of the cost. The drive to create artificial human like intelligence without consideration for what that will mean for us or the intelligence we create just shows that's as true as ever.
posted by gusottertrout at 11:30 PM on February 20, 2018 [3 favorites]


I've got a research project with French and Australian researchers on it and our university is currently hammering out a contract with the French university. I learned yesterday that the French legal team has been writing comments on the drafts in French and our team has been using Google translate to translate them and inform their responses. I was so horrified they were risking this on a legal document that I asked for copies and translated all the comments for our team myself. Then I went back and stuck them into Google translate to see how bad the results they had previously been working from were.

To be honest, they were fine. There wasn't a single place where the difference between my translation and the automated one was significant enough to alter or obscure meaning. I was very surprised.
posted by lollusc at 12:01 AM on February 21, 2018 [6 favorites]


If you'd like to test the limits of Google Translate for yourself, have a look at Translation Party. It will translate back and forth between English and Japanese until it reaches equilibrium.

For instance, "Everyone had always said that John would be a preacher when he grew up, just like his father." ends up as "He is always father John Pastor" after some back-and-forth.
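The loop the site runs is easy to sketch. Here is a toy version where everything (the word tables and the lossy "translator") is invented purely for illustration — the real site calls an actual MT API:

```python
# Toy demonstration of Translation Party's fixed-point loop.
# EN_JA / JA_EN are deliberately lossy stand-ins for a real translator:
# unknown words get dropped, and "preacher" comes back as "pastor".
EN_JA = {"everyone": "みんな", "said": "言った", "john": "ジョン",
         "preacher": "牧師", "father": "父"}
JA_EN = {"みんな": "everyone", "言った": "said", "ジョン": "john",
         "牧師": "pastor", "父": "father"}

def toy_translate(words, table):
    # Keep only the words the dictionary knows.
    return [table[w] for w in words if w in table]

def round_trip_equilibrium(words, max_iters=10):
    """Bounce a sentence EN -> JA -> EN until it stops changing."""
    for _ in range(max_iters):
        ja = toy_translate(words, EN_JA)
        back = toy_translate(ja, JA_EN)
        if back == words:  # reached a fixed point
            return words
        words = back
    return words

print(round_trip_equilibrium(
    ["everyone", "said", "john", "would", "be", "a", "preacher"]))
# → ['everyone', 'said', 'john']
```

Each pass sheds a little information, so the sentence shrinks until the two directions agree on a (much poorer) fixed point — which is roughly what happens to the "preacher" sentence above.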
posted by Harald74 at 12:02 AM on February 21, 2018


Fifthing GEB.

I doubt Google Translate will be used to translate literary fiction, poetry, or diplomacy. Especially in settings where the resources exist to pay human translators.

Grumpy translator here: that's like saying "where the resources exist to pay live musicians." In other words, it's not just about the resources. The humanities have long been a playground for people who don't get them to get handwavy and say "[object] could do better" and it becomes all about the object versus an objectified/simplified othering of a skill rather than the discipline, practice, investment (on several levels), empathy, relationships, knowledge, experience, etc. that go into the human skill.

I don't have anything against auto-translation or AI in and of itself, heck I've managed AI in my own job, just it's also good to remember and value human skills as actual human skills that no automaton can ever replace. Note I did not say "mimic" but "replace". Music is a great example in that everyone here has listened to automatons play and replicate it, we all value that, and yet I doubt anyone would jump at the chance to go to a live concert where nothing but a recording is played.

Translation is an art as well as a science. Hofstadter is wonderful at writing about both.
posted by fraula at 12:16 AM on February 21, 2018 [8 favorites]


heck there are even those of us who enjoy live concerts where recordings are used by a human skillfully – it's called techno
but a concert where only a recording was played, well, we don't have that for a reason

posted by fraula at 12:20 AM on February 21, 2018


About the semantic web, it records semantics, but it doesn't itself "know" about meaning. RDF defines a syntax for expressing relationships between entities, but only a human knows what the records signify. All sorts of complicated queries can be made of a triple store, but at bottom it's just doing matching according to rules, there's no there there.
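A toy sketch of that matching (all identifiers here are invented, loosely in dbpedia's style): a triple store answers queries by pattern matching alone, with no grasp of what the matched records mean.

```python
# Minimal "triple store": query answering is nothing but pattern matching.
triples = {
    ("dbpedia:Beijing", "rdf:type", "dbpedia:City"),
    ("dbpedia:Beijing", "dbo:country", "dbpedia:China"),
    ("dbpedia:Paris", "dbo:country", "dbpedia:France"),
}

def query(s=None, p=None, o=None):
    """Return all triples matching the pattern; None is a wildcard."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

print(sorted(query(p="dbo:country")))
# → [('dbpedia:Beijing', 'dbo:country', 'dbpedia:China'),
#    ('dbpedia:Paris', 'dbo:country', 'dbpedia:France')]
```

The store "knows" which records match `dbo:country`, but nothing in it knows what a country is — that interpretation lives entirely with the human reading the results.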
posted by i_am_joe's_spleen at 12:20 AM on February 21, 2018 [1 favorite]


A response from Language Log.
posted by koavf at 12:34 AM on February 21, 2018 [1 favorite]


This is like an old man fighting the cognitive dissonance between ideas he's built his entire career on and the manifest truth that machine translation is already quite excellent for plain everyday language.

From a comment in that article.

This makes me think of Clarke's first law: "When a distinguished but elderly scientist states that something is possible, he is almost certainly right. When he states that something is impossible, he is very probably wrong."
posted by Sebmojo at 12:45 AM on February 21, 2018 [2 favorites]


I had a browse of GEB recently and, honestly, the most dated stuff is the musings on AI. The rest of the stuff, musings on the foundations of logic, the mind and recursion, recursion in music and art - is still brilliant.
posted by vacapinta at 1:33 AM on February 21, 2018 [3 favorites]


Twenty years ago I lugged Le Ton Beau de Marot back and forth on the Eurostar whilst visiting my girlfriend in Paris, much to her amusement, so it will always have a place in my heart on that basis alone. But it is worth reading regardless of that - a romp through Hofstadter’s obsession with language and translation. I enjoyed it more than GEB at the time I think.
posted by pharm at 1:36 AM on February 21, 2018 [1 favorite]


This comment also recommends GEB
posted by IjonTichy at 2:22 AM on February 21, 2018 [8 favorites]


Counterpoint: I've bounced right off Gödel, Escher, Bach both times i've tried to read it, it just seemed so smug somehow and clever-clever-allusory when i just wanted to know what his big idea was.

(Which was super annoying, as i'm a huge fan of Escher and love what i've read of Hofstadter when he's talking about like ~Analogy as the Core of Cognition~, i just can't make any headway into that famous book. Probably i'm not musical enough.)
posted by ver at 2:40 AM on February 21, 2018 [3 favorites]


a concert where only a recording was played, well, we don't have that for a reason

Sure we do. They're called movies.

I paid good money to go to the Valhalla Cinema and see this in glorious Burkusurround.
posted by flabdablet at 2:40 AM on February 21, 2018 [1 favorite]


All Cretans like GEB.
posted by chavenet at 3:05 AM on February 21, 2018 [2 favorites]


A response from Language Log.

From the comments:
In _Godel, Escher, Bach_ Hofstadter proposes that you could change the encoding between DNA and amino acids, modify the whole genome accordingly, and everything would still work. Because of DNA-binding proteins which recognize specific sequences, as well as RNA transcripts (made from the DNA) which have various activities beside coding for protein, this is far from true. It's a lovely metaphor but not a physical reality. I am pretty sure this was already known when he wrote the book, though the extent to which it is true wasn't fully appreciated then–the more we know about DNA the more meddlesome it appears, with all sorts of behaviors besides encoding proteins. It's like a text where the words are meaningful, and the arrangement of text on the page also makes a meaningful picture.

It's funny that the idea in _GEB_ is sort of a machine-translation view of the genome, which is exactly what he dislikes when applied to language!
posted by XMLicious at 3:05 AM on February 21, 2018 [4 favorites]


As I procrastinate from a fairly tedious project, here's my long and personal rant about this issue.

I translate Japanese<>English, and when I worked in-house, my company was in the market for upping our capacity and lowering costs, which led us to test out some machine translation options. The source material we needed done was mostly ad copy/product descriptions or product reviews, and (un?)fortunately we were never able to square the costs of human+machine with the labor hours required to clean up the machine translated text.

Using machine translation engines to do the bulk of a project and then have human translators clean it up, or "post-edit" the target text, relies on the machine-translated text being very close to error-free, or passably suitable for the purpose of the text. This can work with relatively formulaic texts, but it absolutely did not work for our material. When we were testing it out, we felt like we were spending nearly the same amount of time rewriting or retranslating as we would have spent just translating the text from scratch ourselves. That turned out to be the case when we went back and looked at the ratio of translated segments that were revised vs. segments that were left as-is, which was almost 2:1. We were spending 2/3 of our time fixing machine-translated product and still having to review and check the segments that didn't need alteration, so any time we might have saved was negligible, plus the nominal cost of the machine translation service subscription. With product reviews, nearly 100% of the machine translated segments needed human intervention to be intelligible, let alone helpful to the users of the site.

Even worse, the machine-translated segments that could be left as-is tended to just be numbers in standalone segments. Think of something like the Japanese text reading "2ml。" and the translation engine producing "2ml." This didn't save us any time or money, and while almost-sort-of-kind-of acceptable results like "In slightly and the beige color, it enhances the color of nail color to wear later." aren't 100% nonsensical, they read like crap and don't produce any value for the reader or the company passing this information off to users, so we had to spend the time and keystrokes looking at that, reading the Japanese to make sure the machine translation engine was even on-target in the first place, and then editing the segment to say something like "With its subtle beige color, it enhances the color of the nail polish you apply afterward."

This workflow is not particularly efficient or useful, but unfortunately what it does do is convince clients that the human translator is doing less work or less difficult work, and therefore should cost less too. Sometimes that's the case, but the rise of language service providers offering cheaper translation options that rely on "post-editing" obscures the reality that post-editing can often mean just as much work for the translator, just with lower rates. Much lower rates. You don't find humans willing to do or capable of doing good work when rates get that low. Of course, this is good for translation clients and agencies that want to keep costs down, but it makes my life harder and it's not providing real product value, so I find it pretty disheartening.

Anyway, there are a million human translators who produce terrible work either out of ignorance, laziness, or haste, and their work is often just barely better than machine translated text, so I don't necessarily think "human translation" is the benchmark to shoot for. Maybe the quality of machine translation will approach that of humans eventually (it probably will, to some degree), but it's hard to imagine that capable human translators will be made obsolete by anything other than downward pressure on rates forcing them to seek other employment.
posted by wakannai at 3:18 AM on February 21, 2018 [9 favorites]


In _Godel, Escher, Bach_ Hofstadter proposes that you could change the encoding between DNA and amino acids, modify the whole genome accordingly, and everything would still work.

Doug Hofstadter had clearly never reverse-engineered Steve Wozniak's Apple II disk controller before putting forward that opinion.

Even some engineered designs rely on deep integration between structural levels. I would expect that kind of integration to be completely ubiquitous in evolved designs.
posted by flabdablet at 3:19 AM on February 21, 2018 [6 favorites]


Meanwhile, the Facebook translation is truly crappenshite.

This from a friend's feed:

Some days ago they were yellow in the John of God

Supposedly a translation from the Portuguese:

Ainda há dias andavam de bibe amarelo no João de Deus


"Just days ago they went around in yellow bibs at João de Deus"
posted by chavenet at 3:32 AM on February 21, 2018 [1 favorite]


Y'know what's just occurred to me... when I use Google Translate, I skim over the complete translation to find the paragraph or sentence containing the actual information I care about, but then I grab the original-language version of the paragraph or sentence and plug fragments of it into the main GT site, to see the differences in wording that come out. And then I'll often go further by plugging some of those fragments into the Google Web Search to find sentences in the original language that contain the same words or phrases, and plug those new ones back into GT, and maybe do a few lookups in English-language dictionaries for the original language.

Maybe part of Google's shortcoming lies in trying to deliver a normal narrative text translation in the target language. Perhaps a better approach would be to generate an interactive exploded sentence diagram or something like that, an explicitly "some assembly required" form that the would-be reader will need to absorb and process in a non-linear way but which may do a better job of conveying the original meaning, at least as far as something a computer system can produce automatically.

Do people here who are fluent in multiple languages, when examining text in a source language they have no familiarity with, try to triangulate by getting automated translations into more than one target language?
posted by XMLicious at 3:55 AM on February 21, 2018 [1 favorite]


He owes Searle royalties.

Props for getting the Atlantic to publish the Chinese Room problem as if it were original though.
posted by PMdixon at 4:28 AM on February 21, 2018 [2 favorites]


While it’s clear that Google should be more transparent about the quality of translating between different languages (some pairs being better than others), I find the comments of “it’s terrible outside of European languages” to be like saying “My free, solar-powered flying car can only travel 500 miles!” Like, it could be a lot better, sure. But I can’t read German or Italian very well and I find it amazing to read adequately translated articles for free, instantly.
posted by adrianhon at 4:34 AM on February 21, 2018 [1 favorite]


He owes Searle royalties.

that's....not how this works
posted by thelonius at 5:05 AM on February 21, 2018 [3 favorites]


He's not saying that Google Translate is not useful for certain purposes. He acknowledges that it is. I think Language Log kind of agrees with this (see clips below).

He's arguing about what he thinks translation fundamentally is. He is arguing this both in terms of outcome, and also in terms of how this outcome is achieved. For Hofstadter, translation is a complex process embedded in our brain; and, despite the adoption of technical terminology to the contrary, 'neural networks' are not models of how our brain works. Translation models are (currently) brute-force machine approaches that produce a similar-ish outcome to cognition. It does not mean that they are cognition though. He's saying that the brain is way too complex to model (currently?) for machines to produce human-quality translation. He's also saying that there is a place for both machine and human translation.

Whether or not he's correct to expect the two to be equivalent in the first place is another question. It's kind of a weird way to set this up. But I do think personally that the loss of trained human translators, because we think we have machines that can handle this, is a Bad Thing in the long run.

Hofstadter:

Google Translate offers a service many people value highly: It effects quick-and-dirty conversions of meaningful passages written in language A into not necessarily meaningful strings of words in language B. As long as the text in language B is somewhat comprehensible, many people feel perfectly satisfied with the end product. If they can “get the basic idea” of a passage in a language they don’t know, they’re happy. This isn’t what I personally think the word “translation” means, but to some people it’s a great service, and to them it qualifies as translation.

LanguageLog:

If GT and other machine translators are unable to do a perfect job, or even one that is close to what a skilled human translator is capable of, what are their purposes? I believe that they fulfill a useful function in giving us the gist of the meaning of texts written in languages with which we are unfamiliar.
posted by carter at 5:12 AM on February 21, 2018 [1 favorite]


When I was doing this, I used the google translate app, which lets you point the phone camera at printed text and will read and translate it from there.

Yes I’ve seen that service and it’s ok. It only seems to translate half of the words on the screen. German has many compound words, and you need all the parts of a word to get its meaning, so only translating half the word doesn't really help. It's a cool trick but not useful for everyday use.
posted by LizBoBiz at 5:29 AM on February 21, 2018


After translating the description of this post from English to Danish to Russian to Hungarian to Esperanto to German and back to English, I think we've arrived at the root of the issue:

Douglas Hofstadder is deeply ashamed of why AI methods do not respond to true sales.
posted by grumpybear69 at 6:55 AM on February 21, 2018 [2 favorites]


In 2050 the AI overlords will be remixing old Google Translate snippets into poetry and snickering at the subtle wordplay which humans cannot understand
posted by RobotVoodooPower at 7:00 AM on February 21, 2018 [1 favorite]


I wish no harm to the lovely Google Translate and think we should all help it to get better as best we can.
posted by Meatbomb at 7:07 AM on February 21, 2018 [3 favorites]


This was a bit of a strange article, in that it felt to me like the whole body of his argument took an arrow to the heart at about the third paragraph, and he didn't even notice. The bit where he notes, parenthetically, that the Supposed to be Good version of google translate was only available in 9 languages when he began writing the piece, and by the time of publication it was available in 97.

It's all very well to decide that a machine can never translate Pushkin. (Nabokov, one of the greatest writers of the 20th century and natively trilingual, had such a tough time with Eugene Onegin that his next work centred on a poet plagued by a mad translator.) But translating Pushkin ain't 99% of the translating that gets done. It's like saying that using hand-carved pegs to assemble furniture gives a precision of fit and tightness that a factory-made nail can never match. What does it matter, when the nails come 10-a-penny? Quantity has a quality all its own.

I'm not sure where I'm going with this, except that it all feels a little god-of-the-gaps-y, a failure to truly engage with the problems and possibilities created by AI if all we do is stand before it continually redefining true intelligence, and therefore human worth, to mean things AI can't do. Because there's always a yet. Eppur si muove.
posted by Diablevert at 7:31 AM on February 21, 2018 [7 favorites]


I do some freelance editing of journal articles in my area of expertise written by non-native speakers. The company I work for also offers translation services for some languages, where the translation is done by a native speaker with excellent English skills who is not a subject matter expert and then copy edited by someone like me.

So, I frequently see:
1) Translation by speaker of both other language and English who is not a subject matter expert
2) Article written in semi-competent English by a non-native speaker who is a subject matter expert
3) Gobbledygook put together from Google Translate by a non-English speaker who is a subject matter expert but did not want to pay a real translator.

I have gotten quite good at both 1 and 2. 3 continues to be a nightmare. I read Spanish fluently and French adequately, so if the original language was a Romance language I can sometimes make decent guesses about Google's errors, but if the original language was Chinese, I am completely lost. At least for technical work, paying a real translator is still worth every penny if you would like your journal article to be readable by any human.
posted by hydropsyche at 7:47 AM on February 21, 2018 [3 favorites]


I doubt Google Translate will be used to translate literary fiction, poetry, or diplomacy

I wouldn't be so sure. You know that raft of machine learning image stylizers, the ones that take any picture and then "re-paint" it in the style of Van Gogh or Picasso or whatever? It's not hard to imagine a machine translation algorithm doing that for text, too. The state of the art isn't there yet; processing images turns out to be easier than text for various interesting reasons. But it's definitely imaginable. Summarizer algorithms sort of already do this, albeit by stripping out all the nuance from language.
posted by Nelson at 8:43 AM on February 21, 2018 [1 favorite]


Everyone knows that google translate on translate.google.com is not the same as their cloud API right? And that the latter is the state of the art, not the former?
posted by MisantropicPainforest at 9:04 AM on February 21, 2018


I currently translate Korean>English, mostly comics. This comes up a lot when I talk to people about translation work where they'll ask me, "Don't you think machines will help you do your job better/do more translating?" The thing is I remember the days of babelfish and even iterations of Google Translate before this one where I will totally say it is MUCH better than it used to be. But I always hesitate when people suggest machine translation as part of the workflow, just yet.

I mainly think where machine translation still falls short is definitely context. It might be better with other language pairs, but at least in Korean>English, while simple sentences or thoughts can be translated pretty well, I think a lot of people still forget that in some capacities (such as literary work, movie subtitles, and even comics), a huge part of the job is part writer/editor and not just translator. For example, there are times I need to evaluate and rewrite because the way text is laid out in speech bubbles doesn't translate 1 to 1 into English, so either I'm moving things up into entirely different speech bubbles or rewriting the text maybe a whole page back to fit it into the order of things, while balancing keeping the integrity of the original text with making sure the target audience gets it. I can imagine subtitles and idiosyncratically written literature would also be the same way.

The thing is, whenever I explain this to people, they try to make it sound like I'm gatekeeping translation or being curmudgeonly, when what I'm simply saying is it's just not there yet. Maybe some day, who knows. Like, I think people are too quick to dismiss objections to certain things about technology as throwing out the baby with the bathwater. I do think this stuff is useful and a lot better than it used to be and probably getting better, but people I've talked to get really defensive when I say, "These are things that probably still need to be worked on though." To be honest, even when we get to a day where machines can do a decent first draft, I'd still want a human involved at some point to proofread or edit it. However good it is, there are still going to be people involved. Hell, I still work with editors/project managers and/or shoot off questions for the creator about things. And that's not because I'm afraid a machine is going to take my job but because I legitimately care about translation/writing being done well.

I mentioned in another translation thread here that translation isn't just language translation either. A lot of times it involves cultural translation as well. There are things when I translate that I know I have to either footnote or write in a way that explains things to an audience that doesn't know Korea. Like, things that are historical context (not even ancient history, but simple things: something a character does as part of their daily routine in a story set in the 80s is understandable in Korea, but a Western audience, for whom it's not just a different country but a different time, would be confused by it). Just off the top of my head I can think of fairly recent projects where things like character names involved going back and forth with the creator: here's how I understand your intention and the reasoning behind this, but there's no English equivalent for it, so here are some suggestions that carry the spirit of the original pun/reference you were trying to make with the name that an English-speaking audience will understand. When I sell myself as a translator I don't just say, "Hey, by the way, I'm TOPIK level 6," I also make it a point to mention I grew up in Korea and am actually familiar with Korea culturally, but also vice versa with English/American/Western things, because trust me, most "mistakes" I see are when it is obviously someone who knows the language but doesn't have as firm a grasp on one culture, whether it's slang, pop culture or history, etc., as on the other. It's things like that where I think the machine will still probably have a bit to learn.

But I do think Google Translate or any future translating apps will get to a point where people can confidently travel the world (if they're not at least 75% there already), compared to the machine translation of just a few years ago, and might be on the road to being able to hold light conversations, and I think that's cool as hell. I also think it would be cool to apply it as a sort of memory keeper for languages and words that might be lost to time, if it's built on learning from databases, and would honestly like to see it used for that too.
posted by kkokkodalk at 10:08 AM on February 21, 2018 [7 favorites]


> Everyone knows that google translate on translate.google.com is not the same as their cloud API right?
Are you sure? I think Google Translate also uses RNN, see here

Back to the article, I think Hofstadter is saying that Google Translate is useful but "shallow". We imagine "it" understands the text because our minds imbue "it" with understanding, in the same way a puppet seems conscious.

He tries to show with some examples that to reliably translate a text in the face of ambiguity the text needs to be understood. The wider open questions that the article points to: can a computer understand a text? what does it mean to understand a text?

Hofstadter's answers to those two questions are still valid, at least as sign posts.
posted by haemanu at 10:23 AM on February 21, 2018 [1 favorite]


Read GEB only if you are amused by another person being amused by how amusing they are. I have never read a more self-congratulatory book.
posted by Fraxas at 10:50 AM on February 21, 2018 [4 favorites]


I get pretty similar results to the ones Hofstadter reports using Google's translate API, so I don't think it's a matter of not using Google's best model. Although the specific example of "female scholars" in the German example is now a bit better:
After the lost war, many German-national professors, now the majority in the faculty, saw it as their duty to protect the universities from the "odd"; most vulnerable were young scientists before their habilitation. And women scientists did not question anyway; there were a few about a few.
(The French-English example with genders is still lost in translation)
posted by sedna17 at 10:50 AM on February 21, 2018


So I've got a good example of how even a human translator can mess up because they don't have all the context they need. And in this example, a computer translation, even one relying on semantics would make the same mistake I think.

My wife and I were reading Mircea Eliade's Portuguese Diaries. The original text is in Romanian, so we were reading an English translation.

The very opening mentions him standing in Lisbon by the River Tagus and watching the many kingfishers fly by. Wait, my wife says, there are kingfishers on the river Tagus?? That makes no sense at all!

The Romanian original says pescăruş. It turns out that pescăruş can mean either the kingfisher or be shorthand for 'pescăruş de mare', the seagull. Essentially, the original says 'fisher bird': if the setting is a river, that's the kingfisher, but if the setting is the ocean, it means seagulls.

So when you, as a translator without much geographical knowledge, see pescăruş and 'the river Tagus', you choose the river bird: the kingfisher. A contextual machine translation would make the same mistake.

But that's wrong. Because the setting is Lisbon, and even though the mouth of the Tagus is large enough there to be reasonably called a bay, the fact remains that it is still called the River Tagus.
So the birds in question here are not kingfishers; they are seagulls.

PS: the Spanish translation got it right. Also, we shared this anecdote with Romanians we know and they were highly amused.

PPS: Even some bird knowledge would tell you that 'many kingfishers' makes little sense. They are not flocking type birds.
posted by vacapinta at 11:38 AM on February 21, 2018 [4 favorites]


Sure we do. They're called movies.

And quite a few electronic "performances" these days are essentially pre-sequenced spectacle, maybe with some people on stage for show. The planning and choreography are of course done by humans, though, which seems like the core point anyway.
posted by atoxyl at 11:51 AM on February 21, 2018


Fifteenth-ing GEB ... but it is almost half a century old, and like an incredibly far-future SF novel, some of its phrasing may no longer fall on contemporary ears. I did not get through the genetics chapters, but I expect some discussion of CRISPR tech would extend the ideas to the modern world. Mostly it's math in several guises, and math does not get old; or rather, any math most of us will ever be able to know was locked down as standard before the revolution. (the American one) (no really, google inter-universal Teichmüller theory ;-)
posted by sammyo at 11:52 AM on February 21, 2018


As there appear to be quite a few people unaware of how good google translate has gotten in the last few years, this is the lead story from today's Frankfurter Allgemeine. And here is google's translation. Granting some infelicities, how amazing is that?

Also, I had experienced the same frustration while trying to buy facewash in Germany, and distinguishing shampoo from conditioner. But to be fair to the machines, it's not their fault that many products are labelled gesicht/haut pflege (face/skin care) and leave the customer to puzzle it out.

Another oddity: google often fails on the verb 'ausfallen' in the sense of 'to be cancelled'. It thinks this is the fourth most common meaning of the word, but I have never seen it used in any other sense. E.g. the very important message Dieser Zug fällt aus / this train is cancelled.

(as my dear mutter pointed out, you can check in the back for sodium lauryl sulphate. all soaps have it.)
posted by tirutiru at 12:47 PM on February 21, 2018 [2 favorites]


I can imagine subtitles and idiosyncratically written literature would also be the same way.

Heck, that's even true for English to English "translation" when making closed captioning to some degree. Each caption has a set size limit and must be left on screen for enough time for the viewer to read it while not falling too far behind in matching the dialogue to the image.

Back when I worked in closed captioning, we often had to make small adjustments in wordy videos or shows to provide the best combination of elements to maintain a clear sense of the speech while matching the events/speaker. Without that, captions run too quickly, overlap, or show up under different events or speakers, adding confusion as to who is saying what and why. Preventing that requires knowing the meaning of the dialogue and its context within the show.
posted by gusottertrout at 12:54 PM on February 21, 2018 [1 favorite]


a concert where only a recording was played, well, we don't have that for a reason

NO HAY BANDA

there is no band
posted by Sebmojo at 1:31 PM on February 21, 2018


I feel faintly, obscurely, betrayed by this because Douglas has gone from spitting out multiple erudite takedowns of the deeply stupid Chinese Room AI thought experiment, to, basically, waving it like a flag as he shouts THEY'RE NOT REALLY THINKING! THEY'RE NOT REALLY THINKING!
posted by Sebmojo at 1:47 PM on February 21, 2018


I think the article does not rehearse the Chinese Room thought experiment; on the contrary, the author states that he believes machines will eventually perform "deep" translations.

From the quoted German newspaper translation:

> After the authorities reacted with force and let protesters shoot (...)
So the authorities forcefully allowed the protesters to shoot?

I think Hofstadter's point stands: Google Translate has knowledge of zillions of incredibly complex usage patterns and can produce something very useful, but this mistake is a symptom that there's no real context or internal representation of the world that the text describes. In this sense Google Translate is (today) still "shallow."
posted by haemanu at 2:29 PM on February 21, 2018 [2 favorites]


Props for getting the Atlantic to publish the Chinese Room problem as if it were original though.

Nonsense. The Chinese Room Problem has nothing to do with translation and the only role of Chinese in the scenario is to obfuscate the category mistakes Searle is making.
posted by straight at 2:40 PM on February 21, 2018 [3 favorites]


what's most shocking to me about statistical natural language processing is how a really shitty baseline can get you to like 60-70% human (qualitatively, or quantitatively on constrained tasks where that makes sense), and good but old models can get you in the range of 80-90%. a lot of the structure of language is pretty boring, it seems. I don't think Hofstadter is shifting the goalposts here, he's just homing in on the aspects of language that are really most interesting and important.

(I don't mean to diss the current Google Translate system, which is the result of many cool new ideas and is really, really, extremely impressive, even if Hofstadter as a translator and modern-ML-curmudgeon doesn't care.)
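To make the shabbiness of a baseline concrete, here's a toy word-for-word lookup (the mini-dictionary and its entries are made up for illustration, not any real system): it captures the boring structure of a sentence and whiffs on idiom, which is exactly the gap being argued over.

```python
# Toy word-for-word baseline: look each word up independently,
# pass unknown words through unchanged. No grammar, no context.
dictionary = {"ich": "I", "habe": "have", "hunger": "hunger"}

def baseline_translate(sentence):
    return " ".join(dictionary.get(w, w) for w in sentence.lower().split())

# "Ich habe Hunger" should be "I'm hungry"; the baseline delivers only the gist.
print(baseline_translate("Ich habe Hunger"))  # I have hunger
```

A reader gets the idea from "I have hunger," which is roughly what "60-70% human" feels like in practice.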
posted by vogon_poet at 4:57 PM on February 21, 2018 [2 favorites]


Props for getting the Atlantic to publish the Chinese Room problem as if it were original though.

He explicitly distinguishes his position from Searle's

I won’t touch that debate here, but I wouldn’t want to leave readers with the impression that I believe intelligence and understanding to be forever inaccessible to computers. If in this essay I seem to come across sounding that way, it’s because the technology I’ve been discussing makes no attempt to reproduce human intelligence.
posted by atoxyl at 5:10 PM on February 21, 2018 [1 favorite]


If you want some great Hofstadter after Godel, Escher, Bach, I recommend the collection of his columns for Scientific American. He took over a column called "Mathematical Games" and renamed it "Metamagical Themas," an anagram of course.

This discussion we're having would be greatly informed by his book on trying to find cognition in the black box of our brains, I Am A Strange Loop. I cannot believe this book hasn't been mentioned in this thread yet. His thesis seemed (to me) to be that what we think of as cognition is no more or less than a whole lot of self-recursive self-observation, the emergent result of a brain/body that can't help but talk to itself. This is the book that allowed me to stop believing in a transcendental soul, body-mind duality, what he calls "the spark of life," and allowed me to be amazed that there may be no existential line between cognitive and non-cognitive matter, and isn't that super weird and doesn't that have some weird implications.
posted by panhopticon at 7:00 PM on February 21, 2018 [1 favorite]


I also liked "The Mind's I", a sprawling collection of essays about the nature of self.
posted by RobotVoodooPower at 10:08 PM on February 21, 2018 [1 favorite]


gusottertrout you reminded me of something and another anecdote to throw on the "context" pile. Last year I got hired to work with video editors to translate. They already had a team in Korea who had filmed the video and provided the files with all the pertinent info (what it's about, the questions they asked the subject in English, who they are speaking to) and a separate translation team that had provided translated subtitle files. But the reason I got hired was to basically act as the "Korean brain" for the editors based in New York who were not Korean speakers (though from a variety of backgrounds and nationalities).

Basically, the reason I was hired was that the team in Korea and the clients, upon review, would have suggestions for edits or changes. This created a problem for the team doing the editing and post-production work, since even though they had the subtitles and knew what the interviewed subjects were saying, that didn't mean they knew where to cut and splice the video. And any editing they did also meant needing someone to make sure each subtitle actually landed where the person said that thing. They still wanted the video to make sense to a Korean viewer, but even for viewers who aren't native Korean speakers, they didn't want to create some weird hodgepodge mix of sounds by just splicing wherever someone was speaking on camera. That would totally miss the entire point of filming and interviewing someone for a documentary-style thing.

Even if it was just a scene with a voiceover, you wouldn't want to have a random scene inserted when the person is saying something entirely different but it works because the subtitle says so. So it was a lot of me saying "okay, so where he says 'da' is where you want to cut because that's the end of that sentence. No you can't really cut it there because in Korean you're basically cutting him off and it sounds like he just got interrupted... this subtitle fits here, but you'll have to rewrite it because when spliced with the next scene it doesn't sound like a real person sentence."
posted by kkokkodalk at 9:19 AM on February 22, 2018 [4 favorites]


claudius: does it seem like that was the result of intelligence, or statistics?

I feel like this is the critical distinction to make with the current crop of "AI" -- it's not really applying its own intelligence to the problem; rather, it's doing very sophisticated statistical inference on someone else's intelligence. i.e.:
- Image recognition isn't "this is a bird" it's "people labeled images with similar features as a bird"
- Translation isn't "'Hola' means 'hello'" it's "people often translate 'Hola' as 'hello'"

This can produce really powerful results, but the intelligence is really all in the corpus fed to the learning algorithm. All the AI is doing is trying to find the patterns in the corpus that it can match up with a given input.

Medical diagnosis might be a good example. You might conclude that you have the flu because you have fever, chills, and body aches. That doesn't mean you understand anything about the influenza virus or its effect on the human immune system. But because lots of people who do understand this complex system have provided you with this relatively simple formula, you can say you "know" this. You've leveraged their expertise to establish a shortcut, which allows you to arrive at an answer without really knowing how it works. I think this is what is meant by "shallow" here.

This isn't to pooh-pooh either the usefulness of current AI or its ability to someday understand things on a deeper level -- this sort of pattern matching seems like a major element of our own intelligence -- but I think this is an important feature to consider, especially when considering failure modes. Because this sort of AI will fail in ways that seem bizarre and inexplicable (for instance being unable to recognize a stop sign because there's a strip of tape on it) if you imagine that it is "solving" the problem the way a human would. And I expect these sorts of failures to be a major impediment to acceptance of things like self-driving cars, because people will (rightly) never trust a system if they don't have some sense of how and why it might fail.
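To make that "statistics, not intelligence" point concrete, here's a toy sketch (all names and data are made up): a "translator" that does nothing but echo the translation humans most often chose, with no model of meaning at all.

```python
from collections import Counter

# Hypothetical parallel "corpus": (source word, translation a human chose).
corpus = [
    ("hola", "hello"), ("hola", "hi"), ("hola", "hello"),
    ("gato", "cat"), ("gato", "cat"), ("gato", "kitty"),
]

def translate(word):
    """Return the majority-vote translation -- pure frequency counting,
    zero understanding of what the word means."""
    counts = Counter(t for s, t in corpus if s == word)
    return counts.most_common(1)[0][0] if counts else word

print(translate("hola"))  # hello
print(translate("gato"))  # cat
```

The intelligence lives entirely in the corpus; the code just surfaces other people's choices, which is roughly the sense in which the results can be powerful and still "shallow."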
posted by bjrubble at 11:33 AM on February 22, 2018 [2 favorites]


>From the quoted German newspaper translation:

>> After the authorities reacted with force and let protesters shoot (...)
>So the authorities forcefully allowed the protesters to shoot?


The funny thing about that is the little idiom being translated ("auf Demonstranten hatte schießen lassen" - "allowed the demonstrators to be fired upon") is really common and just the type of thing you'd expect an automated translation to nail with no problem.

It does make one suspect the translation engine is relying solely on statistical analysis of how words are typically used, rather than any kind of grammatical analysis to assist it.

"Lassen" is one of those little helper words that can mean 5 dozen different things depending on context. But once you have parsed the grammar of the sentence and realize what type of little helper verb it is in this case, it can mean one thing & one only.
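A hypothetical sketch of what that grammar-first disambiguation could look like (the rule and its name are made up for illustration, not anyone's real parser): once the causative frame "hatte ... <infinitive> lassen" is recognized, the misleading statistical reading is off the table.

```python
import re

def lassen_sense(clause):
    # Causative frame: "hatte <infinitive ending in -en> lassen"
    # means "had X done (by someone)", never "allowed X to do".
    if re.search(r"\bhatte\b.*\b\w+en lassen\b", clause):
        return "causative"
    return "ambiguous"  # needs more context to pin down

print(lassen_sense("auf Demonstranten hatte schießen lassen"))  # causative
```

Even this crude rule gets the newspaper sentence right, which is the commenter's point: a tiny bit of grammatical parsing resolves what statistics over word co-occurrence muddles.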
posted by flug at 7:09 PM on February 23, 2018 [1 favorite]


that's....not how this works

you mean, jokes?
posted by Sebmojo at 12:46 AM on February 28, 2018


this sort of AI will fail in ways that seem bizarre and inexplicable (for instance being unable to recognize a stop sign because there's a strip of tape on it) if you imagine that it is "solving" the problem the way a human would.

"And it has a smooth brain, which means it hasn't evolved the thinky-thinky parts. For example: if you pick eucalyptus leaves, which it eats, off the branch and put them on a plate, the koala doesn't know what to do with them. Not a genius animal. However, this lack of brain gives the koala a discrete evolutionary advantage, in that it does not give a fuck. Case in point: koala in the rain - no fucks given."

Robot cars may well have better reaction times than human beings, and they may well be less distractible. But I am still quite hesitant to put my life in the hands of software that does not yet give a fuck.
posted by flabdablet at 1:49 AM on February 28, 2018 [1 favorite]


you mean, jokes?

nope!
posted by thelonius at 2:12 AM on February 28, 2018



