Supercomputer fools Kryten from Red Dwarf
June 8, 2014 10:45 AM

A supercomputer has fooled judges, a third of the time, into believing it is a 13-year-old Russian schoolboy named Eugene Goostman.
posted by 0bvious (65 comments total) 13 users marked this as a favorite
 
This is a fascinating story, but I would love to see transcripts of the conversations where the judges were fooled.
posted by Night_owl at 11:00 AM on June 8, 2014 [8 favorites]


The internet is abuzz with this. Positively abuzz!

I know little about A.I. research but it feels like cheating, somehow, to make the computer a child. Wouldn't a tester who thinks he is talking to a child likely be more forgiving of the sort of awkwardness or unnaturalness or tendency toward non sequitur that might cause a tester who's talking to an adult to conclude that it's a computer impostor on the other end?
posted by eugenen at 11:05 AM on June 8, 2014 [9 favorites]


Also making it a child who does not speak the language natively (and so will be slightly awkward with word choice or phrasing) could be seen as cheating as well.
posted by idiopath at 11:07 AM on June 8, 2014 [10 favorites]


When I see a claim in a Gizmodo headline, I automatically respond "No it hasn't", and I'm right more often than that computer's one-out-of-three record.
posted by benito.strauss at 11:11 AM on June 8, 2014 [25 favorites]


It's the nature of such things, I guess. They're aiming to win the competition rather than produce a pure artificial intelligence. Hopefully though, what they do learn will help.
posted by Auz at 11:12 AM on June 8, 2014 [2 favorites]


What this probably means is that a real 13-year-old only has about a one in three chance of passing the Turing Test.

If you've ever spoken to one, you're probably not terribly surprised to hear that.
posted by mhoye at 11:16 AM on June 8, 2014 [42 favorites]


Wasn't the Turing Test passed for the first time 25 years ago, by using "profanity, relentless aggression, prurient queries about the user, and implying that they were a liar when they responded"?
posted by effbot at 11:27 AM on June 8, 2014 [16 favorites]


Alan Turing's original paper makes no mention of Russian schoolboys
posted by 0bvious at 11:30 AM on June 8, 2014 [4 favorites]


It's interesting, but such a constrained version of the Turing test that it really doesn't say much about the state of AI: imitating a non-native child for 5 minutes. Five minutes isn't really very much time to get out of the "cleverbot" space, where a machine can just respond with stock answers to the finite number of questions interrogators are likely to ask.

It appears that Eugene is online, but the site is down at the moment.
posted by justkevin at 11:33 AM on June 8, 2014 [1 favorite]
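
A toy sketch, in Python, of the stock-answer approach justkevin describes above. Every pattern and canned reply here is invented for illustration; this is not Eugene's actual code, just the shape of the technique:

import random
import re

# Match the interrogator's message against a handful of patterns and
# reply with canned text; deflect anything unrecognized. This is the
# whole trick behind "cleverbot space" bots.
STOCK_RESPONSES = [
    (re.compile(r"\bhow old\b", re.I),
     "I'm 13. Why do grown-ups always ask that?"),
    (re.compile(r"\bwhere (are you from|do you live)\b", re.I),
     "Odessa. It is a big city near the Black Sea."),
    (re.compile(r"\bfavou?rite (food|movie|band)\b", re.I),
     "I like it all, except the boring ones :-)"),
]

DEFLECTIONS = [
    "Sorry, my English is not so good. Ask me something else?",
    "Hmm, that is a strange question. Do you like guinea pigs?",
]

def reply(message: str) -> str:
    """Return a canned answer if any pattern matches, else deflect."""
    for pattern, answer in STOCK_RESPONSES:
        if pattern.search(message):
            return answer
    return random.choice(DEFLECTIONS)

print(reply("How old are you?"))                         # canned answer
print(reply("Explain Kant's categorical imperative."))   # deflection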


95% of technical support representatives can't pass the Turing test.
posted by delfin at 11:40 AM on June 8, 2014 [10 favorites]


Came in to say what effbot said, the chat bots that fool people the most often lapse into obscenities when faced with unfamiliar stimuli. Because that's what humans on the internet are like!
posted by Eyebrows McGee at 11:40 AM on June 8, 2014 [2 favorites]


Without a transcript or the means to converse with the bot, this seems like a very dubious claim. Very. I mean super very. As in "the Turing Test has not, in fact, been passed." "At all."
posted by bfootdav at 11:49 AM on June 8, 2014 [1 favorite]


I spoke to Eugene earlier on today, and - ignoring the fact I already knew it was an AI - it was an entirely unconvincing conversation. Like Night_owl, I'd like to see some transcripts.
posted by alby at 12:01 PM on June 8, 2014


95%? Weird, cuz probably 9/10 times I get the help I need. What is your profession?
posted by Brocktoon at 12:11 PM on June 8, 2014


FYI, the Eugene chatbot on that PrincetonAI site is a version from 2001, not the one that allegedly defeated the test today.
posted by Rhaomi at 12:13 PM on June 8, 2014


Eyebrows McGee: "Came in to say what effbot said, the chat bots that fool people the most often lapse into obscenities when faced with unfamiliar stimuli. Because that's what humans on the internet are like!"

"My mother? Let me tell you about my mother."
posted by symbioid at 12:16 PM on June 8, 2014 [4 favorites]


Conversation snippets from a previous Turing test.

I am not sure how these bots work, but the judge in the article above asked some Sci-Fi related questions that came back with pretty plausible answers. Do the bots crawl the web for info on a topic/keyword? Are they programmed to respond to certain instances of conversation with extra conversational tidbits? Because I'm thinking: if the programmers built the bots knowing other programmers would be the judges, were they perhaps trying to skew the test in advance?

I agree that it would be easier to make a computer seem human when the human is supposed to be a 13-year-old kid from outside the US. It would probably limit the topics the judges felt they could realistically inquire about, so they stuck only to really roundabout things like favorite food, movies, etc. What if they asked about Russian history, for instance? Were they allowed to "test" the other person/bot, or was it supposed to be strictly conversational?

So many questions....
posted by sevenofspades at 12:33 PM on June 8, 2014


Cleverly rigged. A 7-year-old would be harder still, and possibly a parrot with a stenographer.

Still only 1/3 were fooled. Parents of teens were probably not among those judges. Were any of them 13-year-olds? They would know what other 13-year-olds from -anywhere- would be expected to know.
posted by Twang at 12:53 PM on June 8, 2014


95%? Weird, cuz probably 9/10 times I get the help I need. What is your profession?

There is a big difference between "I got the help I needed from a tech support/customer service rep" and "the tech support/customer service rep was indistinguishable from a machine." The odds are strong that even when the former condition is true, the latter will be too. Twice as likely if you're calling Comcast.
posted by delfin at 12:58 PM on June 8, 2014 [1 favorite]


"However this event involved the most simultaneous comparison tests than ever before..."

Inserting grammatical errors at random is a cheap trick that will never fool more than about 20% of people, Kevin Warwick 2.3beta!
posted by gurple at 1:11 PM on June 8, 2014 [3 favorites]


Hey, nobody said Eugene got an A+ on the Turing Test, only that he passed.
posted by Pirate-Bartender-Zombie-Monkey at 1:14 PM on June 8, 2014 [2 favorites]


Aah, the famously publicity-shy Kevin "Captain Cyborg" Warwick is involved. I'm suddenly less impressed.
posted by BinaryApe at 1:48 PM on June 8, 2014 [8 favorites]


I don't remember Turing saying anything about 30% being good enough.
posted by Segundus at 2:04 PM on June 8, 2014 [3 favorites]


Segundus: "I don't remember Turing saying anything about 30% being good enough."

I blame Common Core for our falling Turing Test standards.
posted by Eyebrows McGee at 2:08 PM on June 8, 2014 [13 favorites]


These articles are so frustratingly light on any sort of useful information.
posted by jeffamaphone at 2:24 PM on June 8, 2014 [1 favorite]


I don't remember Turing saying anything about 30% being good enough.

Didn't Turing really specify that a computer stand in for a man trying to be identified as a woman, and that success would be determined by passing or failing to pass as a woman at around the same rate as a human man control group? (Forgetting this feels like a bit of widespread cultural whitewashing, no?)
posted by nobody at 2:52 PM on June 8, 2014 [4 favorites]


They're aiming to win the competition rather than produce a pure artificial intelligence.

We don't have a clue what real intelligence is; how would we know if we made an artificial one?
posted by empath at 3:21 PM on June 8, 2014 [2 favorites]


I wonder what Salinger might've said.
posted by BrotherCaine at 4:16 PM on June 8, 2014


the chat bots that fool people the most often lapse into obscenities when faced with unfamiliar stimuli

PARITY ERR ... 23095
PARITY ERR ... 39084
PARITY ERR ... 20094
PARITY ERR ... 09232
PARITY ERR ... 10293
PARITY ERR ... 03123
posted by RonButNotStupid at 4:31 PM on June 8, 2014


I don't remember Turing saying anything about 30% being good enough.

I didn't remember that either, but it turns out that he did say that:

"I believe that in about fifty years time it will be possible to programme computers with a storage capacity of about 10^9 to make them play the imitation game so well that an average interrogator will not have more than 70 per cent chance of making the right identification after five minutes of questioning."

Forgetting this feels like a bit of widespread cultural whitewashing, no?

Not really; while A and B are described as a man and a woman in his "imitation game" scenario, their genders aren't any more relevant for the actual experiment than the genders of Alice, Bob, and Eve.
posted by effbot at 4:53 PM on June 8, 2014


Do we know what percentage of the population pass the Turing test?
posted by double block and bleed at 4:54 PM on June 8, 2014 [1 favorite]


good point, what % of people pass for bots anyway? I don't buy that some simple generic exchange of texts is "passing" the Turing test, considering that a bot made up of a Markov chain generator linked to memebase.com would probably pass for your average forum commentator.

What chatbot can discuss opera? Yap about last week's game? Argue politics? Make someone fall in love with it? That's passing for human...
posted by mrbigmuscles at 5:06 PM on June 8, 2014
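
A toy sketch, in Python, of the Markov-chain commentator mrbigmuscles jokes about. The corpus here is a few invented forum catchphrases, not anything scraped from memebase:

import random
from collections import defaultdict

def build_chain(corpus, order=2):
    """Map each `order`-word prefix to the words seen following it."""
    words = corpus.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        chain[tuple(words[i:i + order])].append(words[i + order])
    return chain

def babble(chain, order=2, length=20):
    """Random-walk the chain to emit forum-comment-shaped text."""
    out = list(random.choice(list(chain)))
    for _ in range(length):
        followers = chain.get(tuple(out[-order:]))
        if not followers:
            break
        out.append(random.choice(followers))
    return " ".join(out)

corpus = "lol this thread again . cool story bro . first post lol . " * 5
print(babble(build_chain(corpus)))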


Note that Turing dismisses the question "can machines think" as absurd, in his paper. He replaced the question with one based on the imitation game.
posted by yaxu at 5:10 PM on June 8, 2014 [2 favorites]


Didn't Turing really specify that a computer stand in for a man trying to be identified as a woman, and that success would be determined by passing or failing to pass as a woman at around the same rate as a human man control group?

Not quite. If I remember correctly, he introduces the concept of the imitation game to the reader by describing it as one in which a man attempts to fool a human into believing that he is a woman. Only then does he introduce the computer-versus-human variant, which is a separate but analogous game, the object being for the computer to attempt to fool a human into believing that it is human. It's the second variant that is now known as the Turing Test, and nothing in its rules involves gender.

The whole gender thing is pretty weird by modern standards for sure, but it's not directly relevant to the original formulation of the test per se.
posted by my favorite orange at 5:33 PM on June 8, 2014 [3 favorites]


My Playstation 3 in sleep mode passes for a dead guy 100% of the time.
posted by jimmythefish at 7:10 PM on June 8, 2014 [1 favorite]


Wasn't the Turing Test passed for the first time 25 years ago, by using "profanity, relentless aggression, prurient queries about the user, and implying that they were a liar when they responded"?


Yeah, that's pretty much a conversation with a 13-year-old boy.
posted by Muddler at 7:26 PM on June 8, 2014


Wasn't the Turing Test passed for the first time 25 years ago, by using "profanity, relentless aggression, prurient queries about the user, and implying that they were a liar when they responded"?

On /b/, no one knows you're an AI.
posted by sebastienbailard at 7:56 PM on June 8, 2014 [4 favorites]


All you do is make it talk incessantly about the latest minecraft mods while ignoring all conversational cues that the other "participant" is bored and would rather talk about anything, for god's sake anything, else. Where's the trick in that?
posted by drnick at 8:17 PM on June 8, 2014 [2 favorites]


"Eugene was 'born' in 2001. Our main idea was that he can claim that he knows anything, but his age also makes it perfectly reasonable that he doesn't know everything. We spent a lot of time developing a character with a believable personality. This year we improved the 'dialog controller' which makes the conversation far more human-like when compared to programs that just answer questions. Going forward we plan to make Eugene smarter and continue working on improving what we refer to as 'conversation logic'."

You know, when I learned about the Turing test in school, I didn't imagine that the bulk of the development time would be spent imbuing the bot with one particular character. I guess I imagined an intelligence with the "free will" to pick its own personality. To me, that is a huge part of what it means to be human. The freedom to pick your own interests and passions, and become the kind of person you want to be.
posted by mantecol at 8:26 PM on June 8, 2014


Robert Llewellyn's best move would've been to talk to the alleged child about whether or not he believed in silicon heaven (Vimeo). Wondering after the fate of all the calculators would be a dead giveaway. (Llewellyn is the one with the head shaped like a novelty condom.)
posted by Sunburnt at 9:36 PM on June 8, 2014


To me, that is a huge part of what it means to be human. The freedom to pick your own interests and passions, and become the kind of person you want to be.

You must have a radically different experience as a human than me if this is all stuff that you can choose (help, I'm trapped on MetaFilter).
posted by ODiV at 10:33 PM on June 8, 2014 [3 favorites]


Did you ever take that test yourself, Mr Deckard?
posted by Sebmojo at 10:39 PM on June 8, 2014 [1 favorite]


The Turing Test always struck me as about as unscientific a test as I could imagine.
posted by Decani at 1:12 AM on June 9, 2014 [2 favorites]


Traveling in Peru, I was able to (unintentionally) convince a much more diverse group of judges, about two-thirds of the time and for about five minutes, that I was fluent in Spanish and obviously from Europe or someplace besides the United States (I am neither).
posted by straight at 1:59 AM on June 9, 2014


In one respect you could say success proves Turing wrong. No chatbot has any pretension to thinking in the human sense, so if one can pass, it shows that the test isn't really a good substitute for the original question 'can machines think?' after all.
posted by Segundus at 4:19 AM on June 9, 2014


Turing never claimed that the imitation game might one day prove machines can think. Rather, his argument was that it is impossible to know whether a thing (human, machine or otherwise) can think just from its behaviour.
posted by 0bvious at 4:54 AM on June 9, 2014


The Turing Test is one of the most useless concepts in artificial intelligence. It's not scientific: it is neither objective nor repeatable, and proves no hypothesis. But it looms large in popular conceptions of AI, so people put effort into chatbots that are not useful agents for solving any real-world problem. Some of the effort expended on this may help expand the field of natural language processing (NLP), but in the big picture these programs are useless toys.
posted by graymouser at 6:06 AM on June 9, 2014 [1 favorite]


This seems like a grandiose claim attached to a relatively cheap accomplishment. I don't mean to disparage the accomplishment itself, only in relation to the claim attached to it: Saying that it has passed the Turing Test seems like a step above saying that a computer that responds to queries with things like "NO SPEAK ENGELSH" would be indistinguishable from a human.
posted by Flunkie at 6:07 AM on June 9, 2014 [1 favorite]


¿Qué?
posted by mikelieman at 6:44 AM on June 9, 2014


The problem there is that you've implied to a human interlocutor who recognizes Spanish that you speak Spanish. So now your program had better be able to pass the Turing Test when faced with an interlocutor who actually does speak Spanish.

Better to just be vague. You could have the program able to recognize various languages that are popularly recognizable by a large portion of people familiar with the Latin alphabet, and recognize some wrongly, and not recognize others at all. So maybe it responds to Spanish with "NO HABAL SPANOL", and maybe to French with "NO PERLE FRANCIS"... but maybe it responds to Italian with "NO HABAL SPANOL" too, and it recognizes German and Dutch and maybe even sometimes various Scandinavian languages all as German. Maybe it responds to Croatian with just "NO SPEAK, NO SPEAK". And to 你識唔識講廣東話呀 it responds ¯\_(ツ)_/¯
posted by Flunkie at 7:35 AM on June 9, 2014
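
A sketch, in Python, of the deliberately vague language-detection strategy described above. The keyword heuristics and canned misspellings are invented for illustration, including lumping Italian in with Spanish and everything Germanic together:

from typing import Optional

CANNED = {
    "spanish": "NO HABAL SPANOL",
    "french": "NO PERLE FRANCIS",
    "germanic": "NO SHPREKEN DOYCH",
    "slavic": "NO SPEAK, NO SPEAK",
}

def crude_guess(text: str) -> Optional[str]:
    lowered = text.lower()
    # Italian gets misrecognized as Spanish on purpose.
    if any(w in lowered for w in ("hola", "usted", "¿", "ciao")):
        return "spanish"
    if any(w in lowered for w in ("bonjour", "parlez", "ç")):
        return "french"
    # German, Dutch, and Scandinavian all read as "German".
    if any(w in lowered for w in ("sprechen", "spreek", "snakker")):
        return "germanic"
    if any(w in lowered for w in ("govorite", "mówisz")):
        return "slavic"
    return None

def respond(text: str) -> str:
    guess = crude_guess(text)
    return CANNED[guess] if guess else "¯\\_(ツ)_/¯"

print(respond("¿Hablas español?"))       # NO HABAL SPANOL
print(respond("Ciao, parli italiano?"))  # also NO HABAL SPANOL
print(respond("你識唔識講廣東話呀"))          # the shrug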


Would this be a good place to link to this (vimeo)? It doesn't actually have Kryten in it, but it is one of my favourite Red Dwarf episodes.
posted by frogfather at 7:38 AM on June 9, 2014


Hmmmm... and it had better know basic arithmetic:
Human: 8 + 5 = ?

Program: 13
But it better not be willing to go overboard on it. Instead of:
Human: 354135351 * 3425784378953 = ?

Program: 1213191353490837667503
It should probably do something like:
Human: 354135351 * 3425784378953 = ?

Program: :D
posted by Flunkie at 7:41 AM on June 9, 2014 [1 favorite]
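
A sketch, in Python, of that arithmetic behavior. The pause and the occasional believable slip echo the example in Turing's own paper (quoted further down the thread); the thresholds and delays here are invented:

import random
import re
import time

def arithmetic_reply(question: str) -> str:
    """Answer easy sums like a human; balk at calculator-sized ones."""
    match = re.match(r"\s*(\d+)\s*([+*])\s*(\d+)\s*=\s*\?", question)
    if not match:
        return "Huh?"
    a, op, b = int(match.group(1)), match.group(2), int(match.group(3))
    if max(a, b) > 10_000:            # nobody does this in their head
        return ":D"
    time.sleep(random.uniform(2, 8))  # humans pause before answering
    answer = a + b if op == "+" else a * b
    if random.random() < 0.05:        # rare, believable mistake
        answer += random.choice([-10, -1, 1, 10])
    return str(answer)

print(arithmetic_reply("8 + 5 = ?"))                      # "13" (usually)
print(arithmetic_reply("354135351 * 3425784378953 = ?"))  # ":D"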


The Guardian has some extracts from the Eugene conversation transcripts.
posted by 0bvious at 7:53 AM on June 9, 2014 [1 favorite]


The Guardian has some extracts from the Eugene conversation transcripts.

Which one is the computer?
posted by effbot at 8:28 AM on June 9, 2014 [1 favorite]


Program: :D

The one in the actual paper is pretty good:

Q: Add 34957 to 70764.
A: (Pause about 30 seconds and then give as answer) 105621.
posted by effbot at 8:34 AM on June 9, 2014


Metafilter: makes no mention of Russian schoolboys.
posted by Chrysostom at 9:06 AM on June 9, 2014 [1 favorite]


Argument: no Turing Test was passed. (Summary of the argument quoted below:)

--begin quote----
It's not a "supercomputer," it's a chatbot. It's a script made to mimic human conversation. There is no intelligence, artificial or not involved. It's just a chatbot.

Plenty of other chatbots have similarly claimed to have "passed" the Turing test in the past (often with higher ratings). Here's a story from three years ago about another bot, Cleverbot, "passing" the Turing Test by convincing 59% of judges it was human (much higher than the 33% Eugene Goostman claims).

It "beat" the Turing test here by "gaming" the rules -- by telling people the computer was a 13-year-old boy from Ukraine in order to mentally explain away odd responses.

The "rules" of the Turing test always seem to change. Hell, Turing's original test was quite different anyway.

As Chris Dixon points out, you don't get to run a single test with judges that you picked and declare you accomplished something. That's just not how it's done. If someone claimed to have created nuclear fusion or cured cancer, you'd wait for some peer review and repeat tests under other circumstances before buying it, right?

The whole concept of the Turing Test itself is kind of a joke. While it's fun to think about, creating a chatbot that can fool humans is not really the same thing as creating artificial intelligence. Many in the AI world look on the Turing Test as a needless distraction.

--end quote----
posted by blackfly at 12:21 PM on June 9, 2014 [2 favorites]


The whole concept of the Turing Test itself is kind of a joke.

I don't think so. I think it's just misunderstood. It's not literally a pass/fail test. It's a thought experiment about what criteria you would use to decide whether a computer was "thinking" in a sense that is equivalent to what we mean when we say human beings "think".

Instead of trying to define "thinking," Turing said that if a computer could converse with people in a way that was indistinguishable from conversing with a human, he'd be inclined to say the computer was thinking. Basically it's a statement that "I can't define thinking, but I'd know it when I saw it."
posted by straight at 1:25 PM on June 9, 2014


Many in the AI world look on the Turing Test as a needless distraction.

This deserves some explanation. The Turing Test came about before artificial intelligence existed as a field. It assumed that the goal of AI was to act humanly, that is, to be indistinguishable from a human being. This was beyond the capabilities of the computers of the 1950s and 1960s, so the emphasis of research changed. Influenced by Behaviorist psychology, which seemed to hold that human thought was relatively easy to map, AI researchers shifted their goal to thinking humanly.

As Behaviorism fell out of favor, the goal of thinking humanly became hopelessly muddled. It wasn't even clear what AI, at the time an entirely theoretical field, was trying to do. So the goal shifted to thinking rationally: creating systems that could simulate perfectly rational thought according to formal logic. This turns out to have complexity problems, which caused it to be rejected in favor of the goal of all non-theoretical AI: acting rationally.

The rational agent approach cut out the concept of actual thinking in favor of creating decision-making systems. Real AI doesn't "think." It takes a state and a set of rules and uses them to make a decision, such as what move to make on a chessboard. The more complex the state and rules, the more difficult it is to make a decision.

The problem is, the Turing Test is stuck all the way back in "acting humanly" while AI researchers are trying to hone agents to act rationally. But public perception is that the Turing Test is a hurdle that AI should be trying to surpass (or even a proof of so-called "Strong AI," a science fiction term for thinking humanly or rationally), so people who are writing chatbots (things that, from a rational agent perspective, are glorified toys) get more attention than people trying to solve real AI problems.
posted by graymouser at 1:41 PM on June 9, 2014 [1 favorite]
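
A minimal sketch, in Python, of the state-plus-rules structure graymouser describes. The "game" here is an invented toy (walk along a number line toward a goal); it only illustrates how a rational agent is a function from a state and a rule set to a decision, with no "thinking" involved:

from dataclasses import dataclass

@dataclass(frozen=True)
class State:
    position: int   # where the agent is on a number line
    goal: int       # where it wants to be

def legal_moves(state: State):
    """The rule set: the agent may step -1, stay, or step +1."""
    return [-1, 0, 1]

def utility(state: State, move: int) -> int:
    """Score a move by how close it leaves us to the goal."""
    return -abs((state.position + move) - state.goal)

def decide(state: State) -> int:
    """Act rationally: pick the legal move with the highest utility."""
    return max(legal_moves(state), key=lambda m: utility(state, m))

print(decide(State(position=3, goal=7)))   # -> 1, step toward the goal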


The problem is, the Turing Test is stuck all the way back in "acting humanly" while AI researchers are trying to hone agents to act rationally. But public perception is that the Turing Test is a hurdle that AI should be trying to surpass (or even a proof of so-called "Strong AI," a science fiction term for thinking humanly or rationally), so people who are writing chatbots (things that, from a rational agent perspective, are glorified toys) get more attention than people trying to solve real AI problems.

I think this is right, but what intrigues people and stirs the imagination is more than the pragmatic approach. There is an ongoing and prevalent interest in thinking about AI not only in terms of how it acts in helpfully rational ways, but in terms of how it on some level mimics the human brain/consciousness. Beyond mimicry, there is the question of whether or not AI can also be self-aware in a way that is similar to human self-awareness. Mimicry and self-awareness are tied closely together in people's imagination, and often closely connected to the question of computational speed in computers as compared to the human brain. If we can get sophisticated enough with our algorithms to attain near-perfect mimicry, might we find the bridge that leads us to sentience?

So there is a cluster of interrelated issues surrounding computer speed, mimicry of human behavior, and self-awareness that are often bundled together in questions about what it is to be a person, at least in the public imagination. It's this curiosity that keeps people interested in the Turing Test. Personally, I do not think there is a direct or necessary connection between mimicry, speed and complexity of rational thought, and an elusive third-person description of self-awareness (and by extension, notions of personhood), but it's this very question that continues to make AI the focus of so much speculation and SF, even as we get increasingly better at making machines that just do things helpfully with a high degree of reliability.
posted by SpacemanStix at 4:30 PM on June 9, 2014


Are they expressing themselves more like us or are we just communicating more like them?
posted by duncan42 at 8:51 PM on June 9, 2014


It seems like computer intelligence has stayed the same but our chats have gotten dumber.
posted by Skwirl at 11:44 AM on June 10, 2014


Many in the AI world look on the Turing Test as a needless distraction.

That's because it was never meant to be some sort of standard of success in AI development. It was merely a way of considering the question, "How would I know if a computer were thinking like a human does?"
posted by straight at 11:50 AM on June 10, 2014








This thread has been archived and is closed to new comments