Speaking Machines: the history of synthesizing speech
March 12, 2016 11:59 AM

Replicating the sound of human voices with any sort of control goes back to Vox Humana, one of the oldest organ stops, dating back at least as far as the late 1500s. Then there's a dip into the uncanny valley with Joseph Faber's Euphonia, before we got to Perfect Paul and beyond, to building custom voices. BBC's Radio 4 had a half-hour special on this topic, titled Klatt's Last Tapes - History of Speech Synthesis, and you can read more, plus a transcript, here.

After the Vox Humana, an organ stop that very roughly mimicked the sound of a human voice, the next progression in a synthetic human voice came from trying to re-create how people spoke, with Wolfgang von Kempelen’s speaking machine, which he started in 1769. Von Kempelen is probably better known for his hoax, The (Mechanical) Turk that appeared to play chess on its own, but his speaking machine was very real. The '(relatively) simple but ingenious' design (PDF) was built as if it were re-creating key portions of a human head. In fact, the one key element that was said to be missing was a more realistic tongue. He wrote a book on his efforts (PDF, review of the German text in English), which led to a number of people studying his work, including Sir Charles Wheatstone (Archive.org book preview), and at least one person who would improve upon his design.

That was a fellow German scholar, Joseph Faber. In dealing with an illness (imagined or otherwise), Faber found von Kempelen's book an inspiration, and created, destroyed, and re-created variations on the machine at least three times in the mid-1800s, with later iterations including the "stoney-eyed" mask of a female face with lips made of India rubber. Various news reports captured bits and pieces of the operations of the machine, including the 16 or 17 keys that produced sounds which could be combined to imitate speech well enough that Phineas T. Barnum, looking for a fresh novelty, named the speaking automaton "Euphonia" and took it on a tour, which indirectly led Melville Bell and his son Alexander Graham to work on speech synthesis (PDF).

Bell was interested in "visible speech," which led to the telephone, which in turn led to Bell Labs, where the voder (short video clip; longer audio demonstration with still images) was developed in the 1930s, turning mechanical mimicry into electrical synthesis. The next major advances were made in the 1950s (.AU audio samples), with Walter Lawrence's Parametric Artificial Talker ("PAT") and Gunnar Fant's Orator Verbis Electris ("OVE I") having a few "conversations" at public conferences (no audio, only descriptions). Bell Labs continued to make progress, and there were others (.AU audio samples), and by the late 1980s, "the automatic conversion of English text to synthetic speech is presently being performed, remarkably well, by a number of laboratory systems and commercial devices" (PDF).

For a decade or more, the main "voices" you heard from computers came from Dennis Klatt's DECTalk devices, which came with a number of presets, including Perfect Paul, as heard in weather forecast devices and Stephen Hawking's old computer setup.

But even with a tech upgrade, Hawking decided to keep his old voice, even as new technology provides more options. In fact, it is estimated that more than half of people who use synthetic voices use that same one, developed by Dennis Klatt in the 1980s, because it works so well. One new front on voice synthesis comes from sampling voices - your own or "healthy donors" who would sound similar to you. We've come a long way from the talking machines of Wolfgang von Kempelen and Joseph Faber.
posted by filthy light thief (14 comments total) 41 users marked this as a favorite
This is.... amazing. Thank you.
posted by jokeefe at 12:04 PM on March 12, 2016 [1 favorite]

Just look at the Euphonia, just LOOK at it. Imagine that thing wheeling out to speak to you as a post-dinner-party amusement.
posted by blnkfrnk at 12:40 PM on March 12, 2016 [3 favorites]

I am geeking-out all over this! That's how you make a high-quality post, kids. Well done, flt.
posted by Thorzdad at 1:15 PM on March 12, 2016

Yes, excellent. Looking forward to digging in.
posted by benito.strauss at 1:34 PM on March 12, 2016

I'm so glad people who need voices get voices, too. It's wonderful to hear.
posted by blnkfrnk at 1:44 PM on March 12, 2016

*low whistle*

I'm also looking forward to digging into these links. Very nice post!

Relevant to my interests as we have stuff around the house that talks, owing to my husband being blind. I've become able to understand Voiceover on iOS devices when the voice is cranked to warp speed. Other people are like "WHAT did your TV just say?"

In related synthetic voice news, Mauril Bélanger, a Canadian Member of Parliament who has ALS and can no longer speak, presided as the Speaker of the House of Commons this week using an iPad to direct proceedings.
posted by mandolin conspiracy at 1:51 PM on March 12, 2016 [2 favorites]

Back in 1988 or 1989 i was working a tech support job (with mostly all young guys) at a place where we had and supported multiple operating systems and hardware. One of my cow-orkers was hyper-competitive and an Amiga freak who said one day that Macs couldn’t do speech synthesis. Having used Macs since their release in 1984 i knew he was wrong, and told him so. He doubled down on his assertion in his usual assholish way so i didn’t bother to prove it by telling him about Apple’s MacinTalk extension, which was an optional install available since the Mac’s release.

A week or so later i installed an early version of the “Talking Moose” extension on one of our Macs in the front office (which wasn’t assigned to any particular user). Talking Moose ran in the background (yes, even on the earliest Macs that couldn’t run more than one normal app at a time) and would from time to time pop up an animated moose that would utter a randomly-chosen sentence using Apple’s MacinTalk speech synthesis. I set the configuration to not show the animated pop-up at all, the duration between events to five or ten minutes (long enough to be an eternity for someone waiting without certainty of what was going on, short enough that i knew that he would be sure to hear it), and volume down to one (the step above silent on those Macs, but audible).

I clued my other cow-orkers in on what i had done. In the afternoon an older hardware technician whispered to me that our clueless pal had come back to his repair bench to ask “Do Macs have speech synthesis???” To which he replied earnestly “No, they don’t.”

By the end of the day he was visibly disturbed, jumpy. We let him in on the joke after work. He was so relieved — he had honestly believed he was either going crazy or hearing demons. This may sound cruel, but if you’d known the guy i assure you it was totally worth it.
posted by D.C. at 1:52 PM on March 12, 2016 [4 favorites]

Very nice research behind that post. Thank you.
posted by L E M M at 4:39 PM on March 12, 2016


Synthetic speech, previously: The name of this post is Talking Heads, which referenced Kempelen's invention as "a talking pair of bellows."
posted by filthy light thief at 5:51 PM on March 12, 2016

Oddly enough, just last week I built a tiny DECTalk box using an Emic 2 and a USB serial adapter. Sadly, the Epson DECTalk chip it's based on uses DECTalk 5, which is partly incompatible with the more widely-used DECTalk 4, so it doesn't work well with these songs.

(yes, there are lovingly hand-coded DECTalk versions of Lady Gaga songs in there.)
posted by scruss at 7:52 PM on March 12, 2016 [2 favorites]

Speaking of DIY kits, You Can Now Use Stephen Hawking’s Speech Software for Free (Wired, 08.18.15)

Bonus link: The Secret History of the Vocoder (YouTube; Vimeo), looking at the military history and more on the musical uses by Laurie Anderson, Kraftwerk, Afrika Bambaataa and Cosmo D.
posted by filthy light thief at 8:20 AM on March 13, 2016

I find that my reading comprehension is significantly greater if I listen to a text and then read it. I use the built-in text-to-speech synth app in Mac OS extensively for this purpose. Sadly I don't think the capability of this program has improved in the 10-plus years I've been using it.
posted by The Correspondent on the Continent at 8:37 AM on March 13, 2016

flt, that's pretty nifty as an accessibility toolkit, but doesn't include a DECTalk/MITTalk synth.
posted by scruss at 10:28 AM on March 13, 2016

Let me take the other glove off, as the vox humana swells...

The classic Gary Numan lead synth sound of the Pleasure Principle/Telekon era was the Polymoog Vox Humana patch, the recreation of which (because who can afford a vintage Polymoog, right?) has become something of an obsession among all right-thinking early '80s synth-heads.
posted by Devonian at 12:22 PM on March 13, 2016 [1 favorite]
