I have no idea what perceptual insight is, but this is pretty interesting
November 16, 2008 9:29 AM

An Introduction to Sine-Wave Speech Play the first sound and you'll probably hear nothing but squeaks and bleeps. Play the second one and then go back to the first. Cool!
posted by TheDonF (63 comments total) 69 users marked this as a favorite

 
That's awesome...
posted by TravellingDen at 9:42 AM on November 16, 2008 [1 favorite]


Nifty examples of the theory. Anyone that has done any foreign travel or had to cut through unfamiliar accents knows this phenomenon. We'll also watch the face of a person and gestures to help decode unclear speech. Try randomly raging at someone "I LOVE YOU" with fists in the air and you'll note the confusion - our brains try to decode the full message in total context, and discordance confuses us. Written communications, such as email and text messages, lack a detailed web of visual context, which is one reason, I think, it becomes so easy to unintentionally flame others on the Internet.

Now, if someone could come up with a website to train me in the various Indian accents, I might be able to understand tech support.
posted by Muddler at 9:42 AM on November 16, 2008 [1 favorite]


OK, that's just eerie!
posted by SPrintF at 9:43 AM on November 16, 2008


I heard "Islam is the light"
posted by DU at 9:45 AM on November 16, 2008 [4 favorites]


I would be curious how much of what I (we?) hear in the scrambled audio clip has been influenced by just previously listening to the normal audio sample. It seems to work perfectly if you listen to the normal sample moments before the other one, but if you wait an hour, a day, etc. would it still work?
posted by andoatnp at 9:46 AM on November 16, 2008


Got half the message first time through... all of it on the second, without the hint. And I got the picture too.
posted by fearfulsymmetry at 9:46 AM on November 16, 2008 [1 favorite]


Awesome. Also reminds me of "I am sitting in a room" from this thread.
posted by iamkimiam at 9:54 AM on November 16, 2008 [1 favorite]


If you liked that, you might like this. Stimuli used in a psycholinguistics experiment, but again, compare your perception before and after listening to the original. Not by me, but by Franck Ramus.
posted by fcummins at 9:59 AM on November 16, 2008


I got half of it the first time around, and after hearing the original I was able to pick out consonants that I never picked up on the first time.

What's really weird is that after hearing and understanding that one original, I can understand all the other sine-wave sentences, without having heard their originals.
posted by dunkadunc at 10:04 AM on November 16, 2008


Whoa. Pretty neat stuff.
posted by brundlefly at 10:06 AM on November 16, 2008


Having heard several examples of sine-wave speech, your perceptual system has tuned into this form of distortion, so as to be able to perceive new sine-wave speech sentences more clearly.

Ahh, there you go.

Got the picture first time around, though.
posted by dunkadunc at 10:07 AM on November 16, 2008


This is absolutely fascinating. It has got me thinking that this example of the perception of degraded speech also explains how parents learn to understand the misshapen words of their toddlers.

I find that, after receiving a "training" example (e.g., my wife saying, "Anika learned how to say 'MetaFilter' and 'mathowie'"), I am able to immediately recognize and understand what she means when she says "Meffimerr" or "Maowie".

I never realized that it was such a well-described function of the human brain and senses. Once again, the child is the teacher of the parent.

Excellent post.
posted by scblackman at 10:10 AM on November 16, 2008 [3 favorites]


Hah, that was cooler than I thought it would be.
posted by chicken nuglet at 10:10 AM on November 16, 2008 [1 favorite]


When I played the first sound, I heard one of the Mac OS voices (Deranged, maybe?) saying "It was a sunny day and I was sure we were going to the park." Not perfect but close.

Yeah, on playing the other sounds, I can definitely make out the sentences through the distortion. What's the big mystery?
posted by Brandon Blatcher at 10:10 AM on November 16, 2008


This is so cool. I had always wondered exactly what the imperial probe droid on Hoth was transmitting. Now it all makes sense:

'The camel was kept in a cage at the zoo,
The camel was kept in a cage at the zoo.'
posted by isopraxis at 10:12 AM on November 16, 2008 [1 favorite]


I listened to the first sound clip repeatedly, without listening to the second "clear" version, and found that it became increasingly intelligible without any external hints. By the 3rd hearing I could start to make out the words and after 5 repetitions it was completely understandable. And I wasn't particularly trying - it just resolved into a proper sentence all by itself. (As far as I could tell from my top-level awareness; my brain was probably doing some pretty slick processing, but I wasn't putting any conscious "decoding" effort into it.)

I wonder if Brits would pick up on it any faster - the speaker has an English accent, which might have made it a little harder for my American brain to interpret (park = "pahk"). Also, some parts of the other samples I couldn't make out, even after hearing the clear version - it seems like some words are just easier to grok than others (no spoilers, but it's probably a function of the actual sounds rather than how common the word is).

Random Sunday morning AI-babble: would this be one type of test for sentience? The ability to decode something rapidly, without additional input? In other words, on-the-fly pattern matching with extreme distortion/filters? Or can machines do this easily (like in 5 iterations)? Obviously, software was used to generate the distorted sound files, but how easy is it to reverse-engineer the sound if you have no information about it?

Any AI researchers out there?
posted by Quietgal at 10:12 AM on November 16, 2008 [1 favorite]


I understood the first clip perfectly the first time, even to the fact that the speaker was female and that to my American ears she had an accent.

I don't get it.
posted by WolfDaddy at 10:14 AM on November 16, 2008


WolfDaddy:

Most naive listeners hear this as a set of simultaneous whistles, or science fiction sounds

Me and you, see, we're the clever ones.
posted by Beautiful Screaming Lady at 10:22 AM on November 16, 2008 [3 favorites]


PAUL IS DEAD.
posted by Saxon Kane at 10:24 AM on November 16, 2008 [1 favorite]


Very interesting stuff. Thanks.
posted by Bookhouse at 10:24 AM on November 16, 2008


Yeah, on playing the other sounds, I can definitely make out the sentences through the distortion. What's the big mystery?

It's not a mystery you're trying to solve, it's an example of how your brain learns to recognize speech. From the link:

"As you listen to these four examples, you may find that you get better at understanding the sine-wave speech first time around. This is an example of perceptual learning. Having heard several examples of sine-wave speech, your perceptual system has tuned into this form of distortion, so as to be able to perceive new sine-wave speech sentences more clearly."
posted by Solon and Thanks at 10:31 AM on November 16, 2008


Robert Remez has also studied SWS. One study presented here. It's a favorite for comparing the brain's responses to speech and nonspeech, because it's "both". One study by Dehaene-Lambertz, et al, may be available to you here.
posted by knile at 10:33 AM on November 16, 2008


I buried Paul.

Seriously, though, at least I wasn't the only one getting the messages the first time around.
posted by Navelgazer at 10:33 AM on November 16, 2008


As you listen to these four examples, you may find that you get better at understanding the sine-wave speech first time around.

Nah, I understood it was speech the first time it was heard. Didn't hear any chirps or beeps, just distorted speech.
posted by Brandon Blatcher at 10:42 AM on November 16, 2008


I work with people with learning difficulties, many of whom have very idiosyncratic modes of speech. After several years of working with these people I can now understand most of what is said to me, and I find it easier to adjust to a new person's mode of speech. I thought it was interesting that a new staff member could listen to a request that I had heard as "can we go out" and not understand it at all; I suppose this would explain why.
posted by chelegonian at 10:47 AM on November 16, 2008


Wow. It's a schooner.
posted by Poolio at 10:47 AM on November 16, 2008 [6 favorites]


This phenomenon, perceptual insight, is also probably applicable to musical cognition. Time for some 12-tone torture again.
posted by Gyan at 10:53 AM on November 16, 2008 [2 favorites]


Wow. It's a schooner.

Ha! I was thinking the same thing.
posted by brundlefly at 11:02 AM on November 16, 2008


I thought the first said "It was a funny day and the children were going to the park," on the first listen.

I think being one letter away from the real sentence is pretty good, and that I now deserve a cookie.
posted by aliceinreality at 11:09 AM on November 16, 2008


I just saw Remez give a talk this weekend, and he played an example of SWS and asked if anyone could identify the sentence. Someone immediately shouted "The watchdog gave a warning growl!" I think it must be like Name That Tune for psycholinguists.
posted by rebel_rebel at 11:18 AM on November 16, 2008 [1 favorite]


This VST plugin can be used to produce the same effect on arbitrary audio.

As far as I can tell, it partitions the audio signal into chunks, where each zero-crossing marks a new chunk. Then, it replaces each chunk with a simple wave (saw, triangle, square, sine) of your choice, with the same peak amplitude as the original audio chunk.
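
For the curious, here is a rough Python sketch of that chunk-replacement idea as I described it. It is only my guess at the behaviour, not the plugin's actual code: the function name `replace_chunks`, the half-cycle wave shapes, and the decision to treat a sample of exactly zero as positive are all my own assumptions, and I left out the saw shape for brevity.

```python
import math

def replace_chunks(samples, shape="sine"):
    """Replace each run between zero-crossings with a simple half-wave
    scaled by that run's peak amplitude (hypothetical sketch)."""
    out = []
    start = 0
    n = len(samples)
    for i in range(1, n + 1):
        # A chunk ends at the end of the signal or where the sign flips.
        if i == n or (samples[i] >= 0) != (samples[start] >= 0):
            chunk = samples[start:i]
            peak = max(abs(s) for s in chunk)
            sign = 1.0 if chunk[0] >= 0 else -1.0
            length = len(chunk)
            for k in range(length):
                # Position within the chunk, sampled at bin centres (0..1).
                phase = (k + 0.5) / length
                if shape == "square":
                    value = 1.0
                elif shape == "triangle":
                    value = 1.0 - abs(2.0 * phase - 1.0)
                else:  # default: half a sine cycle
                    value = math.sin(math.pi * phase)
                out.append(sign * peak * value)
            start = i
    return out

# Two chunks: a positive run peaking at 0.5, then a negative run peaking at 0.9.
signal = [0.1, 0.5, 0.2, -0.3, -0.9, -0.1]
squared = replace_chunks(signal, shape="square")
```

The output has the same length and the same zero-crossing positions as the input, so the rhythm of the signal survives while the detail inside each chunk is thrown away.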

The company that originally wrote it is defunct, and it's listed as freeware, so you can probably download to your heart's content.
posted by Jpfed at 11:22 AM on November 16, 2008 [1 favorite]


Wait, they use formant tracking, which is very different. Ignore my silliness.
posted by Jpfed at 11:25 AM on November 16, 2008


I understood most of the first one the first time around -- except for the children bit.
posted by peacheater at 11:25 AM on November 16, 2008


Didn't hear any chirps or beeps, just distorted speech.

Really? I'm not too surprised that folk could make out the speech on first listening to the sine wave clips, but to not hear all the crazy swooping synthesiser-type sounds at all is inconceivable to me and my ears. They sounded like samples from some lost Louis & Bebe Barron soundtrack, until I heard the clear version.
posted by jack_mo at 11:39 AM on November 16, 2008


Seeing a pattern through the distortion seems to be a double-edged skill, one that endows a person with the ability to make sense where there seems to be little but also to make sense where there may be none, such as in pareidolia or apophenia.

Apophenia is the experience of seeing patterns or connections in random or meaningless data. The term was coined in 1958 by Klaus Conrad, who defined it as the "unmotivated seeing of connections" accompanied by a "specific experience of an abnormal meaningfulness".


Pareidolia is a type of apophenia involving the finding of images or sounds in random stimuli.

I think the brain is hard-wired to decode, but decoding, like any perception, may have a number of variations, such as being accurate, reality-based, a misinterpretation, or imaginary.

Stimulating and informative post. Thanks TheDonF.
posted by nickyskye at 11:39 AM on November 16, 2008 [1 favorite]


After one or two I was pretty accurate on the rest
posted by jfrancis at 11:58 AM on November 16, 2008


I understood the speech the first time round, but it wasn't until I played the undistorted version I realised I know the woman who is speaking (she's a friend of mine who once worked at the MRC-CBU). I don't know if that makes a difference, although it is kinda weird to think that a bunch of Mefites are listening to her talk.

When I was a poor student I used to get extra cash by being part of the volunteer panel they use for these experiments, and I think I may have done the noise-vocoded speech experiment. Nice to see something I helped with pop up on the blue.
posted by penguinliz at 12:04 PM on November 16, 2008 [1 favorite]


Way cool.
posted by Slack-a-gogo at 12:06 PM on November 16, 2008


ok, guys, quit joking around. there's no difference between the clips at all, and this is a big prank at my expense, right? right?
posted by UbuRoivas at 12:08 PM on November 16, 2008


Nah, I understood it was speech the first time it was heard. Didn't hear any chirps or beeps, just distorted speech.

When it says "chirps or beeps", that is distorted speech.
posted by Solon and Thanks at 12:15 PM on November 16, 2008


Related: Check this out: Mind-reading software developed in the Netherlands can decipher the sounds being spoken to a person, and even who is saying them, from scans of the listener's brain.
posted by nickyskye at 12:32 PM on November 16, 2008 [1 favorite]


Yeah, I was able to understand it before hearing the first 'normal' speech the second time I listened to the track, but not the first time.
posted by delmoi at 1:04 PM on November 16, 2008


I recorded myself saying "Rex! Rex! C'mere Rex! Good boy!", then sine-waved it, and played it back to my dog, and he turned inside-out. So thanks a bunch for that.
posted by turgid dahlia at 2:24 PM on November 16, 2008 [3 favorites]


I played in a French farce where several of the characters had speech impediments. One character could not pronounce consonants due to a cleft palate, another had an intense lisp (thuppothedly Cathtilian Thpanish). About a third of the way through the cleft palate character refers to the Spaniard saying, "Who hant huner han a her he hay!" Cracks us all up.
posted by pointilist at 2:57 PM on November 16, 2008


That comment is useless without video.
posted by UbuRoivas at 2:57 PM on November 16, 2008


(i was referring to TD's comment, but it applies equally to pointilist's)
posted by UbuRoivas at 2:59 PM on November 16, 2008


pointilist, I've been rereading that line, "Who hant huner han a her he hay!", a bunch of times and so want to get the joke. Can you do the tedious and translate it into English for me?
posted by nickyskye at 3:05 PM on November 16, 2008


That was interesting. I found that during all of the clips I could understand the last few words quite clearly, but the initial part was distorted. I am thinking my brain quickly figured out the code and made it work by the end. Then after I repeatedly played it, not hearing the clear version, I could make the whole thing out. Fascinating stuff.
posted by Belle O'Cosity at 3:27 PM on November 16, 2008


Who hant huner han a her he hay:

"Who wants butter in a hurf durf way?"

(as spoken by somebody with a mouth full of rich creamery butter)
posted by UbuRoivas at 3:52 PM on November 16, 2008


"Who wants butter in a hurf durf way?"

zomg. So funny! Good one. Oy, just looked it up on Urban Dictionary and realized it's a MeFite in joke.
posted by nickyskye at 5:46 PM on November 16, 2008


So that's how Luke understood R2 so well!
posted by ericbop at 6:40 PM on November 16, 2008 [1 favorite]


this makes me think of other "brainy" stuff. ...Like Robert Krulwich's audio piece on how a human, a rat, and a spider learned to live in space, or how my brain adapted to eyeglasses when I first started wearing them. Brains are crazy things.
posted by thisisdrew at 8:00 PM on November 16, 2008


I actually experienced this when moving to the Bay Area.

The BART train announcements are done using an older speech synthesis application. I couldn't understand them at all on first listen -- they were mind-bogglingly weird sounding. On the second listen, I could understand a snippet or two. But if I heard my partner or a BART driver make the announcement, the machine-intoned version would make perfect sense the next time around. On the other hand, the MUNI Metro system, which uses voice samples for their announcements, was perfectly easy to understand on first listen.

Now, of course, I understand the announcements by second nature. But I always wondered if tourists had as much difficulty as I did understanding what exactly the announcements were saying. Per this, probably... but they're learning.
posted by eschatfische at 8:17 PM on November 16, 2008 [1 favorite]


Sorry.
"You can't understand a word he says!"
After listening to the guy for 20 minutes it was, like, obvious.
posted by pointilist at 8:41 PM on November 16, 2008


It's a schooner!
posted by ooga_booga at 9:09 PM on November 16, 2008


Yeah, I heard the example just fine on the first listen. I do have sensitivity levels bordering on insanity, though.
posted by Burhanistan at 9:16 PM on November 16, 2008


Rd zzss sndnz foist han din weed dur necks wan.

Read this sentence first and then read the next one.

Now go back and read the first sentence. Makes sense now, doesn't it?
posted by twoleftfeet at 9:51 PM on November 16, 2008


I completely understood all of the clips, both distorted and undistorted BEFORE clicking on the link. My inherently superior brain is no match for your cheap tricks, Matt Davis.
posted by ghost of a past number at 12:31 AM on November 17, 2008


You're all liars.

I did get the picture though...
posted by minifigs at 2:53 AM on November 17, 2008


If I ever make an animated clip involving aliens or robots or something, I want to have them speaking plain English Sine Wave. The vast majority of people would think it's intentionally nonsensical, but over time, some might start realizing that the captions actually do match up with what they are hearing. (I think that for extra fun I'd have the captions convey the essence of the spoken text without actually saying the exact same thing.)

I bet that for the people that caught on to this, the first moment of realization would have them questioning their sanity for a few seconds.

"Did I actually hear that? Because for a moment there, I totally understood what that thing said..."

It would be mind-fuck-tastic.
posted by quin at 10:53 AM on November 17, 2008 [1 favorite]


I have never been good at decoding aural input/output. Songs I have heard for years are still gibberish to me until I see the lyrics written down, unless they are very clearly spoken/sung. I also have a hard time decoding speech. It's not a hearing problem, I have been tested, and I hear sound just fine, but there is definitely something lacking in the processing.

I hadn't thought about it much until just now, but when I have trouble hearing speech, and I ask someone to repeat the sentence, much of the time I don't understand even on the second, third, or Lord forbid, the fourth repetition. It's almost as if certain combinations of sounds just don't register. What I hear, when I misunderstand - is that the person speaking is speaking clearly, but the words are nonsense. "Dad, the point of curtains is banana seventy". What? Upon repetition, it's the same nonsense. Very frustrating. My Dad is the same way. We always thought it was just encroaching deafness.

The sine wave speech, in each example to me, was gibberish. After listening to the clear speech, I could go back and hear the patterns hidden in the sine wave speech, but my initial recognition never improved. Gibberish each time.

Interesting.
posted by Xoebe at 11:35 AM on November 17, 2008


Worked exactly as advertised for me, and also for my wife, who was sitting across the room and had no idea what was going on until she heard the first clip the second time and loudly exclaimed her surprise.
posted by paisley henosis at 4:58 PM on November 17, 2008


I heard beeps and whistles, my wife heard words. She is now, officially, better at everything.
posted by patrick rhett at 7:44 PM on November 17, 2008

