Speaking Piano
October 7, 2009 11:32 AM

The Speaking Piano, and Transforming Audio to MIDI - Austrian Composer Peter Ablinger has transformed a child speaking so that it can be played as MIDI events on a mechanically-controlled piano, making the piano a kind of speech speaker.
posted by Burhanistan (52 comments total)

This post was deleted for the following reason: Poster's Request -- frimble

so is the original played back over the piano? It seems like that's the case; if so, what's the point of the piano?

It's still pretty cool
posted by Think_Long at 11:35 AM on October 7, 2009

so is the original played back over the piano? It seems like that's the case; if so, what's the point of the piano?

You and I have nothing in common.
posted by phrontist at 11:37 AM on October 7, 2009 [6 favorites]

Ha. Peter Frampton, eat your heart out.
posted by jedicus at 11:41 AM on October 7, 2009 [2 favorites]

transformed a child speaking so that it can be played as MIDI events

That kid's gonna be too cool for school with that 5 pin DIN plug in the back of his head...
posted by CynicalKnight at 11:43 AM on October 7, 2009 [2 favorites]

You and I have nothing in common.

ouch. I was just wondering, I still think it's an awesome project
posted by Think_Long at 11:45 AM on October 7, 2009

The piano voice sounds eerily like the child voice in The Color of Fire by Boards of Canada. They can finally perform it live!
posted by pb at 12:00 PM on October 7, 2009

Think_Long: That was a bit tounge and cheek, and in any case there was no intended value judgement.
posted by phrontist at 12:00 PM on October 7, 2009

So the article suggests that the child's voice is most likely mixed in with the piano during playback...? I'm guessing it would just sound like a cat running across the keys without the vocals played alongside it? But why not just use a MIDI-capable keyboard instead of wiring up that piano? That looks like a lot of work. Cool video.
posted by fatbaq at 12:03 PM on October 7, 2009 [1 favorite]

Try to understand any of it without looking at the subtitles though - and not something you've already seen with the subtitles once, something totally new. I'm not saying the noises it produces are irrelevant to the words they represent - they clearly aren't - but listening to something while watching subtitles of what is supposedly being said creates an illusion that the speech is far clearer than it actually is as your brain "fills in". I couldn't make out a word of it without the subtitles.
posted by nanojath at 12:05 PM on October 7, 2009

Couldn't the Mellotron have done this too?
posted by tommasz at 12:11 PM on October 7, 2009

So the article suggests that the child's voice is most likely mixed in with the piano during playback...?

I'm pretty sure the audio you hear in the video is purely the piano. As far as why he's using a real piano, come on, it's art. You could just have a computer generate a vague synthesis of speech out of a limited range of tones too but nobody would find that particularly interesting I think, "look I've created a poor speech synthesizer that requires input from a live actor to say anything!"
posted by nanojath at 12:14 PM on October 7, 2009

This is going to be creepy, isn't it?
posted by adipocere at 12:16 PM on October 7, 2009

Reminds me of Herbie Hancock (and Tatyana Ali!) on Sesame Street. But that was straight midi, I guess.
posted by MrMoonPie at 12:18 PM on October 7, 2009 [2 favorites]

"Sparky, oh, Sparky ... It is I, your piano..."
posted by raygirvan at 12:20 PM on October 7, 2009

This is not just playing the sound into the piano; it is converting the sound into a flurry of notes, basically a cross between FFT resynthesis and granular synthesis via MIDI-controlled servo motors. If I am not mistaken, the original input sound is not being replayed at all. If he used OSC rather than MIDI he could actually get a much more accurate recreation of the sound, because MIDI is limited to a very low clock speed.
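To put a rough number on the clock-speed point, here is a back-of-the-envelope sketch. It assumes a classic 5-pin DIN MIDI link running at 31,250 baud with one start bit and one stop bit per byte, and a three-byte note-on message; the function name is made up for illustration, and nothing here describes how Ablinger's piano is actually driven.

```python
BAUD = 31_250        # bits per second on a classic 5-pin DIN MIDI link
BITS_PER_BYTE = 10   # 8 data bits plus one start bit and one stop bit
NOTE_ON_BYTES = 3    # status byte, note number, velocity


def note_ons_per_second(bytes_per_message=NOTE_ON_BYTES):
    """Upper bound on note-on messages a single MIDI cable can carry per second."""
    return BAUD / (BITS_PER_BYTE * bytes_per_message)


print(f"full note-on messages: {note_ons_per_second():.0f} per second")
# With "running status" the status byte can be omitted, leaving 2 bytes per note.
print(f"with running status:   {note_ons_per_second(2):.0f} per second")
```

So a single cable tops out at roughly a thousand note events per second, which has to be shared among every key struck in a dense chord.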
posted by idiopath at 12:28 PM on October 7, 2009 [5 favorites]

> As far as why he's using a real piano, come on, it's art. You could just have a computer generate a vague synthesis of speech out of a limited range of tones too...

We're conditioned to believe any sound can come out of a speaker. Electronics have been capable of human speech synthesis for decades. There's nothing to see, and it'd be difficult to make a new approach seem impressive at all.

The piano makes it amazing because it's an overt demonstration of how close you can get to approximating human speech and voice by striking piano keys the right way.
posted by ardgedee at 12:34 PM on October 7, 2009 [6 favorites]

The piano makes it amazing because it's an overt demonstration of how close you can get to approximating human speech and voice by striking piano keys the right way.

Moreover it's a great way of demonstrating to someone that sounds can be decomposed, like colors.
posted by phrontist at 1:04 PM on October 7, 2009

Reminds me of Herbie Hancock (and Tatyana Ali!) on Sesame Street. But that was straight midi, I guess.

Probably not MIDI at all. Just a then-state-of-the-art sampler. Possibly a Fairlight CMI, judging from the guy playing with the waveform on the big CRT in the back.
posted by The Bellman at 1:06 PM on October 7, 2009

On looking at the first part of the clip, yes. A Fairlight CMI.
posted by The Bellman at 1:07 PM on October 7, 2009

It's slurring its words-- the piano has been drinking.
posted by The White Hat at 1:12 PM on October 7, 2009 [10 favorites]

i think it might be better with a big pipe organ than a piano.
posted by empath at 1:43 PM on October 7, 2009

So the article suggests that the child's voice is most likely mixed in with the piano during playback...?

I'm pretty sure the audio you hear in the video is purely the piano.

1) In the body of the article there's an edit that says:

"Listening again, the short answer to how you can hear so much of the voice through the piano seems to be, you can’t; the original is almost certainly mixed in."

FWIW, I'd say that's absolutely right.

2) If y'all read on down to the comments, you'll see someone saying:

"Here’s a piano only video http://vimeo.com/1483630. Not the same text and with additional notes. Very amazing."

The sound in that video is plainly different to the video in the linked article, further suggesting the sound you hear in the linked article there does have the original mixed in.

Having said all of this: I think it's an awesome piece of work, lovely, fascinating, thanks Burhanistan...
posted by Hartham's Hugging Robots at 1:45 PM on October 7, 2009 [3 favorites]

See also the link posted within that Vimeo video for a full explanation of the work by the artist.
posted by Hartham's Hugging Robots at 1:46 PM on October 7, 2009

There's no way you could generate the sibilants (s, f, etc) with just piano keys -- thus the overdubbing. What you have here is the world's most complicated spring reverb unit.
posted by RobotVoodooPower at 1:53 PM on October 7, 2009

Bah, if there is overdubbing this is pointless.
posted by phrontist at 2:17 PM on October 7, 2009

There's no way you could generate the sibilants (s, f, etc) with just piano keys

You're right, but what's interesting is the way the temporal resolution, specifically, makes for... a very unique type of sound from a mechanical instrument.
posted by Hartham's Hugging Robots at 2:18 PM on October 7, 2009

Well, now I know how robots are going to speak in my steampunk novel.
posted by No-sword at 2:55 PM on October 7, 2009 [1 favorite]

[♦] Peter Ablinger's web site
[♦] Peter Ablinger performing a piece by Michael Pinter-Koschell
[♦] A brief interview with Peter Ablinger from five years ago, mostly about his cycle Words and Music
[♦♦♦♦] Mark Knoop performs Peter Ablinger: Voices and Piano 1, 2; Instrumente und Rauschen; Akkordeon und Rauschen
posted by koeselitz at 3:04 PM on October 7, 2009

Sorry; [♦] Mark Knoop plays Peter Ablinger's Voices and Piano 1.

My favorite is Ablinger's rendition of the piece by Pinter-Koschell.
posted by koeselitz at 3:07 PM on October 7, 2009

Heeey, speaking of Peter Ablinger: Voices and Piano...metafilter's own speicus will be performing 3 selections from that work in South Pasadena on the 17th. Details are here. I heard "Angela Davis" the other night, and it's pretty sweet.
posted by mandymanwasregistered at 3:47 PM on October 7, 2009 [1 favorite]

Terrifying.
Soon to be appropriated by the makers of the next Friday the 13th movie.
posted by chococat at 4:10 PM on October 7, 2009

Man, I've had an idea similar to this for quite some time. It's good to see someone's done it.
posted by The Great Big Mulp at 4:21 PM on October 7, 2009

Most of you are misinformed in this thread -- the original sound is NOT mixed in with the piano. It's true that some sounds can't be duplicated by the piano -- which is why the words are not audible unless you are reading the text. Your brain fills in the missing information. It's also important to note that this would not be possible with a live performer. The player piano has a "resolution" of about 16 attacks per second, far faster than any human being could perform continuously and accurately. This is necessary because the sound has to at least approach the rate of digital synthesis.

Here is a little more background on this phenomenon of making musical instruments "speak," which he calls "phonorealism." Basically, as I understand it, he's using software to break down the sound into its component parts, i.e. sine waves, then resynthesizing them. But instead of resynthesizing using sine waves, Ablinger does it mechanically with the player piano!
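The "break it down, then resynthesize with piano keys" idea can be sketched in a few lines. This is only an illustration under stated assumptions, not Ablinger's software: the 8 kHz sample rate, the 1/16-second frame (chosen to match the attack rate mentioned above), and the function names `note_freq`, `note_energy` and `loudest_notes` are all invented here. Each short frame of audio is compared against a sinusoid at every piano pitch, and the strongest few pitches are the keys to strike.

```python
import math

SAMPLE_RATE = 8000            # Hz, assumed for this sketch
FRAME = SAMPLE_RATE // 16     # samples in one 1/16-second frame
PIANO_NOTES = range(21, 109)  # MIDI note numbers of an 88-key piano (A0 to C8)


def note_freq(note):
    """Equal-tempered frequency of a MIDI note number (A4 = note 69 = 440 Hz)."""
    return 440.0 * 2 ** ((note - 69) / 12)


def note_energy(frame, note):
    """How strongly the frame correlates with a sinusoid at the note's pitch."""
    w = 2 * math.pi * note_freq(note) / SAMPLE_RATE
    re = sum(s * math.cos(w * i) for i, s in enumerate(frame))
    im = sum(s * math.sin(w * i) for i, s in enumerate(frame))
    return math.hypot(re, im) / len(frame)


def loudest_notes(frame, count=3):
    """Return the `count` piano notes that best explain this frame, lowest first."""
    usable = [n for n in PIANO_NOTES if note_freq(n) < SAMPLE_RATE / 2]
    ranked = sorted(usable, key=lambda n: note_energy(frame, n), reverse=True)
    return sorted(ranked[:count])


# Stand-in for one frame of speech: two pure tones at C4 (60) and G4 (67).
frame = [
    math.sin(2 * math.pi * note_freq(60) * i / SAMPLE_RATE)
    + math.sin(2 * math.pi * note_freq(67) * i / SAMPLE_RATE)
    for i in range(FRAME)
]
print(loudest_notes(frame, count=2))
```

Real speech would be run through this frame after frame, with the energies also deciding how hard each key is hit; noisy sounds such as sibilants have no clear pitch, so they come out as a smear of high notes.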

For another (hilarious) example of this, there's A Letter From Schoenberg, in which the famously abrasive composer Arnold Schoenberg recites a letter eviscerating his publisher for violating a contract.

As crazy as this effect is, Ablinger himself seems to think of it as little more than a parlor trick: "My main concern is not the literal reproduction itself but precisely this border-zone between abstract musical structure and the sudden shift into recognition." His other compositions, like the aforementioned Voices and Piano, dwell in this border-zone. In Voices and Piano, a live pianist actually does perform along with the recordings. Obviously a live performer can't play nearly so much nearly so fast, but what's amazing is what aspects of the voice Ablinger chooses to bring out -- they're like little character studies. The music isn't just analytical anymore; it's also incredibly expressive. And people say electronics have no soul! And, the pieces are fun to play.

In conclusion, Ablinger rules.
posted by speicus at 4:30 PM on October 7, 2009 [5 favorites]

This is similar -- Sine Wave Speech.

Also:

…a head, an intricately worked bust, cloisonné over platinum, studded with seed pearls and lapis… The thing was a computer terminal… it could talk. And not in a synth-voice, but with a beautiful arrangement of gears and miniature organ pipes. It was a baroque thing for anyone to have constructed, a perverse thing, because synth-voice chips cost next to nothing… Smith jacked the head into his computer and listened as the melodious, inhuman voice piped the figures of last year's tax return.
posted by empath at 5:10 PM on October 7, 2009

Is it just me, or does that kid have an excellent accent?
posted by matkline at 5:13 PM on October 7, 2009

I concur with Speicus; the piano, by itself, is replicating the voice of the kid, by means of re-creating each tiny slice (aka "sample") of the kid's voiced audio with a precisely played note or combination of notes on the piano.

There is no remixing or overdubbing; the instrument is being manipulated in such a way to (more or less) replicate the sound of human speech. Very elegant and sweet piano hack.
posted by Aquaman at 6:07 PM on October 7, 2009

speicus: Most of you are misinformed in this thread -- the original sound is NOT mixed in with the piano. It's true that some sounds can't be duplicated by the piano -- which is why the words are not audible unless you are reading the text. Your brain fills in the missing information. It's also important to note that this would not be possible with a live performer. The player piano has a "resolution" of about 16 attacks per second, far faster than any human being could perform continuously and accurately. This is necessary because the sound has to at least approach the rate of digital synthesis.

Hey, now. If anyone was 'misinformed,' it was the author of the article linked, who claims that this couldn't possibly be piano alone. If you have an issue, take it up with him.
posted by koeselitz at 6:31 PM on October 7, 2009

SHUT UP AND LET US HEAR IT GODDAMN I HATE SELF-IMPORTANT "NEWS" ORGANIZATIONS
posted by DU at 6:35 PM on October 7, 2009 [2 favorites]

A Letter From Schoenberg,

The reverb on that video makes it VERY difficult to hear, but even so, I could pick out the words in the last 3rd of that, reading along with the text.
posted by empath at 7:04 PM on October 7, 2009

koeselitz: Hey, now. If anyone was 'misinformed,' it was the author of the article linked, who claims that this couldn't possibly be piano alone. If you have an issue, take it up with him.

I should probably comment over there too, you're right. But in my defense, all I really mean by "misinformed" was that you'd been given bad information (by the author who was himself misinformed, as you pointed out). I just wanted to make sure the misinformation didn't spread any further, and cast any more doubts on Ablinger's achievement.

I realize, though, that in the context of the internet, "misinformed" often means "YOU ARE WRONG AND CAN THEREBY SUCK IT," and I probably should have phrased things a little differently.

Hey look! Another video, without commentary this time.
posted by speicus at 7:06 PM on October 7, 2009 [3 favorites]

That's fair, speicus - sorry I took that a little harsher than I'm sure you meant it.

By the way: I'm certain you're correct. I'm too lazy to go to the trouble of uploading it, but I downloaded that YouTube video of the child's-voice piece and slowed it down a whole bunch in VLC Player - there is clearly no smooth backing track, only the piano making the tones. What's interesting is that this works much better with the child than it does with the Schoenberg piece - I believe this is partially because the piano is placed in such an echo-filled room for the Schoenberg piece, but I think it's also because a child's voice seems to be easier to replicate, being on a higher and perhaps narrower frequency.
posted by koeselitz at 7:21 PM on October 7, 2009

This is a much better version of the Schoenberg than the vimeo clip that mostly sounds like room.
posted by xorry at 8:03 PM on October 7, 2009

We're so used to listening to music and audio so distorted by the compression used to fit it on our MP3 players, or for streaming it over the internet, that we're quite able to accept that we must be hearing some of the original track. I specifically remember cringing at how WMA compression would transform the sibilants on my music tracks into a tinnitus-like ringing very much like the rat-a-tat-tatting of that rightmost key.

In fact, solve the portability issue and add a Click Wheel, and they just might be on to something!
posted by theDTs at 11:29 PM on October 7, 2009 [1 favorite]

Btw, this same method is the way that some compressed audio works, like G.729 (if you use VoIP).

If you've ever noticed that audio books from iTunes or audible.com sound 'funny', that's why-- it's also vocoded. It offers huge compression gains for a cost of sounding slightly dehumanized, and it's how they can fit 2 hours of audio into a 35 meg file.
posted by empath at 11:37 PM on October 7, 2009

Excellent post. Thank you.
posted by Blazecock Pileon at 12:31 AM on October 8, 2009

I wrote a shitload about this earlier today, but it was all on Facebook. I thought this was really amazing, and wrote a lot of words about how I believe it works, why it's so neat, etc. I'm gonna repost some of it here, if that's not cool, then mods feel free to baleet it.

Explaining what I'm pretty sure it's doing:
Or, here's a better explanation: you know how a piano, tuned as it is by ratios/formulae/whatever (equal temperament tuning, i.e. basically all western instruments), one note will be X Hz (hertz), and the next note will be X + Y Hz, and so on, and it's a significant leap between each one? This takes a more complex sound like speech, and sort of "rounds" out all those "between-the-notes" frequencies down to only the frequencies that comprise equal tempered tuning.

Oh, also, in MIDI each note event number (0-255, or 8 bits) maps to a frequency/note in the equal tempered tuning frequency series (most higher end synths let you change the frequency table for, say, Indian tuning, microtones, etc.), so then it's easy to generate MIDI note events. I'm pretty sure that's what it's doing.
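The "rounding" step can be sketched like this. It assumes the usual convention that MIDI note 69 is A4 at 440 Hz and that note numbers run 0-127 (the range is taken up further down the thread); the function names are made up for illustration. One detail worth noting: equal temperament spaces neighbouring notes by a constant ratio of 2^(1/12), not a constant number of hertz, which is why the conversion uses a logarithm.

```python
import math

A4_HZ = 440.0   # reference pitch
A4_NOTE = 69    # MIDI note number conventionally assigned to A4


def freq_to_midi_note(freq_hz):
    """Snap an arbitrary frequency to the nearest equal-tempered MIDI note."""
    if freq_hz <= 0:
        raise ValueError("frequency must be positive")
    note = round(A4_NOTE + 12 * math.log2(freq_hz / A4_HZ))
    return max(0, min(127, note))  # clamp to the valid MIDI range


def midi_note_to_freq(note):
    """Frequency that a standard-tuned synth would play for this note number."""
    return A4_HZ * 2 ** ((note - A4_NOTE) / 12)


for f in (261.63, 440.0, 450.0, 1000.0):
    n = freq_to_midi_note(f)
    print(f"{f:8.2f} Hz -> note {n:3d} -> {midi_note_to_freq(n):8.2f} Hz")
```

Anything between two keys, like the 450 Hz and 1000 Hz inputs, gets pulled to whichever key is closest on the logarithmic scale.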

And more, featuring some explanation of synthesis and MIDI in general and how it relates to all this:
OK, so this is following on from the post about the speech-to-MIDI thing. First of all, if you look at, say, a single piano note's waveform in an audio editor, it's pretty simple: a little noise from the percussive hammer strike, then a more or less constant frequency. In ye olde tymes, when there were only analog synths, they were based on a simple oscillator. This oscillator can produce any frequency within its range in the form of a simple waveform (square, tri, saw, sine, pulse, etc.). Then you filter out overtones, etc. etc. producing a more complex, yet still pretty simple sound. The frequency-to-scale mapping was done by what was called control voltage, and every manufacturer had a different system, and it doesn't matter here how it works anyway.

MIDI, on the other hand, works like this: you hit a key. It sends a "note on" event, with a channel (irrelevant here what a "channel" is) number and a note number from 0-255. The synth, which by the time MIDI arrived would have a microprocessor, does a table lookup on that note number, and it corresponds to a frequency. The table, unless it's been altered (Indian tuning, etc.), will consist of column A, 0-255, and column B, the frequencies of the Western (equal tempered) scale. How an audible note actually gets played depends on the synth, but that's how the process starts.
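For the curious, here is a minimal sketch of what a "note on" looks like on the wire, assuming the standard MIDI 1.0 channel-voice layout: one status byte carrying the message type and the channel, followed by two data bytes that each carry 7 bits (so 0-127). The helper names `note_on` and `note_off` are hypothetical.

```python
def note_on(channel, note, velocity):
    """Build the three bytes of a MIDI note-on message."""
    if not 0 <= channel <= 15:
        raise ValueError("channel must be 0-15")
    if not 0 <= note <= 127 or not 0 <= velocity <= 127:
        raise ValueError("note and velocity are 7-bit values (0-127)")
    # 0x90 marks "note on"; the low four bits of the status byte hold the channel.
    return bytes([0x90 | channel, note, velocity])


def note_off(channel, note):
    """Build the matching note-off message (status 0x80, release velocity 0)."""
    if not 0 <= channel <= 15 or not 0 <= note <= 127:
        raise ValueError("channel must be 0-15 and note 0-127")
    return bytes([0x80 | channel, note, 0])


# Middle C (note 60) on the first channel, struck fairly hard.
print(note_on(0, 60, 100).hex(" "))
print(note_off(0, 60).hex(" "))
```

Whatever receives these bytes, whether a synth or a set of solenoids on a piano, decides for itself what "note 60" should sound like.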

Most sounds musical instruments make, in terms of the frequencies they're made up of, are quite simple. Bells make probably the most complex waveforms of the common instruments, and couldn't be synthesized accurately until FM came along (mid 80s). Speech, on the other hand, is extremely complex by comparison. This is why it's only now that speech synthesizers sound only sort of almost natural. So you've got this extremely complex sound made up of many frequencies. The speech-to-MIDI thing "simplifies" it in the way I already explained, leaving only the equal tempered note frequencies. Here, the PureData patch is what is mapping the remaining frequencies to MIDI note numbers, and the way a note is played is that the PD patch, on a computer, outputs MIDI note on/off events, and the piano is equipped with an electronic/mechanical system which strikes the corresponding piano string to that MIDI note. You could also use it to send note events to a common synthesizer or anything that accepts MIDI (except a drum machine type instrument, which works a bit differently).

I may have oversimplified/poorly explained some of this, particularly the "table lookup" bit, but I was really trying to explain why this is so awesome, and the audience was a group of my friends who don't necessarily have much interest in synthesis/electronic music/etc., at least not like I do. But yes, this is really, really awesome, I think.

I also could be totally wrong about how the thing works, so please tell me if I am. I want to understand this as best I can, but the math at least is certainly beyond me.
posted by DecemberBoy at 12:57 AM on October 8, 2009 [1 favorite]

Oh, and you could technically use it with a drum machine/sampler, I guess. I just didn't want to go into the way MIDI works with samplers/drum machines (very briefly, note events trigger a certain drum sample on a drum machine, and a sampler can do the same or pitch a single sample up/down, instead of the whole "note frequency in Hz via table" thing).
posted by DecemberBoy at 1:04 AM on October 8, 2009

I just saw this. This is pretty remarkable. I've never heard of Ablinger and will be checking out more of his work. This particular thing reminds me a little of certain works of Clarence Barlow (or however he spells his name these days.)
posted by ob at 3:12 AM on October 8, 2009

I couldn't make out a word of it without the subtitles.

Thus showing once again that the drum (as in West African talking drum) is more sophisticated than the piano.

note to phrontist: that's tongue IN cheek.
posted by flapjax at midnite at 4:26 AM on October 8, 2009

DecemberBoy: great explanation, except for a minor quibble: MIDI is not nearly so reasonable in implementation as you make it sound. On the bit level, most (all?) values are 7 bits, or multiples of 7 bits, in size. A MIDI note value of 255 would give you a frequency of approximately 20,390,000 Hz, which is over a thousand times too high for a female teenager with undamaged hearing to perceive, not to mention your typical adult.
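The arithmetic is easy to check. This sketch assumes standard tuning (note 69 = A4 = 440 Hz, twelve equal semitones per octave) and builds the 128-entry lookup table a synth would use, then extrapolates the same formula to the out-of-range value 255; `note_to_hz` and `TABLE` are names invented for the example.

```python
A4_NOTE, A4_HZ = 69, 440.0


def note_to_hz(note):
    """Equal-tempered frequency for a note number under standard tuning."""
    return A4_HZ * 2 ** ((note - A4_NOTE) / 12)


# The real table has 128 entries, because a note number is a 7-bit value.
TABLE = [note_to_hz(n) for n in range(128)]

print(f"note   0: {TABLE[0]:.2f} Hz")
print(f"note  69: {TABLE[69]:.2f} Hz")
print(f"note 127: {TABLE[127]:.2f} Hz")
# 255 is not a valid MIDI note; this only extends the formula past the table.
print(f"note 255: {note_to_hz(255):,.0f} Hz")
```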
posted by idiopath at 12:12 PM on October 8, 2009

Yeah, I freely admit to not fully grasping the MIDI spec. You gotta admit it's pretty damn arcane, though. It was basically one of those consensuses among manufacturers (I think Sequential Circuits was behind a lot of it, right?) that was a "consensus" in that no one hated it TOO much.
posted by DecemberBoy at 2:41 PM on October 8, 2009

OK, so: inspired by this lively discussion, here is my attempt to recreate this effect, for what it's worth. It doesn't sound nearly as good as Ablinger's version, but I had fun trying.
posted by speicus at 10:33 AM on October 9, 2009


This thread has been archived and is closed to new comments
