I Γβ“œ s1𝕋𝕋𝓲6 i𝔫 tβ“—αΊΎ β„³oπ•½ΗΉβ’Ύπ•ŸπŸ„Ά a𝒯 β’―h𝔼 π’Ÿβ…ˆπ“β’Ίπ‘ 0𝗑 Ⓣℋe π—–α»šπŸ†πš—αΈšβ„œ
February 19, 2015 11:28 AM

The Ghost in the MP3 [warning, flashing imagery] — "moDernisT" was created by salvaging the sounds and images lost to compression via the MP3 and MP4 codecs. The audio is comprised of lost mp3 compression material from the song "Tom's Diner" famously used as one of the main controls in the listening tests to develop the MP3 encoding algorithm. Here we find the form of the song intact, but the details are just remnants of the original. The video is the MP4 ghost of a corresponding video created in collaboration with Takahiro Suzuki. Thus, both audio and video are the "ghosts" of their respective compression codecs.
posted by tonycpsu (23 comments total) 44 users marked this as a favorite
 
Scary digital hair and nail clippings.

The solid breath intakes freaked me the hell out but not as much as the ones at :50 and 1:48.
posted by Buttons Bellbottom at 11:40 AM on February 19, 2015 [1 favorite]


This is the music that The Caretaker might make 50 years into the future.
posted by a lungful of dragon at 11:42 AM on February 19, 2015 [2 favorites]


This is awesome. Thanks for posting it.
posted by smidgen at 11:50 AM on February 19, 2015


Very cool. What I don't understand though is how the sound in that is equivalent to 90% of a wav file if the mp3 is at 128, or maybe 80% if at 256? Or maybe I just don't understand compression at all. Is the data chopped mainly stuff at frequencies we cannot hear?
posted by wyndham at 12:16 PM on February 19, 2015


Awesome.
posted by spitbull at 12:18 PM on February 19, 2015


It's not that 90% of the sound is chopped. 90% of the data is chopped. That's done by strategically removing a little data so that what's left is easily compressible.

Here's some sort of analogy. Let's say you had a string of letters:

ABCDEFGXHIJKLMXNOPQRSTUVWXYZ

The best way to make it "easily compressible" would be to remove the 2 stray Xs. Then you could compress the rest to "alphabet". So you lost only 2 characters out of 28 accuracy-wise, but by sacrificing those 2, you've compressed the original 28-character string down to 8 characters.
posted by Hubajube at 12:32 PM on February 19, 2015 [5 favorites]
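
A minimal sketch of the analogy in the comment above, for anyone who wants to poke at it. The function names and the "alphabet" token are made up for illustration; this is a toy, not how an MP3 encoder works.

```python
import string

ALPHABET = string.ascii_uppercase  # "ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def lossy_encode(text):
    """Drop characters that break the A-to-Z run, then emit a short token.

    Returns (token, number_of_dropped_characters).
    """
    kept = []
    expected = 0  # index of the next letter we hope to see
    for ch in text:
        if expected < len(ALPHABET) and ch == ALPHABET[expected]:
            kept.append(ch)
            expected += 1
        # anything else is the "detail" we throw away
    cleaned = "".join(kept)
    token = "alphabet" if cleaned == ALPHABET else cleaned
    return token, len(text) - len(kept)


def decode(token):
    """Expand the token back out; the dropped characters are gone for good."""
    return ALPHABET if token == "alphabet" else token


original = "ABCDEFGXHIJKLMXNOPQRSTUVWXYZ"
token, dropped = lossy_encode(original)
restored = decode(token)
print(len(original), dropped, token, restored == original)
```

The decoded string is the plain 26-letter alphabet, so the round trip is not exact: the two stray Xs are what the "codec" decided nobody would miss.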


Using the python library headspace, and a reverb model of a small diner, I began to construct a virtual 3-d space. Beginning by fragmenting and scrambling the more transient material, I applied head related transfer functions to simulate the background conversation one might hear in a diner. Tracking the amplitude of the original melody in the verse, I applied a loose amplitude envelope to these signals. Thus, a remnant of the original vocal line comes through in its amplitude contour.

I can't say I follow all that, or that I understand, or can guess, what the artist is doing in any detailed way. But it doesn't sound quite as simple as an algorithmic reconstruction of the sounds that an MP3 codec "loses." There's a lot of human meddling/creativity to make it spookier.

A straight-up "difference" track between an original lossless recording and an MP3 would probably be less interesting.
posted by Western Infidels at 12:35 PM on February 19, 2015 [16 favorites]
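
For the amplitude-envelope step in the quoted passage above, here is a rough sketch of one way it could work. It assumes a windowed RMS envelope, which is a guess and not something the paper excerpt confirms, and it uses plain Python lists rather than the headspace library. The signals and function names are hypothetical stand-ins.

```python
import math


def rms_envelope(signal, window):
    """Return one RMS value per non-overlapping window of samples."""
    env = []
    for start in range(0, len(signal), window):
        chunk = signal[start:start + window]
        env.append(math.sqrt(sum(x * x for x in chunk) / len(chunk)))
    return env


def apply_envelope(carrier, envelope, window):
    """Scale each window of the carrier by the matching envelope value."""
    out = []
    for i, x in enumerate(carrier):
        idx = min(i // window, len(envelope) - 1)
        out.append(x * envelope[idx])
    return out


WINDOW = 50
# Stand-in "melody": a tone that swells from silence to full volume.
melody = [math.sin(2 * math.pi * n / 10) * (n / 400) for n in range(400)]
# Stand-in "scrambled material": a constant-level buzz.
carrier = [1.0 if n % 2 == 0 else -1.0 for n in range(400)]

env = rms_envelope(melody, WINDOW)
shaped = apply_envelope(carrier, env, WINDOW)
print([round(e, 3) for e in env])
```

The buzz ends up swelling the same way the melody does, which is the "remnant of the original vocal line" idea: the contour survives even though the notes do not.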


Should be "The Ghost Left Behind by the MP3" or "Shadow of the MP3."
posted by straight at 12:42 PM on February 19, 2015


And I might point out that the audio you hear in the moDernisT project, and all of the audio you hear in that YouTube video I linked (the lossless, lossy, and difference tracks), is itself put through an additional layer of lossy compression for transmission over the internet. You're hearing an algorithm's perceptual model approximation of the residue of another algorithm's perceptual model approximation. Stuff that's being "recovered" may just get tossed again, because you're not going to hear it anyway.
posted by Western Infidels at 12:42 PM on February 19, 2015 [3 favorites]


A straight-up "difference" track between an original lossless recording and an MP3 would probably be less interesting.

And even that example was normalized. The actual difference is considerably less than what we heard.

The video in the OP sounds like noise reduction might have been applied to the difference, or some other things were going on that were exaggerating artifacts in some way.
posted by Foosnark at 1:37 PM on February 19, 2015 [1 favorite]
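
A quick sketch of the point above about difference tracks and normalization. Coarse quantization stands in for the lossy codec here; that is an assumption made to keep the example self-contained, and it is much cruder than what an MP3 encoder does. All names and numbers are illustrative.

```python
import math


def quantize(signal, step):
    """Crude stand-in for a lossy codec: round samples to multiples of step."""
    return [round(x / step) * step for x in signal]


def difference(a, b):
    """Sample-by-sample difference: what the 'codec' threw away."""
    return [x - y for x, y in zip(a, b)]


def peak(signal):
    return max(abs(x) for x in signal)


def normalize(signal, target_peak=1.0):
    """Scale the signal so its loudest sample hits target_peak."""
    p = peak(signal)
    if p == 0:
        return list(signal)
    return [x * target_peak / p for x in signal]


original = [math.sin(2 * math.pi * n / 64) for n in range(256)]
lossy = quantize(original, step=0.05)
residual = difference(original, lossy)
boosted = normalize(residual)

gain = peak(boosted) / peak(residual)
print(peak(original), peak(residual), gain)
```

The residual peaks at a small fraction of the original's level, and normalizing it multiplies it by a large gain. That is why a normalized difference track sounds much louder than what was removed.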


I love this.
posted by Annika Cicada at 1:44 PM on February 19, 2015 [1 favorite]


Is the data chopped mainly stuff at frequencies we cannot hear?

Frequencies you can't hear at all barely make it into the digital domain in the first place. Lossy compression is more about taking out details that won't be noticed in context.
posted by atoxyl at 2:09 PM on February 19, 2015


Anyway, it succeeds as a cool thing to listen to. It shouldn't be interpreted as a demonstration of how lossy compression "ruins" music.
posted by Foosnark at 2:19 PM on February 19, 2015 [2 favorites]


Yeah, definitely a neat thing even if as much for the artistic flair as for the literal data they're working with. And reminds me if only incidentally of my own weird Tom's Diner experiment from a few years ago, using Echo Nest's Remix API to automatically make it monotone.
posted by cortex at 2:43 PM on February 19, 2015 [4 favorites]


For those who want to see what was actually done, there's a reasonably detailed paper buried very deep in the "project info" page.

This is very cool. Definitely art inspired by data rather than data analysis, and the artist's statements about how this demonstrates the failure of the mp3 perceptual model seem hard to justify, but the process and the result are both great fun.
posted by eotvos at 2:45 PM on February 19, 2015


So that's what I've been missing all along. I played this track in parallel with my MP3 of Tom's diner and my experience of the music was so much richer... {/}
posted by RedOrGreen at 3:05 PM on February 19, 2015


> And reminds me if only incidentally of my own weird Tom's Diner experiment from a few years ago, using Echo Nest's Remix API to automatically make it monotone.

That is freaking me the fuck out.
posted by benito.strauss at 3:29 PM on February 19, 2015 [1 favorite]


Can we get a migraine and epilepsy warning on that, mods?
posted by poe at 3:37 PM on February 19, 2015


That is freaking me the fuck out.

There's a bunch more where that came from.

Can we get a migraine and epilepsy warning on that, mods?

Sure, added a note after the link.
posted by cortex at 3:42 PM on February 19, 2015 [4 favorites]


Cortex, is the code for your monotoning experiments available somewhere?
posted by jjwiseman at 4:21 PM on February 19, 2015


Digging the hell out of monoTom's diner, cortex.
posted by edheil at 4:53 PM on February 19, 2015


Not formally, but I'd be happy to send you a copy. Warning, though, I just dug it up to play with and it's segfaulting on me. The Remix API has been through a major version release since I last played with it I think, so no real big surprise there. My code as such is really only about twenty lines of pretty basic Python, and I'm wondering if it'd be easier for me to just rewrite it from scratch against the current API docs.

It's echonest doing all the heavy lifting and I heartily recommend giving it a go if you like the idea of doing weird automagical music manipulation stuff and aren't afraid of a little monkeying around. I've enjoyed it in the past and don't know a lick of Python, so hey.
posted by cortex at 4:55 PM on February 19, 2015


Sounds lovely, although I was amazed when I thought it was a delta between the uncompressed and the compressed versions and rather sad (but relieved - had I really got my mental model of MP3 compression that wrong?) that it wasn't.

Headspace sounds rather good. Does that mean I can do my own I Am Sitting In A Room on my PC with a bit of coding?
posted by Devonian at 7:28 PM on February 19, 2015
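
On the I Am Sitting In A Room question above: the core of that piece is playing a recording back into a room and re-recording it, over and over. A bare-bones sketch of that loop, assuming the room can be modeled as convolution with an impulse response, looks something like this. It uses plain Python and not headspace, and the tiny "room" and "speech" lists are hypothetical toys.

```python
def convolve(signal, impulse_response):
    """Direct-form convolution: play the signal 'through' the room."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def normalize(signal):
    """Rescale so the loudest sample has magnitude 1."""
    p = max(abs(x) for x in signal)
    return [x / p for x in signal] if p else list(signal)


def sit_in_a_room(signal, impulse_response, generations):
    """Re-record the signal through the same room a number of times."""
    current = list(signal)
    for _ in range(generations):
        current = normalize(convolve(current, impulse_response))
    return current


room = [1.0, 0.0, 0.0, 0.5]   # direct sound plus one echo 3 samples later
speech = [1.0, -0.5, 0.25]    # placeholder for a recorded phrase
result = sit_in_a_room(speech, room, generations=4)
print(len(result), [round(x, 3) for x in result])
```

With a real recording and a measured impulse response you would want an FFT-based convolution for speed, but the loop is the same idea: each generation leans harder on whatever the room already emphasizes.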



