July 17, 2015 1:18 PM

How does Shazam recognize music? Christophe Kalenzaga sifts through an old research paper (pdf) by Shazam's founder and conducts a short (written) course in signal processing, acoustics, Fourier transformations, and fingerprinting music.

I’ll start with the basics of music theory, present some signal processing stuff and end with the mechanisms behind Shazam. You don’t need any knowledge to read this article but since it involves computer science and mathematics it’s better to have a good scientific background (especially for the last parts). If you already know what the words “octaves”, “frequencies”, “sampling” and “spectral leakage” mean you can skip the first parts.
posted by jquinby (13 comments total) 39 users marked this as a favorite
Gah! Forgot to credit the dude whose tweet pointed me to the article!
posted by jquinby at 1:30 PM on July 17, 2015 [1 favorite]

Let me guess: Mel-frequency cepstrum coefficients feature somewhere?
posted by acb at 1:31 PM on July 17, 2015 [1 favorite]

The info is pretty good, and it's an excellent end to end overview.

My one quibble is that a bunch of music theory is presented that is totally irrelevant to the algorithm, and then the music theory terms are used later in totally inappropriate ways. Shazam doesn't need a sound to have a pitch in order to fingerprint it, and FFT bins do not and cannot encode "notes", only frequencies. The article could be shorter, simpler, and more accurate if most everything about music theory were taken out, and replaced with the simple statement "musical notes, which have pitches, are a complex abstraction that is irrelevant to audio fingerprinting, and they are made of much simpler things called frequencies which...".
posted by idiopath at 1:37 PM on July 17, 2015 [6 favorites]
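
To make the point about FFT bins concrete, here is a minimal Python sketch. The 11,025 Hz sample rate and 1,024-sample window are assumed values picked for illustration, not confirmed Shazam parameters, and the helper names are hypothetical. It shows that bins are evenly spaced in frequency while notes are not, so two neighbouring low notes can land in the same bin.

```python
# Assumed illustration values, not taken from Shazam itself.
SAMPLE_RATE = 11025   # samples per second
WINDOW_SIZE = 1024    # samples per FFT window

def bin_frequencies(sample_rate, window_size):
    """Center frequency in Hz of each non-negative FFT bin."""
    return [k * sample_rate / window_size for k in range(window_size // 2 + 1)]

def nearest_bin(freq_hz, sample_rate, window_size):
    """Index of the FFT bin whose center is closest to freq_hz."""
    return round(freq_hz * window_size / sample_rate)

def note_frequency(semitones_from_a4):
    """Equal-temperament pitch in Hz, counted in semitones from A4 = 440 Hz."""
    return 440.0 * 2 ** (semitones_from_a4 / 12)

# Every bin is the same width in Hz, whatever the musical register.
bin_spacing = SAMPLE_RATE / WINDOW_SIZE

a1 = note_frequency(-36)        # A1, three octaves below A4 (55 Hz)
a_sharp_1 = note_frequency(-35) # A#1, one semitone higher (about 58.27 Hz)

# Two different notes, one bin: the bin records a frequency band, not a note.
bin_a1 = nearest_bin(a1, SAMPLE_RATE, WINDOW_SIZE)
bin_a_sharp_1 = nearest_bin(a_sharp_1, SAMPLE_RATE, WINDOW_SIZE)
```

With these assumed values the bins are roughly 10.8 Hz apart, which is wider than the gap between A1 and A#1, so both notes round to the same bin.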

I mean it's like bringing up vanishing points in an article on JPEG encoding - they are interesting to the artists, but totally meaningless as far as the algorithm is concerned.
posted by idiopath at 1:40 PM on July 17, 2015

I've always assumed it's black magic.

I've had Shazam catch stuff in under 5 seconds that was playing on the car stereo at moderate volume, driving on the freeway with the windows open and the radio DJ talking over it.
posted by Hairy Lobster at 2:01 PM on July 17, 2015 [2 favorites]

I went to a talk by a computer scientist who was doing work on this kind of thing, mostly trying to write recommendation engines. He said that you could get surprisingly far without paying any attention to pitch, just fingerprinting rhythmic information and amplitude and serving up recommendations based on similarities in those. Actually I think that was almost all that he was using.
posted by thelonius at 2:05 PM on July 17, 2015 [3 favorites]

thelonius: not using pitch is normal - I don't think many, if any, fingerprinters are using pitch. Not using frequency is a bit odd though. Frequency data is much easier to pick out of a noisy environment.
posted by idiopath at 2:18 PM on July 17, 2015

This was at JavaOne 10 years ago; I wish I could remember the details better
posted by thelonius at 2:33 PM on July 17, 2015

I spent serious seconds thinking that this would involve Final Crisis.
posted by One Hand Slowclapping at 3:42 PM on July 17, 2015

Well, however Shazam does its thing, it does it well. With it, my friend and I managed to win a local radio station's contest wherein you guess a song's title after hearing a part of it. And it was way faster than googling fragments of lyrics.
posted by 30thdegree at 4:29 PM on July 17, 2015 [1 favorite]

So a few months back I finally figured out how to use a MIDI keyboard controller and a computer to make real-time live music (long story short: GarageBand on a Mac; nothing on a PC ever gave me decent results) and shortly after I came up with a game. I loaded up Shazam, let it sit in the tray and tried to recreate the intros to various pop songs using patches and settings as close as I could get them to the originals, to see if I could get it to confuse me for the real thing.

Turns out you can't. I'd gotten to "hell, I can't tell a difference" but never once did I get that system alert popup letting me know that I'd fooled the man behind the curtain inside the black box. But in the process of playing this unwinnable game in my spare time I did learn something interesting: if you load up any of the Hammond B3 patches that come with GarageBand and start doodling around, Shazam will go crazy and just start throwing out guesses left and right. I'm guessing that the programmers did a pretty decent job of recreating the crazy harmonics that define the instrument and it just overwhelms the whole "fingerprinting" voodoo that determines what makes up a particular song around "step 3" or so of "storing fingerprints" in the linked article.
posted by mcrandello at 5:00 PM on July 17, 2015 [10 favorites]

He said that you could get surprisingly far without paying any attention to pitch, just fingerprinting rhythmic information and amplitude and serving up recommendations based on similarities in those.

I can see how "mapping" transients and the distance between them could work. In fact, it's a similar concept to how the CDDB works (fingerprinting track length, though; no actual audio is used for that).
posted by sourwookie at 8:03 PM on July 17, 2015
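
A rough Python sketch of the "map the transients and the distances between them" idea discussed above. Everything here is an assumption for illustration: the function names, the made-up amplitude envelope, and the threshold are hypothetical, and this is not how Shazam or the recommendation engine mentioned earlier is known to work. It only shows why the gaps between transients are a handy signature: they do not change when the same clip starts later.

```python
def find_transients(envelope, threshold):
    """Frame indices where amplitude jumps by more than `threshold`
    compared with the previous frame (a crude onset detector)."""
    return [
        i for i in range(1, len(envelope))
        if envelope[i] - envelope[i - 1] > threshold
    ]

def interval_fingerprint(onsets):
    """Distances between consecutive transients, as a hashable tuple."""
    return tuple(later - earlier for earlier, later in zip(onsets, onsets[1:]))

# Made-up amplitude envelope: one value per analysis frame.
clip = [0.1, 0.9, 0.5, 0.2, 0.8, 0.4, 0.1, 0.1, 0.7, 0.3]

# The same clip with three quiet frames in front, as if recording started early.
delayed_clip = [0.1, 0.1, 0.1] + clip

THRESHOLD = 0.4  # assumed value; real audio would need tuning

clip_print = interval_fingerprint(find_transients(clip, THRESHOLD))
delayed_print = interval_fingerprint(find_transients(delayed_clip, THRESHOLD))

# The onset positions differ, but the spacing between them is identical.
same_rhythm = clip_print == delayed_print
```

A real system would need a far more robust onset detector and some tolerance for timing jitter; the sketch ignores both.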

I'm sorry, I can't get past the misuse of the word "decrypted" right at the start of the article. Makes me think of the way quantum physics is abused by people who like to shake up water to cure cancer.
posted by selfish at 5:37 AM on July 18, 2015 [1 favorite]

