Iron Pyrite Ears
March 6, 2012 12:11 PM   Subscribe

Articles last month revealed that musician Neil Young and Apple's Steve Jobs discussed offering digital music downloads of 'uncompromised studio quality'. Much of the press and user commentary was particularly enthusiastic about the prospect of uncompressed 24 bit 192kHz downloads. 24/192 featured prominently in my own conversations with Mr. Young's group several months ago. Unfortunately, there is no point to distributing music in 24-bit/192kHz format. Its playback fidelity is slightly inferior to 16/44.1 or 16/48, and it takes up 6 times the space.
posted by Sebmojo (322 comments total) 54 users marked this as a favorite
 
Great article. I am the choir, so I needed about a sentence to convince me, but the technical details are great. Thanks!

I was also interested in what motivated high-rate digital audio advocacy

Me too, but I'm still not sure.
posted by mrgrimm at 12:20 PM on March 6, 2012 [2 favorites]


Neil Young's older stuff sounds better on vinyl and his newer stuff sounds best through a stereo that's about to blow its speakers out. Why the hell would he care about higher quality downloads?

[Note: Neil Young fan]
posted by The 10th Regiment of Foot at 12:21 PM on March 6, 2012 [5 favorites]


I was also interested in what motivated high-rate digital audio advocacy

Selling you the same thing, again, at a premium cost.
posted by wcfields at 12:21 PM on March 6, 2012 [12 favorites]


Its playback fidelity is slightly inferior to 16/44.1 or 16/48

They're obviously not using the right kind of wood for their volume control knobs.
posted by rocket88 at 12:22 PM on March 6, 2012 [49 favorites]


But the numbers are bigger!
posted by Threeway Handshake at 12:23 PM on March 6, 2012 [6 favorites]


This is amazing. Looking forward to getting into this in depth later.
posted by empath at 12:26 PM on March 6, 2012


I'd be happy if they would just distribute CD-grade audio as FLAC files. I can compress it to mp3 if I need to, and enjoy the higher fidelity if I don't.
posted by bitmage at 12:27 PM on March 6, 2012 [6 favorites]


What motivates a man to build an amp that goes to 11? BECAUSE HE CAN.
posted by aaronetc at 12:28 PM on March 6, 2012


How many people actually have access to 24/192 DACs, anyway? I know the ones in my DVD player and associated gear won't go there. Does the average modern PC have such DACs? Does an iPod?

Or are these standards only applicable to a few hobbyists with hundreds of dollars to spend on specialty equipment?
posted by Western Infidels at 12:30 PM on March 6, 2012


I like the idea of stuff getting released in that format because it makes remixing easier. That's really the only value to it. It would be even better if they released everything as 24/192kHz stems.
posted by empath at 12:32 PM on March 6, 2012 [2 favorites]


How many people actually have access to 24/192 DACs, anyway?

Pro music producers.
This is a lot like people demanding 4K video DVDs in a world where only 720p TVs existed.
posted by Threeway Handshake at 12:35 PM on March 6, 2012 [1 favorite]


Neil Young's older stuff sounds better on vinyl and his newer stuff sounds best through a stereo that's about to blow its speakers out.

You may not care, but Neil Young is incredibly picky about the quality of his recordings -- he's right up there with Tom Scholz and Becker/Fagen in this.

He's not stupid, he knows that what most people listen to his music on won't be very faithful to the original recording, so he tests masters on regular stereos before release, but he knows that a few people, himself included, do have very high-fidelity systems, and he wants to make sure that you can reproduce his music as accurately as he is able to record it.

Okay, he won't go and rebuild a mixing desk like Scholz will -- but he'll ask someone to do it for him if he thinks there's a way to get a better recording from it.

I don't know if 44.1KHz/16bit is perfect -- but it's really close. Really, the only thing it is lacking is for the .00001% that can hear over 22KHz (not that anyone's trying hard to record that, mind you...) and it has slightly less dynamic range than ideal human hearing. I suspect 48KHz/17bit would more than cover the entire theoretical human audio spectrum, but with most processing systems, this would require packing the sample into a 24 or 32 bit word, which would be a big waste.

I'm not willing to throw out the perfectly good CD-Audio standard for it.

I'd be happy if they would just distribute CD-grade audio as FLAC files.

Agreed. Lossy compression systems can do far more damage to the quality of the recording, though they have gotten much, much better over the years. Having the data stored with lossless compression avoids that, and as you said, you can spin off an MP3 or AAC from that copy for your portable with limited storage -- and limited earbuds.
posted by eriko at 12:35 PM on March 6, 2012 [3 favorites]


what motivated high-rate digital audio advocacy
/*
 * These go to 0x11.
 */
posted by CynicalKnight at 12:38 PM on March 6, 2012 [5 favorites]


but Neil Young is incredibly picky about the quality of his recordings
Pfft. He plays a guitar with a broken pickup!
posted by Threeway Handshake at 12:38 PM on March 6, 2012 [3 favorites]


He's not stupid, he knows that what most people listen to his music on won't be very faithful to the original recording, so he tests masters on regular stereos before release, but he knows that a few people, himself included, do have very high-fidelity systems, and he wants to make sure that you can reproduce his music as accurately as he is able to record it.

More Barn!
posted by hal9k at 12:39 PM on March 6, 2012 [9 favorites]


Really, the only thing it is lacking is for the .00001% that can hear over 22KHz

You can almost guarantee that most record producers can't.
posted by empath at 12:39 PM on March 6, 2012 [3 favorites]


Great article, thanks for posting.
posted by Outlawyr at 12:47 PM on March 6, 2012


I'd better invent some kind of motorized head-slapping machine, in case I ever meet someone who's advocating for "wide spectrum video."

Great article (according to this choir member)
posted by ShutterBun at 12:48 PM on March 6, 2012


most speakers can't reproduce higher than 17kHz. Having a 192kHz sample rate has benefits beyond reproducing up to 96kHz per the Nyquist theorem. It would capture the sound within the human hearing range much more accurately.
posted by DbanksDog27 at 12:49 PM on March 6, 2012 [1 favorite]


That's a great article, but as others have said, I'm the choir.
posted by sfred at 12:54 PM on March 6, 2012


Higher resolutions than 20K can be heard in the phase relationships between two channels, as airy space-filling sound if it's produced with that info present.

And OMG you'd need some kind of super expensive science project to hear it anyway!

How about these Vifa tweeters and this Dayton chip amp, together those will set you back $100 and get you high frequencies up around 30 KHZ.

The 1% haters argument is vapid. An Mbox II pro will get you the DACs. It's not rocket science.
posted by StickyCarpet at 12:55 PM on March 6, 2012 [3 favorites]


The one thing that bugged me is the comment that you can't hear lightbulbs: I can hear old incandescent bulbs on a pretty regular basis, vacuum tubes most of the time (I used to be able to hear them all the time as a kid; I couldn't go into certain rooms at the CBC broadcast center without getting a headache due to the dozens of CRTs humming). Either I have crazy ears or that isn't a great example.
Or the fact I have ADHD means I don't block out sounds that most people hear but ignore.
posted by Canageek at 12:59 PM on March 6, 2012


Higher resolutions than 20K can be heard in the phase relationships between two channels, as airy space-filling sound if it's produced with that info present.

I understand all the words, but together they confuse me. Could you explain?
posted by Sebmojo at 12:59 PM on March 6, 2012 [2 favorites]


Higher resolutions than 20K can be heard in the phase relationships between two channels,

That sounds to me like simply an artifact of two intersecting wave patterns producing an audible harmonic. Is it beneficial somehow?
posted by ShutterBun at 1:03 PM on March 6, 2012 [1 favorite]


They're obviously not using the right kind of wood for their volume control knobs

Wood? Pffffff. If you're not using dehydrated yak dung volume control knobs you clearly just don't like music very much.
posted by yoink at 1:04 PM on March 6, 2012 [1 favorite]


They make a good point that 24 bit/192 Khz is more than anyone needs, but the key question should be is CD quality (16 bit /44.1KHz) high enough?
Perhaps we should be aiming for a compromise; something like 20 Bit / 96 Khz

Also, if record producers would at least start to use the dynamic range that's already available, I think that would make a good start!

How many people actually have access to 24/192 DACs, anyway?

Linn Records already offer some downloads in 24 bit/192 Khz FLAC and Windows media format - samples here so you can check your system compatibility for the future. (If you don't have a Linn HiFi they probably won't work)
posted by Lanark at 1:05 PM on March 6, 2012


That sounds to me like simply an artifact of two intersecting wave patterns producing an audible harmonic. Is it beneficial somehow?


Human beings are binaural. Phase differences between signals arriving simultaneously at different ears are what allow us to spatially locate sounds and construct a stereo image. A 40Hz signal arriving 10 milliseconds later at the left ear than the right ear doesn't have much of a phase difference, but a 20 kHz signal arriving 10 ms later has a HUGE phase difference. As a result the stereo image is much more detailed at higher frequencies. This is why it doesn't much matter where you put your subwoofer, but the placement of your tweeters is very critical.

It's got nothing to do with upper harmonics.
posted by unSane at 1:09 PM on March 6, 2012 [1 favorite]


It's also why record producers usually place bass dead center in the mix...
posted by empath at 1:12 PM on March 6, 2012


Hear hear!

Or don't.
posted by spitbull at 1:12 PM on March 6, 2012


It's got nothing to do with upper harmonics.

I think what he is saying is that if you have two ultrasound frequencies beating, you'll get an audible lower-frequency sound, but I'm not 100% sure that's a good thing...
posted by empath at 1:15 PM on March 6, 2012


It's got nothing to do with upper harmonics.

Right, but what about hearing "higher resolutions than 20k" due to the phase relationships between the two channels?

If I'm presented with two inaudible frequencies, and their phases are adjusted such that I am suddenly able to hear something, it's got to be an artifact, no?
posted by ShutterBun at 1:15 PM on March 6, 2012 [1 favorite]


It's also why record producers usually place bass dead center in the mix...


Not only that but in mastering the bass frequencies are often split into middle (L+R) and side (L-R) components and the side components discarded, while the upper frequencies have the side components exaggerated to produce a wider stereo image. This Brainworx plugin is the go-to and it's an amazing tool.
posted by unSane at 1:17 PM on March 6, 2012


It's also why record producers usually place bass dead center in the mix.

Actually that has to be done on vinyl masters to stop the needle jumping out of the groove. If a heavy bass note throws the needle too far out to one side the record becomes unplayable.
posted by Lanark at 1:20 PM on March 6, 2012 [3 favorites]


In addition to detecting the location of low frequencies and mid-side mastering, there was also the consideration that, for vinyl, the more stereo separation you have at lower frequencies (at least at very high amplitudes), the more you increase the likelihood of jumping the needle out of the groove.
posted by chimaera at 1:21 PM on March 6, 2012 [1 favorite]


I stopped reading this well written tripe after the following statement in the piece:

All signal content under the Nyquist frequency (half the sampling rate) is captured perfectly and completely by sampling; infinity is not required. The sampled signal contains all of the information in the original analog signal, and the analog signal can be reconstructed losslessly. Sampling does not affect frequency response. Sampling is also completely phase neutral.

Wow that's some serious bullshit.
posted by sydnius at 1:22 PM on March 6, 2012 [3 favorites]


Um, what Lanark said.
posted by chimaera at 1:22 PM on March 6, 2012


Not only that but in mastering the bass frequencies are often split into middle (L+R) and side (L-R) components and the side components discarded, while the upper frequencies have the side components exaggerated to produce a wider stereo image.

Interesting.. I'm going to try throwing a filter delay on the bass on the track I'm working on now and see what happens..

Is there a waves plugin that does this?
posted by empath at 1:22 PM on March 6, 2012


the key question should be is CD quality (16 bit /44.1KHz) high enough?

The article answers that question with a resounding and detailed "yes."
posted by yoink at 1:22 PM on March 6, 2012


No one can see X-rays (or infrared, or ultraviolet, or microwaves). It doesn't matter how much a person believes he can. Retinas simply don't have the sensory hardware.

Actually, everyone's eyes are sensitive to ultraviolet light. It's just blocked by the lens. People who see it report it as looking like lilac.
posted by euphorb at 1:22 PM on March 6, 2012 [2 favorites]



It's also why record producers usually place bass dead center in the mix...


It's because they can't afford two subwoofers for stereo.
posted by mikelieman at 1:24 PM on March 6, 2012 [1 favorite]


Wow that's some serious bullshit.

The word "perfect" being used to refer to any A/D or D/A conversion process is...problematic.
posted by MillMan at 1:25 PM on March 6, 2012


How many people actually have access to 24/192 DACs, anyway?

I was able to play 192kHz/24bit flac files on a gen 5.5 iPod under Rockbox. So the chipsets even back then had support.

A rip of a good vinyl master of Meddle, IIRC...
posted by mikelieman at 1:27 PM on March 6, 2012


The word "perfect" being used to refer to any A/D or D/A conversion process is...problematic.

Or any A/A conversion.
posted by Threeway Handshake at 1:27 PM on March 6, 2012 [2 favorites]


I think what he is saying is that if you have two ultrasound frequencies beating, you'll get an audible lower-frequency sound, but I'm not 100% sure that's a good thing...

There are anti-aliasing filters put in place at 1/2 the sample rate that deal with this issue in all audio recording; that's why, with the human ear's response nominally 20 Hz to 20,000 Hz, the standard is 44,100 Hz.

Having done extensive listening tests with both sample rate and bit depth, I would agree that the benefits of going to a higher sample rate diminish rapidly past 44.1 or 48; they are, however, discernible.

I would have to disagree strongly about the bit depth though. When I was working as a recording engineer I found (and many of my coworkers found) that the difference between 44.1/16 and 44.1/24 IS noticeable. The depth (read subtlety of dynamics) increases and you can hear details in 24 bit that you cannot hear in 16 bit.

That said, the average listener won't notice it. But I strongly disagree with the statement that it's not within human ability to hear it.
posted by aloiv2 at 1:27 PM on March 6, 2012 [3 favorites]
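
A minimal Python/SciPy sketch of the anti-aliasing step described above: the "analog" signal is stood in for by a 4x-oversampled array, low-passed just below half of 44.1 kHz, then decimated. The cutoff, filter order, and test frequencies are illustrative values, not anything from the comment.

import numpy as np
from scipy import signal

fs_hi = 176400                 # stand-in for the continuous "analog" signal (4 x 44.1 kHz)
fs_lo = 44100                  # target sample rate
t = np.arange(0, 0.05, 1 / fs_hi)

# A 1 kHz audible tone plus a 30 kHz component that, if left in, would fold
# down to an audible 14.1 kHz tone once sampled at 44.1 kHz.
x = np.sin(2 * np.pi * 1000 * t) + 0.5 * np.sin(2 * np.pi * 30000 * t)

# Low-pass just below half of 44.1 kHz. An 8th-order Butterworth only knocks
# the 30 kHz tone down by roughly 28 dB; real converter filters are far steeper.
sos = signal.butter(8, 20000, btype="low", fs=fs_hi, output="sos")
filtered = signal.sosfilt(sos, x)

sampled = filtered[::4]        # keep every 4th sample: 44.1 kHz, aliasing suppressed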


I worked on a DVD that Neil Young was "peripherally" involved in. "Bernard Shakey's" people told me this:

"Mister Shakey loathes AC3. DTS is acceptable."

I was not surprised to see that Young was attempting to put his entire catalog on Blu-ray to take advantage of 24-bit 192 kHz stereo uncompressed audio. Someone much smarter and more audio-oriented than me can talk about why this is better/worse than CD-Audio, DVD-Audio or Super Audio CD (which I always had a fondness for).
posted by infinitewindow at 1:28 PM on March 6, 2012


empath -- I'm sure there are plenty of waves plugins that have mid-side settings. Izotope Ozone and IK T-Racks, for sure, both have compressors that have mid-side settings. In my experience with my music, the difference is fairly subtle.

Wow that's some serious bullshit.

It's only bullshit actually if you happen to have a speaker that has a physical response anywhere near the sampling frequency. I don't happen to own any speakers where, if I were to use a microphone to record the sound it makes, anyone could detect the sample frequency in a sine wave (or any normal sound). It would detect a smooth curve. The speaker moves smoothly, and my ears move smoothly, and a mic diaphragm moves smoothly.
posted by chimaera at 1:29 PM on March 6, 2012


They make a good point that 24 bit/192 Khz is more than anyone needs, but the key question should be is CD quality (16 bit /44.1KHz) high enough?

I think the point it's making is that CD quality is all that a recording needs. (Although the same isn't true for intermediate products.) If you accept that, then a compromise is unnecessary.

Higher resolutions than 20K can be heard in the phase relationships between two channels, as airy space-filling sound if it's produced with that info present.

The "Listening tests" section of the article claims that the consensus of the literature is that even trained listeners can't tell the difference in properly-run tests. Is it getting something wrong?

Wow that's some serious bullshit.

Why?
posted by Serf at 1:37 PM on March 6, 2012


I was able to play 192kHz/24bit flac files on a gen 5.5 iPod under Rockbox.

It downsamples it. And probably at lower quality than what you would get if you just downsampled your mp3s ahead of time with software.
posted by empath at 1:38 PM on March 6, 2012


It's only bullshit actually if you happen to have a speaker that has a physical response anywhere near the sampling frequency. I don't happen to own any speakers where, if I were to use a microphone to record the sound it makes, anyone could detect the sample frequency in a sine wave (or any normal sound). It would detect a smooth curve. The speaker moves smoothly, and my ears move smoothly, and a mic diaphragm moves smoothly.

Right, the speaker has to move in space, and its motion isn't quantized. It can't go from A to B without passing through all the points in between, so it's going to naturally interpolate any missing audio data. And any possible extra data which is missed by the sampling rate is going to be too high a frequency to be audible.
posted by empath at 1:41 PM on March 6, 2012 [4 favorites]


Neil Fucking Young needs to sit his gnarly, grizzled ass down and do a day's worth of ABX testing before he jams his dick any deeper into my ears.
posted by seanmpuckett at 1:46 PM on March 6, 2012 [2 favorites]


Right, but what about hearing "higher resolutions than 20k" due to the phase relationships between the two channels?

The general idea is, suppose you have one little chirp of sound at 15KHZ. You are sitting in a room with various architectural features. The source of that chirp is moved around the room, while you listen with your eyes closed.

You might be able to detect where the sound is coming from, as well as various attributes of the room. Your ears have a special shape to help you do this. The tone that you hear is 15KHZ, but your ears are hearing hundreds of weighted reflections of that sound bouncing off and around the room. Those echoes of the sound arrive at different times, and you might locate the 15KHZ emitter by recognizing time shifts that are less than the period of a 20KHZ or higher sound wave.
posted by StickyCarpet at 1:49 PM on March 6, 2012


sydnius:
All signal content under the Nyquist frequency (half the sampling rate) is captured perfectly and completely by sampling...
Wow that's some serious bullshit.
It's my understanding that it's quite correct, although I didn't understand how it could be until fairly recently. It turns out that reconstructing an analog signal from digital samples is actually rather more complicated than most "how-it-works" grade explanations make out. This, for example, is oversimplified to the point of being plain wrong.
chimaera: The speaker moves smoothly, and my ears move smoothly, and a mic diaphragm moves smoothly.
True enough, but the analog filtering that's required as a part of DAC makes the electrical signal smooth in any case.

There was an uncharacteristically technical, clear, informative and honest article about the way DACs and sampling theory worked in, of all places, Stereophile some years ago. I wish I could find it again.
posted by Western Infidels at 1:50 PM on March 6, 2012 [4 favorites]


... and I say this as a guy who is always, always, always angry about how bad music sounds coming out of my iWhatever because of a noise floor that sounds more like a noise shag carpet, a bullshit bass roll-off and volume settings that make NO SENSE AT ALL god dammit why do I have no way to set a level between pin-drop and HEY MAN!

And yet I still listen to it, because what else am I going to do?
posted by seanmpuckett at 1:54 PM on March 6, 2012


I find Neil Young's voice sounds best in 1 bit, 1 milliHz digital
posted by iotic at 1:58 PM on March 6, 2012


I was also interested in what motivated high-rate digital audio advocacy

It's a certain breed of dude. I don't know why they do it and I don't want to understand. It's something to look down on other people who are already in your goddamn tribe. Perhaps my favorite sentence in the English language is "DO NOT REDISTRIBUTE THIS RECORDING IN A LOSSY FORMAT". Because that's why I downloaded this bootleg sir, not because I am so passionate about a band, but because I wish to flood the non-existent marketplace for live recordings by this jam band with inferior versions of the recording you made.

The whole first world needs someone (not Oasis) to scream "BE HERE NOW" in their ear all day, doubly so at concerts where everyone is so busy taking pictures, posting video and tweeting that the band gets in their fucking way. Why not just enjoy this communal experience? No one gives a shit about the way-too-dark photos you're taking on the iPhone that's obscuring my view.
posted by yerfatma at 2:00 PM on March 6, 2012 [2 favorites]


It downsamples it. And probably at lower quality than what you would get if you just downsampled your mp3s ahead of time with software.

The Spec Sheet says it'll go to 96kHz, so it probably did bring it down. Still not quite an mp3, though.

The microphone will output 192kHz though. Fascinating.
posted by mikelieman at 2:01 PM on March 6, 2012


Is there a waves plugin that does this?

I dunno but you can do it manually by taking the L+R signals and sending them to one buss (that's your mid) and doing the same thing with the R signal phase inverted and sending it to another buss (that's your side). You use the faders of these busses to control their relative levels (or compress, or EQ, or do what you like).

If you sum the outputs of these busses you get your new L signal, and if you invert the phase of the side signal and sum them, you get your new R signal.

Most DAWs (and probably Waves) have a M/S plugin that will do some of this for you but that's the old fashioned way.

M=L+R
S=L-R
L=M+S
R=M-S

That's actually how most 'widening' plugins work, by upping the side component. You can only bump it by 2 or 3dB before it starts to sound weird though.
posted by unSane at 2:01 PM on March 6, 2012 [1 favorite]
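
A minimal Python sketch of the mid/side math unSane lays out, working on float arrays; the width knob is a made-up illustration of the "upping the side component" step, not anything from a specific plugin.

import numpy as np

def ms_widen(left, right, width=1.4):
    # M = L + R, S = L - R (scaled by 0.5 so the decode below restores levels)
    mid = 0.5 * (left + right)
    side = 0.5 * (left - right)
    side *= width                  # boost the side component a little
    return mid + side, mid - side  # new L = M + S, new R = M - S

# Example: widen a short fake stereo buffer; 1.41 on the side is roughly +3 dB
rng = np.random.default_rng(0)
left = 0.1 * rng.standard_normal(1024)
right = 0.1 * rng.standard_normal(1024)
wide_left, wide_right = ms_widen(left, right, width=1.41)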


He makes a lot of points that I have been quietly thinking to myself for the past few years but didn't know how to articulate. I've done comparisons between 44.1k/16bit & 96k/24bit recordings in the past. I've been able to hear a slight difference. I have trouble saying which one is which when I am blindfolded. I hear a difference, but I can't say if one sounds better than the other. Just different.

When does 24 bit matter?

I'm a hobbyist recording engineer. I used to do it professionally, but the Internet came a-calling. I record and mix at 48k/24bit. In the case of production, it is kind of like using Photoshop to edit pictures for the web. If my task is to remove stray hairs, blemishes, wrinkles on clothing and red-eye, I would much rather have a high quality source file to work with than a picture that was already saved as a jpg at 72dpi. With the source file there is more information there for Photoshop to do its thing.

Audio source files in production are kind of the same thing. If I have to do lots of editing, rendering, and use some realtime time-based (reverbs, delays) or dynamic (compressors, gates) plugins, it's good to give the software as much info as possible to do its thing (although some would argue that the computer is working in the 32bit-64bit realm as far as realtime plugins are concerned so it doesn't matter, never quite understood the argument there). With all those plugins and automation there is a lot of crazy math going on inside of professional audio production software. Also the added dynamic range lets me push the pre amps a bit more without having to worry about peaking and having to ride my faders when recording.

When I am bouncing to a file that will be sent to a mastering house, I bounce to my source format so that the mastering engineers have as much info as possible to work their magic. That's what they want and that is almost always one of the first questions they ask: "What format is it?".

It's up to the mastering engineer to properly use their equipment and experience to squeeze it down to a 44.1k/16bit CD. As long as it is done properly, that's good enough for me. Or a 320 mp3. But there I hear a difference. I think.
posted by chillmost at 2:01 PM on March 6, 2012


iotic: i'd like to hear Neil Young's opinion on 1-bit 2.8MHz sigma-delta modulation.

the basic idea of delta modulation is that you reproduce an analog waveform by a sequence of 1-bit values that represent positive or negative errors between the prior digital sample and the target analog value. the hardware is cheap, simple and most of all power efficient for long battery life---though it does raise the bitrate by a factor of 4.
posted by dongolier at 2:04 PM on March 6, 2012 [1 favorite]
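
A toy Python sketch of the plain delta modulation dongolier describes: each 1-bit output says whether the running estimate should step up or down toward the input. Real DSD uses sigma-delta with noise shaping; the step size here is an arbitrary illustrative value.

import numpy as np

def delta_modulate(x, step=0.01):
    bits = np.empty(len(x), dtype=np.int8)
    estimate = 0.0
    for i, sample in enumerate(x):
        bits[i] = 1 if sample >= estimate else -1   # positive or negative error
        estimate += bits[i] * step                  # track the input
    return bits

def delta_demodulate(bits, step=0.01):
    return np.cumsum(bits.astype(float) * step)     # integrate the 1-bit steps

# Encode a 1 kHz tone "oversampled" at roughly 2.8 MHz and reconstruct it
fs = 2822400
t = np.arange(0, 0.001, 1 / fs)
tone = 0.5 * np.sin(2 * np.pi * 1000 * t)
bits = delta_modulate(tone)
approx = delta_demodulate(bits)   # close to the tone, plus granular noise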


Audiophiles remind me of homoeopathic medicine. You add a drop of some tincture to water and dilute it until there is not a single molecule of the tincture left.

Before I turned 10, I used to be sensitive to a department store security system that must have used some very near ultrasonic sound. It made me sick to my stomach. My Mum thought that I was insane. I also used to be able to hear light bulbs and vacuum tubes.

Now that I'm over 40, I'm lucky if I notice that the train is passing by 1/4 of a mile away.

All British spellings in this comment are courtesy of auto-correct. Weird.
posted by double block and bleed at 2:04 PM on March 6, 2012 [1 favorite]


16-bit sampling results in a dynamic range of 96dB.
24-bit sampling increases this to 144dB.
The human ear is capable of hearing a dynamic range of about 140dB

At first glance these three facts would suggest that 24-bit sampling is necessary to deliver the best possible sound to your ears. But let's think about what those figures mean. The 140dB range of human hearing covers everything from a barely perceptible whisper to real physical eardrum damage. Do you listen to your music that loud? Maybe, but I'm guessing that if you do then the whispers aren't that important to you.
If you're sitting in the best seat in a concert hall listening to a symphony orchestra then you're hearing a dynamic range of about 80dB...and 16-bit is more than capable of reproducing that.
24-bit is simply more audiophile overkill.
posted by rocket88 at 2:23 PM on March 6, 2012
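
The figures above fall straight out of the usual rule of thumb of roughly 6 dB per bit; a quick check in Python:

import math

def pcm_dynamic_range_db(bits):
    # dynamic range of linear PCM, roughly 20*log10(2^bits) ~= 6.02 dB per bit
    return 20 * math.log10(2 ** bits)

print(pcm_dynamic_range_db(16))   # ~96.3 dB
print(pcm_dynamic_range_db(24))   # ~144.5 dB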


Actually that has to be done on vinyl masters to stop the needle jumping out of the groove. If a heavy bass note throws the needle too far out to one side the record becomes unplayable.

That's not (entirely) how stereophonic records work, though. One channel corresponds to horizontal modulation, the other corresponds to vertical modulation. It's not as if the needle would be "thrown too far out to one side" due to its stereo placement. It's simply a function of its overall volume.

Granted, placing the bass in the middle has the benefit of requiring less modulation from both channels, while maintaining the overall volume. But the whole thing could be solved by simply mastering the whole thing at a lower volume.
posted by ShutterBun at 2:24 PM on March 6, 2012


The whole first world needs someone (not Oasis) to scream "BE HERE NOW" in their ear all day, doubly so at concerts where everyone is so busy taking pictures, posting video and tweeting that the band gets in their fucking way.

Speaking only about open, non-commercial-home-use-only taping, once the 2nd song starts you've got your levels set ( in theory, if you play the game often, you're dialled into the same location and numbers from last night, but I digress... ) so there's not a whole lot taking you out of the experience.

Cellphone Photographers/Videographers. Please. Stop. You're killing all of us. If I wanted a video, I'd have strapped a video camera to the micstand in the first place. And the #1 RULE OF WHAT HAPPENS AT SHOWS IS: You do not talk about what happens at shows. And video evidence is double stupid.
posted by mikelieman at 2:27 PM on March 6, 2012


One channel corresponds to horizontal modulation, the other corresponds to vertical modulation.

I'm gonna pre-emptively correct myself here. Turns out that, yes, overmodulating one channel of a vinyl recording can indeed skew the needle too far to one side of the groove. Your point is more correct.
posted by ShutterBun at 2:28 PM on March 6, 2012


That sampling theorem, the one that lets you reconstruct a wave losslessly from enough samples? It only works if the highest frequency in the wave is less than half the sampling rate. That means it only works when the wave is simple enough for your sample rate. So it'd be trivial to construct a wave that can't be losslessly encoded and decoded at 44.1kHz. You just wouldn't be able to hear it.
posted by LogicalDash at 2:29 PM on March 6, 2012
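
A quick numeric illustration of LogicalDash's point, assuming a 44.1 kHz sample rate: a tone above the Nyquist frequency produces exactly the same samples as a lower "alias" frequency, so the original can't be recovered.

import numpy as np

fs = 44100
n = np.arange(2048)
f_ultrasonic = 25000            # above fs/2 = 22050 Hz
f_alias = fs - f_ultrasonic     # 19100 Hz

s1 = np.cos(2 * np.pi * f_ultrasonic * n / fs)
s2 = np.cos(2 * np.pi * f_alias * n / fs)

print(np.max(np.abs(s1 - s2)))  # ~0: identical samples, so 25 kHz "folds" down to 19.1 kHz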


1) Steve Jobs was not particularly an analophile, from what I've read.

2) If you're trying to write all sciencey, it's best not to cite your YouTube videos unless they've undergone peer review.
posted by onesidys at 2:29 PM on March 6, 2012


A few other observations -

- wtf is "practical fidelity"? if the author's point is that "you can't tell the difference", well then s/he just doesn't understand the definition of fidelity. apparently a noisy bad recording that you can't discern from a quality one is "just as good". presumably the author would rather hear a good cd than hear a live performance while blindfolded because s/he can't tell the difference.

- that Nyquist rate interpretation is pretty flimsy and ignores "transients". because any sampler can more accurately (though not totally) replicate lower frequencies than higher ones, there's a fidelity distortion inherent in such recordings. whether it's perceived or not is simply not the issue, it's less fidelitious.

-author claims that lower frequencies can be "captured perfectly" without any sort of citation.

- the double blind experiments are only valid to the extent to which people could discern differences in the "true" audio image.

- the notion that binaural hearing is subjective is both dubious and unsupported. i once sat in a Roger Waters concert where he played a sound and asked everyone to point to where they heard it - almost everyone pointed over their right shoulder.

- the ultrasonics issue (as well as others) just seem to iterate that we "can't hear the difference". this has more to do with equipment than "fidelity".

- finally, the truth is that hearing is to a degree "created in your head". imagine turning on a strobe light and gradually adjusting it until you can't see the stop motion - there'd still be some flashing as your brain fills in the blanks. that's what brains do. it's an adaptive response to assist in our survival but doesn't have much to do with the fidelity of a recording.
posted by onesidys at 2:45 PM on March 6, 2012


it's less fidelitious

I'm pretty sure you just invented a word.
posted by lohmannn at 2:48 PM on March 6, 2012


To the contrarian-but-short-on-actual-data audiophiles in this thread: Is there an actual reason why the Nyquist Sampling Theorem isn't "valid" as explained in the article? A non-handwavey explanation with a minimum of made-up words would be preferred.
posted by neckro23 at 2:53 PM on March 6, 2012 [5 favorites]


i was just verbificationing.
posted by onesidys at 2:53 PM on March 6, 2012 [1 favorite]


or adjectifying.
posted by onesidys at 2:54 PM on March 6, 2012


2464-bit is simply more audiophilegeek overkill.
posted by Thorzdad at 2:55 PM on March 6, 2012


Nyquist's theorem is completely valid as the article says - for a single channel.

This just isn't the case when we get into two channels encoded separately, because the difference tones between two inaudible frequencies in the two channels might be very audible.

For example, there's a technique called acoustic heterodyning, where you have two carrier soundwaves, each of which has all its energy in the megahertz range (which makes them extremely directional). You can't hear either carrier, but if the two intersect at or near your ear, you hear the difference between these two waves, which is firmly in the audio range.
posted by lupus_yonderboy at 2:59 PM on March 6, 2012 [1 favorite]
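
A Python sketch of the difference-tone effect lupus_yonderboy describes, with a simple square-law nonlinearity standing in for whatever mixes the carriers; the frequencies here are illustrative, not taken from the comment.

import numpy as np

fs = 192000
t = np.arange(0, 0.1, 1 / fs)
f1, f2 = 60000, 62000                          # both well above hearing
x = np.sin(2 * np.pi * f1 * t) + np.sin(2 * np.pi * f2 * t)

y = x ** 2                                     # quadratic nonlinearity mixes the carriers
spectrum = np.abs(np.fft.rfft(y))
freqs = np.fft.rfftfreq(len(y), 1 / fs)

# The only energy in the audible band is the difference frequency f2 - f1
audible = (freqs > 100) & (freqs < 20000)
print(freqs[audible][np.argmax(spectrum[audible])])   # ~2000 Hz difference tone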


Comparing a CD to a live performance is as unfair as comparing a video of a play to viewing the play in person.

There are a million "quality bottlenecks" between a live performance and listening to a recording of that performance. The author makes a pretty compelling argument that if one of these bottlenecks exists beyond the range of the human brain to discern, why bother with it? Simply because we can?

Arguing about "fidelity" is great as an abstract concept, but the very premise of audio recordings is a lie. Recordings are 100% subjective, from the microphone placement to the final mix. The best argument in favor of fidelity is that one is trying to reproduce what the engineers heard in the recording booth, but even that is totally impractical.
posted by ShutterBun at 3:05 PM on March 6, 2012


Re Nyquist - not to get into a whole thing here, but.... zeno's paradox illustrates how any mathematically system of using "squares" to approximate "curves" _always_ has a loss of fidelity. it simply is not "captured perfectly".
posted by onesidys at 3:08 PM on March 6, 2012


And one more point.

There's an old proverb in physics and engineering that says, "Measure with a micrometer. Mark with chalk. Cut with an axe."

When I record audio for production purposes, I always record it at 24-bit. Always, always, always. The reason is that digital overs are bad, sometimes really really bad, and you want to leave enough headroom to make them completely impossible.

I did a whole album in 16-bit, back in the day, and we were constantly tweaking the gain to make sure we were getting enough signal but no overs - and there were fuckups.

I checked, and I'm recording in 44.1K. I'm probably going to keep it that way for now. The rest of my gear just isn't that good, and doubling the sample rate more or less doubles the load on your system - I use a lot of CPU already.

I have (or at least had) "golden ears". And what am I listening to right now? A 192kbps mp3 on some Genelec near-field monitors. I'm sure the audiophiles hold me in contempt but I love it.

(OK, if you hear my radio station you'll hear 128kbps. So sue me...)
posted by lupus_yonderboy at 3:09 PM on March 6, 2012 [1 favorite]


I stopped reading this well written tripe after the following statement in the piece:

You stopped reading when they restated the Nyquist-Shannon sampling theorem because it is tripe? When are you going to publish your paper that revolutionizes information theory?

zeno's paradox illustrates how any mathematically system of using "squares" to approximate "curves" _always_ has a loss

There are no squares in Nyquist's theorem, only in poor explanations of it online.
posted by markr at 3:14 PM on March 6, 2012 [9 favorites]


... you might locate the 15KHZ emitter by recognizing time shifts that are less than the period of a 20KHZ or higher sound wave.

Dude, the wavelength of sound at 15KHz is less than 1 inch. Your ears are about 6 inches apart. It is physically impossible to discriminate the phase difference between a sound wave reaching your two ears. There are about 12 cycles of different sound arriving at your left and right ears. Another way of saying it is that you can't detect any phase shift less than 1200%, which has nothing to do with the sample rate.

At high frequencies, the main way you determine direction is the fact that the ear toward the sound is louder than the ear shadowed from the sound on the other side of your head. Everyone is familiar with Dolby theater sound where you hear the approaching car from left to right. That is just higher volume on the left followed by higher volume on the right. Again, this has nothing to do with phase difference.

Phase differences between signals arriving simultaneously at different ears are what allow us to spatially locate sounds and construct a stereo image. A 40Hz signal arriving 10 milliseconds later at the left ear than the right ear doesn't have much of a phase difference, but a 20 kHz signal arriving 10 ms later has a HUGE phase difference.

This is exactly backwards. You can't detect phase differences of 20 kHz sound because the wavelength is only a fraction of the distance between your ears, as described above. Below about 1000 Hz you can begin to detect phase difference because the wavelength is about twice the distance between your ears. Below about 80 Hz the phase shift is so small as to be undetectable. So phase detection only occurs for a small range of mid to low frequencies from about 1000 Hz to 100 Hz.

What you might be referring to is group delay. That is just a loud sound arriving first at one ear and slightly later at the other, but again that has nothing to do with phase or sampling.
posted by JackFlash at 3:16 PM on March 6, 2012 [3 favorites]
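
Back-of-envelope numbers behind JackFlash's point, assuming roughly 343 m/s for sound and about 15 cm of ear spacing: once the wavelength drops below the ear spacing (a couple of kHz and up), whole cycles fit between the ears and interaural phase becomes ambiguous.

SPEED_OF_SOUND = 343.0   # m/s, approximate
EAR_SPACING = 0.15       # metres, roughly 6 inches

for f in (100, 1000, 15000, 20000):
    wavelength = SPEED_OF_SOUND / f
    cycles = EAR_SPACING / wavelength
    print(f"{f:>6} Hz: wavelength {wavelength * 100:6.1f} cm, "
          f"{cycles:5.2f} cycles across the head")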


Btw, if you're interested in hair-splitting discussions, I strongly recommend reading Douglas Hofstadter's Gödel, Escher, Bach. Each chapter has an amusing introduction such as the one where a hare builds a record that destroys record players, followed by the tortoise building a record player that detects record-player-destroying records, and so forth...
posted by onesidys at 3:17 PM on March 6, 2012


I found an article which contains only the Young and Jobs parts here.

What I get from reading that is their main concern is compression (like mp3), not sampling rate and number of bits. The FPP article looked informative on one reading and I will definitely go back and read it closely.

> the key question should be is CD quality (16 bit /44.1KHz) high enough?

The article answers that question with a resounding and detailed "yes."


That is the author's opinion. There are a lot of interesting opinions in this debate which does not appear to me to be settled.

I like the Rupert Neve argument.
posted by bukvich at 3:17 PM on March 6, 2012 [1 favorite]


zeno's paradox illustrates how any mathematically system of using "squares" to approximate "curves" _always_ has a loss

There are no squares in Nyquist's theorem, only in poor explanations of it online.

earlier someone had asked for a simple explanation on it, so i replied..
posted by onesidys at 3:18 PM on March 6, 2012


Re Nyquist - not to get into a whole thing here, but.... zeno's paradox illustrates how any mathematically system of using "squares" to approximate "curves" _always_ has a loss of fidelity. it simply is not "captured perfectly".

I think your analogy is misguided - Zeno's paradox also illustrates how it's impossible to ever move between two points. But to get to the meat of your comment, your sampled audio is not a bunch of squares. It's a bunch of points, and there is nothing in between them. Not squares, not curves, not squiggly little lines going up and down. The "Nearest neighbor" interpolation that is often shown in crummy explanations will introduce a bunch of artifacts, yes. But that's not what the electrical or mechanical parts of the playback process do.
posted by aubilenon at 3:20 PM on March 6, 2012 [2 favorites]


again - a simplification. imagine trying to make a sine wave constructed of squares. what you get is a zig-zag line. if you fill it in with smaller blocks, you get smaller gaps but gaps nonetheless. repeat ad nauseum and it's analogous to zeno's paradox. sheesh.
posted by onesidys at 3:23 PM on March 6, 2012


-author claims that lower frequencies can be "captured perfectly" without any sort of citation.

This is nothing more than a restatement of the Nyquist theorem, which the author does mention, and which is the foundation of all signal processing theory.
posted by Mars Saxman at 3:24 PM on March 6, 2012


a _sampling_ of points, yes.
posted by onesidys at 3:25 PM on March 6, 2012


onesidys, i think you're way out of your depth here. Take a course on information theory and come back to this later, okay?
posted by empath at 3:28 PM on March 6, 2012 [3 favorites]


onesidys, it's actually better to think about reconstructing the signal by using sinc functions centered on the samples. This is in fact the reconstruction that Shannon describes in his 1949 paper, which also contains a proof that a uniform sampling of a signal containing only frequencies below half the sampling rate is all the information that is required to perfectly reconstruct the signal.
posted by Serf at 3:30 PM on March 6, 2012 [4 favorites]
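
A small numeric Python sketch of the sinc reconstruction Serf mentions: samples of a band-limited tone, interpolated with the Whittaker-Shannon formula, land back on the original curve even between sample points. With a finite number of samples the match is only approximate, especially near the edges; the values below are illustrative.

import numpy as np

fs = 44100
f0 = 5000                                     # well below fs/2
n = np.arange(512)
samples = np.sin(2 * np.pi * f0 * n / fs)     # the stored sample values

def sinc_reconstruct(samples, t_sec, fs):
    # Whittaker-Shannon: x(t) = sum_n x[n] * sinc(fs*t - n)
    k = np.arange(len(samples))
    return np.sum(samples * np.sinc(fs * t_sec - k))

t = 200.5 / fs                                # halfway between two samples
print(sinc_reconstruct(samples, t, fs))       # reconstructed value
print(np.sin(2 * np.pi * f0 * t))             # true value; the two agree closely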


analophile

The Rick Santorum thread's over there.
posted by acb at 3:31 PM on March 6, 2012


empath - what qualifications of "information theory" do you require? and since you're spouting, please provide your qualifications.
posted by onesidys at 3:52 PM on March 6, 2012


zeno's paradox illustrates how any mathematically system of using "squares" to approximate "curves" _always_ has a loss of fidelity.

Have you ever taken calculus?

Zeno's Paradox merely demonstrates that you have to travel an infinite number of measurable distances to reach the other side of the room or whatever. Not that you have to travel an infinite distance.

Using calculus to integrate over an infinite number of distances gets you the actual distance involved. Math can do that kind of thing. You can even use rectangles for it if you want, so long as you're comfortable with infinitely narrow rectangles. Riemann Sums work that way.
posted by LogicalDash at 3:57 PM on March 6, 2012 [1 favorite]


One of the drawbacks of recordings is their static nature. One performance does -not-fit all, and familiarity breeds fatigue.

If you want to do listeners a favor, instead of ultra-micro fidelity, release the separate stems (or better still tracks, along with any MIDI and automation information) along with the mix on the disks they purchase.

Provide software that lets (most) listeners select modified track levels, some FX, load substitute samples, control tempo ... in essence, to an extent, custom-modify and re-effect their favorite tracks.

Composers may choose to integrate some of these options into releases of their works as randomly- or algo-selected mutations. One work becomes an infinite multitude of works; but people can listen to their favorites by bookmarking the right random 'seed'.

Every day the river is a different river. Why not recorded music?
posted by Twang at 3:57 PM on March 6, 2012 [2 favorites]


Perfecting Sound Forever: An Aural History of Recorded Music is a great book on this topic.

I think it even mentions Neil Young.
posted by monospace at 4:02 PM on March 6, 2012 [3 favorites]


Because most people just want to kick back and listen to some tunes with _MIND_ALTERING_SUBSTANCE_OF_CHOICE_
posted by unSane at 4:02 PM on March 6, 2012


Well, his "Graphic" depicting the response curves of the cones and rods in your eyes is way off Given how easy it is to find that information I'm a little skeptical about how accurate the rest of his information is.
No one can see X-rays (or infrared, or ultraviolet, or microwaves). It doesn't matter how much a person believes he can. Retinas simply don't have the sensory hardware.
That's not true either, you can actually see infrared faintly if you filter out the visible spectrum. And it definitely is true that the RGB color space we currently use does not cover the entire color space that humans can see.

Plus, there are tetrachromats: people who have four types of color sensors rather than three. These are women who have an anomalous red or green receptor gene on one of their X chromosomes.

Anyway, since his stuff about vision is so bad (based on what I know about vision) I'm guessing his stuff on audio is bad as well.

Remember, this is just a blog post. Audiophiles are prone to pseudoscience, but there is nothing preventing an "Audiophile skeptic" from being just as wrong.
posted by delmoi at 4:03 PM on March 6, 2012


onesidys, here is another way to think about it.

You draw an arbitrary waveform of any shape on a piece of paper. Then you place a series of dots on the waveform spaced at the Nyquist interval (corresponding to the maximum bandwidth of the waveform). Then you erase all of the lines between the dots. Next I walk into the room and based on your dots, I can reconstruct the original waveform perfectly. Your dots contain 100% of the information that was contained in the original waveform. That is the essence of the Nyquist theorem -- those few dots contain 100% (all) of the original data -- believe it or not. There is no extra information to be had by adding more dots.
posted by JackFlash at 4:05 PM on March 6, 2012 [2 favorites]


Calculus! oo--ooo. yes. did you read my earlier comment about zeno as an _abstraction_?

Behold wikipedia -
The theorem assumes an idealization of any real-world situation, as it only applies to signals that are sampled for infinite time; any time-limited x(t) cannot be perfectly bandlimited. Perfect reconstruction is mathematically possible for the idealized model but only an approximation for real-world signals and sampling techniques, albeit in practice often a very good one.

The theorem also leads to a formula for reconstruction of the original signal. The constructive proof of the theorem leads to an understanding of the aliasing that can occur when a sampling system does not satisfy the conditions of the theorem.

The sampling theorem provides a sufficient condition, but not a necessary one, for perfect reconstruction. The field of compressed sensing provides a stricter sampling condition when the underlying signal is known to be sparse. Compressed sensing specifically yields a sub-Nyquist sampling criterion.
posted by onesidys at 4:08 PM on March 6, 2012


Anyway, since his stuff about vision is so bad (based on what I know about vision) I'm guessing his stuff on audio is bad as well.

Remember, this is just a blog post. Audiophiles are prone to pseudoscience, but there is nothing preventing an "Audiophile skeptic" from being just as wrong.


It's a blog post by Chris Montgomery, creator of Ogg Vorbis. His background is audio coding, not vision; I'm willing to cut him some slack on vision, but not audio. The audio portion is legit.
posted by zsazsa at 4:12 PM on March 6, 2012 [2 favorites]


thanks for the _constructive_ comment, jackflash - can't be said for others here.

gilrain - you could use two points to describe perfectly. or imperfectly.
posted by onesidys at 4:12 PM on March 6, 2012


The reason artists don't release the stems is not because the audience lacks the tools: it's because the audience can't be trusted to make any decisions that would do the recorded material any justice. That's why they're the audience, and that's why we listen to our favorite artists to begin with.
posted by monospace at 4:22 PM on March 6, 2012


Behold wikipedia -

Yes, that's the same article that others asked you to read. When they are talking about a system which doesn't meet the criteria, they are specifically talking about one in which there are frequencies greater than B hertz. If you set B so that it's at the upper range of human hearing, then it's irrelevant. If B is lower than that, then you will get aliasing.


As far as needing a sample of infinite time, I believe that has to do with the fact that a wave that exists for a finite length of time cannot be said to have a definite frequency. But let me assure you that your ear and brain have the same problem, and it has absolutely nothing to do with digitization or sampling. It has to do with the mathematics of waves.
posted by empath at 4:26 PM on March 6, 2012 [2 favorites]


The reason artists don't release the stems is not because the audience lacks the tools: it's because the audience can't be trusted to make any decisions that would do the recorded material any justice.

Hah, no. It's because they want to be able to collect royalties on remixes, and also because it's a lot of work to prepare stems, to very little benefit.
posted by empath at 4:27 PM on March 6, 2012 [1 favorite]


Words like 'perfect' should never be used in discussions like this. Nothing is perfect. Every analog to digital, digital to analog, or analog to analog transfer adds noise. Every cable, every amplifier, every transistor or vacuum tube.
Yes, there is a small error from digital sampling, but that error is typically a tiny fraction of the errors introduced at every other link in the chain between the studio microphone and your ears. It's buried in the noise, and nobody's 'golden ears' can hear it.
posted by rocket88 at 4:36 PM on March 6, 2012
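
To put a number on "the small error": a Python sketch that quantizes a full-scale sine to 16 bits and measures the signal-to-error ratio, which lands near the textbook 6.02 x 16 + 1.76, about 98 dB.

import numpy as np

fs = 44100
t = np.arange(0, 1.0, 1 / fs)
x = np.sin(2 * np.pi * 997 * t)            # full-scale 997 Hz test tone

quantized = np.round(x * 32767) / 32767    # 16-bit quantization
error = x - quantized

snr_db = 10 * np.log10(np.mean(x ** 2) / np.mean(error ** 2))
print(snr_db)                              # roughly 98 dB, far below the noise of any real playback chain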


I don't want perfect sound. i just want perfect songs.
posted by freakazoid at 4:36 PM on March 6, 2012 [1 favorite]


That's not true either, you can actually see infrared faintly if you filter out the visible spectrum.

Um, no. Once again: stand in a pitch black room with an infrared remote. (no need to filter out the visible spectrum) Point it directly at your eyes and press a button.

Nada.

Do the same thing with (real) infrared goggles on, and you'll see that they are actually putting out a shitload of (invisible) light.
posted by ShutterBun at 4:37 PM on March 6, 2012 [1 favorite]


i think that the infinite time is necessary to obtain the proof, but feel free to prove me wrong (personally, the smarmy ranting has begun to bore me).
posted by onesidys at 4:38 PM on March 6, 2012


For example, there's a technique called acoustic heterodyning, where you have two carrier soundwaves, each of which has all its energy in the megahertz range (which makes them extremely directional). You can't hear either carrier, but if the two intersect at or near your ear, you hear the difference between these two waves, which is firmly in the audio range.

If heterodyning occurs in the original source sound, the downshifted audio beat frequency will be picked up in the original recording and therefore also there when played back. You don't need the high frequency heterodyne for playback of the audio signal. What was audible during recording is also audible during playback. Conversely, what wasn't audible during recording, won't be audible during playback. There is nothing gained by recording ultrasonics.

If you are talking about recording two completely separate high frequency signals and then combining them only on playback, as in your example, then you are creating an undesirable audible artifact that was not present in the original.
posted by JackFlash at 4:42 PM on March 6, 2012 [2 favorites]


Well, his "Graphic" depicting the response curves of the cones and rods in your eyes is way off

Here's another view of the human eye's color response. They look reasonably similar to me.
posted by ShutterBun at 4:47 PM on March 6, 2012


i think that the infinite time is necessary to obtain the proof, but feel free to prove me wrong (personally, the smarmy ranting has begun to bore me).

You're making nonsensical arguments against well-established mathematical facts. If you want to learn something, that's great, but that's not the way you're approaching this thread.
posted by empath at 4:47 PM on March 6, 2012 [3 favorites]


ShutterBun: Do the same thing with (real) infrared goggles on, and you'll see that they are actually putting out a shitload of (invisible) light.

Or aim it at a digital camera. CCDs see infrared, and the remote's IR LED will show white on the camera's screen.
posted by Westringia F. at 4:59 PM on March 6, 2012


another vote here for 44.1/24 bit - i record mostly in 24 or 32 bit, depending on what program i'm using - as the article and a couple of other people have mentioned, it's headroom - digital meters tend to lie and average out things so you might think you're under 0 db, but brief transients might pop over 0 db, causing distortion, which will add up in a mix

it's really difficult in 16 bit to record so you don't get those distorting transients but still have a signal far enough above the noise floor

in 24 bit, it's much easier - i try not to record anything hotter than -10db - although sometimes i creep up to -5 or -6 - and i'm far enough above the noise floor to get a good recording

most of my mixes are somewhere around -15 - and then i master them up to -1 or -2 - and then, and only then, do i move to 16bit to burn a cd

it just plain sounds better engineered that way
posted by pyramid termite at 5:03 PM on March 6, 2012 [1 favorite]
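
Rough numbers behind the headroom argument above, using the ~6 dB-per-bit rule of thumb; the peak levels are the ones mentioned in the comment, and the arithmetic is only a sketch.

def range_above_floor_db(bits, peak_dbfs):
    # dynamic range of the format minus the headroom you leave unused
    return 6.02 * bits + peak_dbfs        # peak_dbfs is negative

for bits in (16, 24):
    for peak_dbfs in (-10, -15):
        print(f"{bits}-bit, peaks at {peak_dbfs} dBFS: "
              f"~{range_above_floor_db(bits, peak_dbfs):.0f} dB above the quantization floor")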


Once again: stand in a pitch black room with an infrared remote. (no need to filter out the visible spectrum) Point it directly at your eyes and press a button.

This is 100% true.

To give delmoi a bit of the benefit of the doubt, though, there *are* some IR LEDs (specifically ones that I've seen with night-vision security cameras) that have a very slight bleed into the visible red end of the spectrum at night or in a dark room. But that doesn't mean I'm seeing infrared light, just the reddest tail end of that particular light's spectrum.
posted by chimaera at 5:04 PM on March 6, 2012


- finally, the truth is that hearing is to a degree "created in your head". imagine turning on a strobe light and gradually adjusting it until you can't see the stop motion - there'd still be some flashing as your brain fills in the blanks. that's what brains do. it's an adaptive response to assist in our survival but doesn't have much to do with the fidelity of a recording.
posted by onesidys


This is a very good point. Your brain can reconstruct a signal if it has to, and you don't even know that you're doing it, in fact you can't not do it. But after a while it's fatiguing, all that processing is sucking up nutrients. So the higher fidelity might not sound better, but it might feel better.
posted by StickyCarpet at 5:16 PM on March 6, 2012


Dude, the wavelength of sound at 15KHz is less than 1 inch. Your ears are about 6 inches apart. It is physically impossible to discriminate the phase difference between a sound wave reaching your two ears. ...
posted by JackFlash


We were talking about a 15KHz sound emitter in an actual room. There would be many different frequencies reaching the ear according to the resonance of the space. As a group this broad spectrum of frequencies would have time relationships that the ear can detect, and the mind can use to make an image of the space, and of the emitter.
posted by StickyCarpet at 5:37 PM on March 6, 2012 [1 favorite]


I think if we want to improve the overall quality of recorded sound what we really need to do is files with separate tracks so users can change the levels and EQ of individual channels to suit their own preferences. Maybe boost the levels on the vocals, or whatever. There was a cool web-release from Arcade Fire or someone where you could watch a video and mute and unmute all the layers of the song. The song ended up taking on a completely different character depending on what tracks you played.

Rather then trying to find a 'perfect' playback ideal, we should be letting users tune songs to sound best on their own equipment.
To the contrarian-but-short-on-actual-data audiophiles in this thread: Is there an actual reason why the Nyquist Sampling Theorem isn't "valid" as explained in the article? A non-handwavey explanation with a minimum of made-up words would be preferred.
Let's be clear here: This guy is a contrarian audiophile (he said he could tell by ear which MP3 encoder was used). With a blog. There is no reason to suspect he's not a crank, and indeed some of the stuff he's saying seems to be wrong (as I said, his thing about vision, for example, was way off)

Just because you can't hear some data doesn't mean you shouldn't include it anyway. If you do include it, you keep more to work with if you make changes later. Maybe if you're really careful you can get everything you need in 44.1 kHz/16-bit, but you're much more likely to screw it up.

I'm not at all an audiophile, by the way. I'm just trying to figure out if what he's saying is correct and it doesn't seem correct to me
It's only bullshit actually if you happen to have a speaker that has a physical response anywhere near the sampling frequency. I don't happen to own any speakers where
No no no. You only have to have speakers with a response near half the sampling rate. Analog speakers that can play a 20 kHz tone can also play a 19 kHz tone. But the encoding wouldn't be able to include a 19 kHz tone properly, I don't think.

Anyway, regarding this guy's argument, here's the other thing: don't people use 24/192 to record sound? Obviously if you're working with audio, you want higher sample rates so that any distortions or changes you make don't degrade the effective sample rate (i.e. slowing down a 44.1 kHz sample 2x would halve the sample rate).

And it's worse than that: if you slow a 44.1 kHz sample down by 10%, you end up with roughly 40 kHz, which doesn't match at all. Wouldn't you end up with some kind of interference?

The only thing I can think of is that 192 isn't an even multiple of 44.1 kHz, so if you try to play it on a 44.1 kHz DAC maybe it will sound messed up. But it wouldn't be difficult to include 48 kHz playback capability as well; I think old Sound Blasters could do 96 kHz anyway. If you did that, you could use a 4x downsample without any interference that would need to be filtered out.
I don't know if 44.1 kHz/16-bit is perfect -- but it's really close. Really, the only thing it is lacking is for the .00001% that can hear over 22 kHz (not that anyone's trying hard to record that, mind you...) and it has slightly less dynamic range than ideal human hearing. I suspect 48 kHz/17-bit would more than cover the entire theoretical human audio spectrum, but with most processing systems this would require packing the sample into a 24- or 32-bit word, which would be a big waste.
The thing is, though: if you're at 44.1 kHz sampling, wouldn't a 20 kHz signal 'sound' exactly the same as a 21 kHz signal? My guess is that the human ear just hears sound in those ranges as clicks anyway, rather than being able to distinguish them. But if not, 44.1 kHz would damage the relationships between notes at really high frequencies.
Pro music producers.
This is a lot like people demanding 4K video DVDs in a world where only 720p TVs existed.


Well, not exactly. The visual defects in 1080p are still really obvious (as are compression artifacts, if you know what to look for) while the sound stuff pretty much sounds perfect to most people.
That's not (entirely) how stereophonic records work, though. One channel corresponds to horizontal modulation, the other corresponds to vertical modulation. It's not as if the needle would be "thrown too far out to one side" due to its stereo placement. It's simply a function of its overall volume.
Nope, one channel is vertical, and the difference between that channel and the other is horizontal. So if something is in the center, it won't affect the horizontal groove.

--
onesidys, i think you're way out of your depth here. Take a course on information theory and come back to this later, okay?
I've taken classes on information theory, but relating to digital information, not analog signals. I really have no idea why you would say that a 44.1 kHz sample rate would perfectly capture the difference between a 22.05 kHz signal and a 19 kHz signal. I mean, if you were working with images, it would be like having a 1-pixel-wide grid pattern and stretching it out to be a 1.16-pixel-wide grid pattern. It wouldn't work at all. Maybe there are algorithms you can use to prevent weird artifacts from showing up, but in order to do that you would need to make it look just like the 1-pixel-wide grid, with maybe some kind of minimal artifact every 8 pixels or so.

Now, it may very well be that you can produce a signal that will perfectly trigger a frequency detector to spit out the right 'number', but that's not the same thing as producing the exact same signal; it wouldn't look the same.

Now, it may be that our ears work like that anyway as far as high frequencies are concerned. But would you really say that if you had, say, a 440 Hz sample rate, you'd be able to distinguish between an A3 and a G#? I kind of doubt it.

Maybe if you ran it through a computer, you could determine that the original signal had been a G#, but it wouldn't sound like a G#. It would sound like a normal A note, with some kind of interference pattern.
Using calculus to integrate over an infinite number of distances gets you the actual distance involved. Math can do that kind of thing. You can even use rectangles for it if you want, so long as you're comfortable with infinitely narrow rectangles. Riemann Sums work that way.
Right... but the point is that digital systems are not doing calculus when it comes to reconstructing these signals; they are approximating it using a fixed number of segments per second (you don't need to use squares, and apparently they are using wavelets, but whatever -- squares, polygons, wavelets, whatever approximation method you use, you still only get an approximation).

From the wikipedia article on the nyquist theorem:
The theorem assumes an idealization of any real-world situation, as it only applies to signals that are sampled for infinite time; any time-limited x(t) cannot be perfectly bandlimited. Perfect reconstruction is mathematically possible for the idealized model but only an approximation for real-world signals and sampling techniques, albeit in practice often a very good one.
Which is exactly what I was thinking. If you have an infinitely long sample, you could analyze the output and figure out exactly what the original signal was by simply counting the peaks. But if you have a short amount of time (like tenths of a second) you wouldn't be able to reconstruct the signal perfectly. And in music, you're only listening for the duration of a note.
You draw an arbitrary waveform of any shape on a piece of paper. Then you place a series of dots on the waveform spaced at the Nyquist interval (corresponding to the maximum bandwidth of the waveform). Then you erase all of the lines between the dots. Next I walk into the room and based on your dots, I can reconstruct the original waveform perfectly. Your dots contain 100% of the information that was contained in the original waveform. That is the essence of the Nyquist theorem -- those few dots contain 100% (all) of the original data -- believe it or not. There is no extra information to be had by adding more dots.
No, you need an infinite series of dots for this to work. Think about it. Let's say you have a 10-inch-long sheet of paper, and you're using one inch as your sample spacing. According to you, you should be able to determine the difference between a 2-inch wavelength and a 1.999-inch wavelength. How is that even remotely possible? Obviously it's not. If you had an infinitely long sheet of paper, you could. But you only have ten samples here.

It sounds like a lot of you are taking the fact that the theorem is true for an infinitely long, unchanging signal to mean it is also true for a short duration. But for a short duration, it's only an approximation.

---
Here's another view of the human eye's color response. They look reasonably similar to me.
I'm not sure where that one comes from. (Note that it's also normalized.) The one I linked to was the CIE 1931 RGB color matching function, which is the international standard based on experiments done in the 1920s (which still hold up, as far as I know). Notice, importantly, how the red sensors also have a second peak at blue (which is why violet looks reddish).
posted by delmoi at 5:43 PM on March 6, 2012
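
As a quick numerical check of the 20 kHz vs. 21 kHz question above, here is a short NumPy sketch (an illustration, not from the article; it only shows that the sampled data keeps the two tones distinct, not what an ear would make of them). Even a 50 ms window at 44.1 kHz separates them cleanly:

```python
import numpy as np

fs = 44100            # CD sample rate, Hz
dur = 0.05            # a 50 ms window, far from "infinite time"
t = np.arange(int(fs * dur)) / fs

for f in (20000.0, 21000.0):
    x = np.sin(2 * np.pi * f * t)                        # the sampled tone
    spectrum = np.abs(np.fft.rfft(x * np.hanning(len(x))))
    freqs = np.fft.rfftfreq(len(x), 1 / fs)
    print(f"{f:.0f} Hz tone -> spectral peak at {freqs[np.argmax(spectrum)]:.0f} Hz")
```

Whether anyone can actually hear either tone is a separate question, but the two sets of samples are not interchangeable.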


This article is totally wrong.
When you are sampling at about twice the frequency of the fastest signal, yes you are above the Nyquist limit, but you are only getting two samples per cycle. Two samples is not enough to describe any waveform other than a square wave. It's also still close enough to the underlying frequency that you are going to get a lot of aliasing (i.e. not accurately describe even the pitch of a high-pitched waveform). You need to be further above the frequency of the highest signal to accurately describe the waveform and be able to recreate it in a way that sounds natural.

16 bits is also not enough depth. Quiet passages use less of the bit depth and so are more quantized, which has led to producers and mastering engineers compressing the mix more to make quiet bits louder, which has become something of an arms race. We have the technology now to easily use 96 kHz/24-bit and enter a new era of sound where the aliasing and quantization are imperceptible. It doesn't even have to be expensive - a DAC chip costs a few cents whether it does 44.1 or 96, 16-bit or 24. Storage has also gotten very cheap.

You don't need special new speakers to appreciate the difference - in fact it should be quite noticeable on an iPhone through the little earbuds.
posted by w0mbat at 5:43 PM on March 6, 2012 [1 favorite]


Yes, and if it were recorded in a room, then those audible frequencies would be captured by the microphone and reproduced by the speaker. If it wasn't in the original sound, and you're just talking about reflections from your room, then those are artifacts, and not an accurate representation of what was recorded, and if you're listening in headphones, it's not even a consideration.
posted by empath at 5:45 PM on March 6, 2012


You don't need special new speakers to appreciate the difference - in fact it should be quite noticeable on an iPhone through the little earbuds.

The iPhone downsamples.
posted by empath at 5:46 PM on March 6, 2012


This article is totally wrong.
When you are sampling at about twice the frequency of the fastest signal, yes you are above the Nyquist limit, but you are only getting two samples per cycle. Two samples is not enough to describe any waveform other than a square wave.


But all of the information that is not described is above the Nyquist limit. It's upper harmonics. So, sure, if the Nyquist limit is 20 kHz then you're getting very limited information about the 20 kHz waveform, but the information that's lost consists entirely of harmonics whose frequencies are above 20 kHz, above the stipulated threshold.

This is why you use an anti-aliasing (low-pass) filter before the conversion, so that only sub-Nyquist frequencies are represented in the input waveform.
posted by unSane at 5:52 PM on March 6, 2012 [5 favorites]


The theorem also leads to a formula for reconstruction of the original signal. The constructive proof of the theorem leads to an understanding of the aliasing that can occur when a sampling system does not satisfy the conditions of the theorem.

The issue is that there are no perfect brick-wall filters that can cut off the audio at exactly 20 kHz as required by Nyquist. That is the reason the audio is sampled a bit above 40 kHz, at either 44.1 kHz or 48 kHz. This permits using 20 kHz filters with a transition band of 4 kHz or 8 kHz, which is more reasonably obtained. Oversampling at 192 kHz provides no real improvement if you have good filters.
posted by JackFlash at 5:55 PM on March 6, 2012 [2 favorites]
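
For a feel of what such a filter looks like, here is a sketch (my own illustration, not anything from the article) of a linear-phase low-pass with a 20 kHz passband and a transition band reaching 24.1 kHz, built as a digital decimation filter running at 8x 44.1 kHz, roughly the way oversampling converters handle it. SciPy's remez routine does the design:

```python
import numpy as np
from scipy import signal

fs = 8 * 44100            # oversampled rate the digital filter runs at
passband_edge = 20000.0   # keep the audible band untouched
stopband_edge = 24100.0   # 44100 - 20000: aliases from here land above 20 kHz

taps = signal.remez(
    501,                                       # filter length (the knob for steepness)
    [0, passband_edge, stopband_edge, fs / 2],  # pass / transition / stop regions, Hz
    [1, 0],                                    # pass the first band, reject the second
    fs=fs,
)

w, h = signal.freqz(taps, worN=8192, fs=fs)
stopband = np.abs(h[w >= stopband_edge])
print(f"worst-case stopband attenuation: {20 * np.log10(stopband.max()):.1f} dB")
```

The tap count and the 8x working rate are arbitrary choices for the sketch; the point is only that a ~4 kHz transition band is a very relaxed target compared to a brick wall at 20 kHz.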


Anyway, let me expand on why some of you are wrong about the nyquist limit thing.

Imagine you have a roll of paper 100 m long, and it has a sine wave with a 1.0001 cm wavelength. You take another roll, equally long, and draw a dot marking the wave's height every 5 mm.

If you take that second roll of paper and examine the dots, you can see that there were originally 9999 peaks on the original roll, and that the wavelength of the original signal was 1.0001 cm.

However, if you only have an ordinary sheet of paper, you would never be able to tell that the original wavelength had been 1.0001 cm rather than 1 cm.

Now, let's say that we still had a 100 m roll of paper, and our input wave was 1.01 cm instead of 1 cm. In that case, if you did try to reconstruct the sound by drawing a line through all the points, you would end up with a sheet that showed a 1 cm wave, which would appear to be modulated by a 1 m wave (i.e. the volume would go up and down). If you tried to play that back, it would be wrong. Obviously, you could probably run some algorithm to fix it, but would you know that you were supposed to?
But all of the information that is not described is above the Nyquist limit. It's upper harmonics. So, sure, if the Nyquist limit is 20 kHz then you're getting very limited information about the 20 kHz waveform, but the information that's lost consists entirely of harmonics whose frequencies are above 20 kHz, above the stipulated threshold.
As I am trying to point out, the Nyquist theorem only holds exactly if you have an infinitely long sample of a frequency. For a short duration, it's only an approximation.

Also, there is a difference between having the same information present and sounding the same. If you took a digital audio recording and replaced the left channel with a copy of itself XORed with the right channel, all the information would still be there, but it would obviously not sound the same.
posted by delmoi at 6:07 PM on March 6, 2012


As I am trying to point out, the Nyquist theorem only holds exactly if you have an infinitely long sample of a frequency. For a short duration, it's only an approximation.

This is not just a limitation of sampling. Your ear has the same limitation. It's the same limitation that is the cause of time/energy uncertainty in physics.
posted by empath at 6:17 PM on March 6, 2012 [1 favorite]


He authored Vorbis. Also, he clarifies that he could only tell MP3 encoders apart a long time ago, when most were still pretty bad and had individual quirks... and keep in mind, he was designing their competitor, so you can assume he was more familiar with them than almost anyone.

I don't think authoring popular open source software is a guarantee of non-crankness. :)
posted by delmoi at 6:18 PM on March 6, 2012 [2 favorites]


Does your edge case have any application in the real world of recorded music, delmoi? The point is that the Nyquist limit in 44.1 or 48 kHz sampled recordings is at or beyond the threshold of most humans' hearing. So the amplitude modulation aliasing you are describing is vanishingly unlikely to actually be audible in any real-world scenario, and in any case is likely to be vastly swamped by the multifold phase distortions introduced by most amplifiers and tone stacks, regardless of whether the input source is analog or digital.
posted by unSane at 6:19 PM on March 6, 2012


Guys, it's a lot of fun swinging your dicks around until you realize that you have to put them in your ears at some point, and then you realize that dicks don't fit in ears. Which is a silly way of saying that there are far more important places to optimize the audio chain than pushing beyond 44k/16.

1. Ear hygiene and health
2. Sound level of listening environment, including isolation
3. Quality of transducers, including frequency response and distortion
4. Acoustic coupling method of transducers to ear
5. Final amplifier, including distortion and noise floor and impedance match
6. Mixer and pre-amplifier and D-A conversion, including jitter, filtering and ground isolation
7. Decompression algorithm quality and determinism
8. Content compression level, algorithm and algorithmic quality

And that's just on playback. There's a whole shitpile of other things to worry about on the creation side.

So, as far as I'm concerned, until I can plug my etymotics into a device while listening in a reasonably quiet room and NOT HEAR ANY NOISE AT ALL when the audio chain is active, there's far more important work to be done than jacking off all over Shannon's proof. But the device that can do that doesn't exist, as far as I can tell, and so pretty much what I'm looking for at this point is maybe support for FLAC or perhaps 512k split stereo AAC encoding.

Oh, and to kill iTunes dead.
posted by seanmpuckett at 6:27 PM on March 6, 2012 [4 favorites]


This is not just a limitation of sampling. Your ear has the same limitation. It's the same limitation that is the cause of time/energy uncertainty in physics.
True, the ear has limits as well. But they're based on a different mechanism (having discrete receptors for various frequencies). So the question is whether or not 44.1 kHz/16-bit is precise enough that any errors would fall below the 'noise floor' of the ear itself.

The other problem, of course, is that people will mess things up -- even if you can theoretically stuff all the information in that range, giving people more headroom would prevent them from screwing it up. You could use the extra bits to simulate 'analog' clipping rather than digital clipping if the signal goes over a certain threshold.
Does your edge case have any application in the real world of recorded music, delmoi? The point is that the Nyquist limit in 44.1 or 48 kHz sampled recordings is at or beyond the threshold of most humans' hearing. So the amplitude modulation aliasing you are describing is vanishingly unlikely to actually be audible in any real-world scenario
Maybe, maybe not. I really have no idea. I know it would be a huge issue if you were talking about an image rather than an audio signal. But the point is you can't just say 'well, the Nyquist limit proves it's a non-issue.'
posted by delmoi at 6:28 PM on March 6, 2012
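
On the "analog-style clipping" idea, here is a tiny sketch (an illustration only; tanh is just one common soft-clip curve, not something any particular converter does):

```python
import numpy as np

def hard_clip(x, limit=1.0):
    """Digital-style clipping: flat-top the waveform at the limit."""
    return np.clip(x, -limit, limit)

def soft_clip(x, limit=1.0):
    """Tape/valve-ish clipping: squash the waveform smoothly toward the limit."""
    return limit * np.tanh(x / limit)

t = np.linspace(0, 0.01, 441)           # 10 ms of a 440 Hz tone
x = 1.5 * np.sin(2 * np.pi * 440 * t)   # about 3.5 dB over full scale

print("hard-clipped peak:", hard_clip(x).max())            # exactly 1.0, sharp corners
print("soft-clipped peak:", round(soft_clip(x).max(), 3))  # below 1.0, rounded shoulders
```

The hard version flattens the wave tops into edges full of high harmonics; the soft version rounds them off, which is the "analog" character people usually mean.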


It's probably worth considering that unSane, TheWorldFamous (and me to a much lesser extent) have probably spent many, many, many hours between us engineering synths and staring at waveforms trying to get barely perceptible improvements in sound, so I think we might know a little about what we're talking about. We're not exactly just talking out of our asses here.
posted by empath at 6:31 PM on March 6, 2012 [1 favorite]


Delmoi, the wavelength would be just as noticeable on the first 11" of paper as on the whole 100m. Digital audio systems are actually using the position, not just counting dots. You're not quite at the level of the wholesale dismissal of calculus above, but....
posted by flaterik at 6:34 PM on March 6, 2012


But empath, COMMON SENSE SAYS RECTANGLES CAN'T REPRODUCE CURVES!! And science and math are never counter-intuitive. Not ever.

*sigh*
posted by flaterik at 6:36 PM on March 6, 2012


I would like to congratulate a few people in this thread for their upcoming Nobel Prizes for Mathematics.
posted by the duck by the oboe at 6:38 PM on March 6, 2012 [3 favorites]


The other thing is that the listening environment has a FAR FAR greater effect on the sound you hear than almost any other factor in your audio chain. Recording studios have bass traps and baffles and non-parallel walls for very good reasons. The average listening environment has parallel walls and a lot of reflective surfaces (windows, etc.) which add up to a nightmarish potpourri of comb-filtering and constructive/destructive interference (i.e. peaks and troughs at certain frequencies), phase cancellation and other stuff which is MASSIVELY more audible than any of the issues we have discussed here.
posted by unSane at 6:38 PM on March 6, 2012 [3 favorites]


(confession: I can't even tell the difference between a song being played natively in my DAW at 24bit and a 128kbps rendered mp3 for a song that I produced from scratch)
posted by empath at 6:38 PM on March 6, 2012


COMMON SENSE SAYS RECTANGLES CAN'T REPRODUCE CURVES

I wish I were able to read this, but the letters on my screen are only an approximation of curves and lines made from square pixels and thus are indecipherable.
posted by rocket88 at 6:41 PM on March 6, 2012 [9 favorites]


I think if we want to improve the overall quality of recorded sound what we really need is files with separate tracks so users can change the levels and EQ of individual channels to suit their own preferences. Maybe boost the levels on the vocals, or whatever. There was a cool web-release from Arcade Fire or someone where you could watch a video and mute and unmute all the layers of the song. The song ended up taking on a completely different character depending on what tracks you played.

Rather than trying to find a 'perfect' playback ideal, we should be letting users tune songs to sound best on their own equipment.


Why stop there? Why not offer different tracks with the instruments recorded with different microphones? Surely some listeners will prefer their snare mic'd with a 421 instead of an SM57. And why not offer the tracks played in different rooms? By different artists?
posted by the duck by the oboe at 6:41 PM on March 6, 2012 [1 favorite]


In fact, why not just provide sheet music and let them play it themselves?

In fact, why bother with sheet music? Let the fuckers make it up!!
posted by unSane at 6:43 PM on March 6, 2012 [1 favorite]


Actually, if I was to have a pony audio format, it would probably be first order Ambisonic with each of the four channels at 192kbps. But that would be absolutely insane CPU usage on the decoding side, and would require a whole new workflow at the head-end. But with that data you have a 3D spatial recording of sound pressure levels at a point in space and it could be mapped on playback to any arbitrary number of transducers at arbitrary points in space adjusted for frequency response and distance to listener.... It's a nice pony. Sigh....
posted by seanmpuckett at 6:43 PM on March 6, 2012


delmoi: Imagine you have a roll of paper 100 m long, and it has a sine wave with a 1.0001 cm wavelength. You take another roll, equally long, and draw a dot marking the wave's height every 5 mm.

If you take that second roll of paper and examine the dots, you can see that there were originally 9999 peaks on the original roll, and that the wavelength of the original signal was 1.0001 cm.

However, if you only have an ordinary sheet of paper, you would never be able to tell that the original wavelength had been 1.0001 cm rather than 1 cm.


Sorry, but this is just wrong. You would be able to reconstruct the original waveform (at least in the middle of the page; the edges would be high-frequency discontinuities). You don't need to detect the peaks, or for that matter any peaks, to reconstruct the waveform. The phase and amplitude of the samples are sufficient to reconstruct any waveform. Plug your samples into a Fourier analysis tool and out would pop a 1.0001 cm wavelength.

Think about this like three dots defining a circle. It doesn't matter which three dots on the circle you pick. The same applies to your example. As long as you sample at better than the Nyquist rate, you can reconstruct the waveform. This may seem non-intuitive, but it is true.
posted by JackFlash at 6:44 PM on March 6, 2012 [9 favorites]
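
The circle analogy is easy to check numerically. Here is a small sketch (an illustration, not from the thread) that recovers a circle exactly from any three of its points:

```python
import numpy as np

def circle_from_points(p1, p2, p3):
    """Return (center, radius) of the unique circle through three distinct points."""
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    # Solve x^2 + y^2 + D*x + E*y + F = 0 for D, E, F.
    A = np.array([[x1, y1, 1.0], [x2, y2, 1.0], [x3, y3, 1.0]])
    b = -np.array([x1**2 + y1**2, x2**2 + y2**2, x3**2 + y3**2])
    D, E, F = np.linalg.solve(A, b)
    center = (-D / 2, -E / 2)
    radius = np.sqrt(center[0]**2 + center[1]**2 - F)
    return center, radius

# Three arbitrary points on a circle of radius 5 centred at (2, -1):
pts = [(2 + 5 * np.cos(a), -1 + 5 * np.sin(a)) for a in (0.3, 1.9, 4.0)]
center, radius = circle_from_points(*pts)
print("center:", np.round(center, 6), "radius:", round(float(radius), 6))
```

Which three points you pick makes no difference to the answer; the sampling-theorem claim is the analogous statement for band-limited waveforms.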


bitmage: "I'd be happy if they would just distribute CD-grade audio as FLAC files. I can compress it to mp3 if I need to, and enjoy the higher fidelity if I don't"

You mean like bleep.com does?
posted by symbioid at 7:01 PM on March 6, 2012


From Neil Young: Jobs’s Death Slowed Apple’s High-Def Music Efforts:

He described a system that would download files as the user slept.

Isn't that called Bittorrent?

In any case, I agree with many here. I just want lossless from online music suppliers. Otherwise I'll continue to still buy CDs and rip them myself. (then toss 'em in a pile)
posted by pashdown at 7:07 PM on March 6, 2012


You mean like bleep.com does?

I've bought my share of FLAC & WAV from bleep.com, boomkat.com, and Ok Go! However, Apple iTunes is the 800 lb gorilla in the room. There are simply many, many, many other artists and back catalogs out there that aren't commercially available online in anything other than MP3 or AAC.
posted by pashdown at 7:11 PM on March 6, 2012


Digital audio systems are actually using the position, not just counting dots.

...

But empath, COMMON SENSE SAYS RECTANGLES CAN'T REPRODUCE CURVES!! And science and math are never counter-intuitive. Not ever.
Uh, what? First of all, no, Pulse Code Modulation does not measure 'position'; it measures a certain number of levels per second.

Second of all, common math shows that a finite number of rectangles can't reproduce curves that are smaller than the rectangles you're using. The quality of the approximation depends on the ratio of rectangles to curves.
I would like to congratulate a few people in this thread for their upcoming Nobel Prizes for Mathematics.
There's no Nobel Prize for mathematics; that's why they have the Fields Medal. (And you don't win them for pointing out that random blog posts are incorrect.)
Sorry, but this is just wrong. You would be able to reconstruct the original waveform (at least in the middle of the page, the edges would be high frequency discontinuities). You don't need to detect the peaks, or for that matter any peaks to reconstruct the waveform. The phase and amplitude of the samples are sufficient to reconstruct any waveform. Plug your samples into a fourier analysis tool and out would pop 1.0001 cm wavelength.
Hmm. Why does the wikipedia article say this:
The theorem assumes an idealization of any real-world situation, as it only applies to signals that are sampled for infinite time;
So... are you saying that they're wrong?

And again, you're missing the point that we're talking about what you hear, not about what "information" is present.

Three dots may define a circle, but if you show someone 3 dots, they are not going to see a circle.

It may be that the software/hardware is removing any harmonic interference at high frequencies, but it may also not be.

Again, imagine if you had a PCM-coded recording with a sample rate of just a few hundred Hz. Even if you could do a Fourier transform and determine what the original frequencies were, it seems hard to imagine that a middle C and A chord would come out sounding at all normal, because we are not doing Fourier transforms when we listen to sound.

Now, it may be that at really high frequencies our ears (or the ears of young people, anyway) can't determine the differences anyway. I don't know. But that's a different question than the one about accurately reproducing the sounds.
The more picky a musician gets about audio quality, the worse that musician's actual musical output becomes. (See, e.g., Neil Young, Tom Scholz, Becker & Fagen, etc.) You don't need to argue about math to figure out that music gets worse when you argue about math.
I don't think it really matters for listening, but for archiving I think you want as high a quality as possible. You also want it if you are going to try remixing a song, since modifications are going to destroy some of the information.
posted by delmoi at 7:13 PM on March 6, 2012


Why are you bringing up a situation where the sample rate is insufficient? No one is claiming that you can represent something above the sample rate.
posted by flaterik at 7:18 PM on March 6, 2012 [2 favorites]


it seems hard to imagine that a middle C and A chord would come out sounding at all normal, because we are not doing Fourier transforms when we listen to sound.

Oh, now you're just hand-waving, Delmoi. The tonality of notes of real instruments is defined by the upper harmonics, way above the fundamental frequency of the note itself. You know you can just try your little thought-experiment very easily: sample a 440 Hz A sine wave at 1000 Hz and tell us how you get on.
posted by unSane at 7:18 PM on March 6, 2012


i guess the kids have finished their homework!
posted by onesidys at 7:21 PM on March 6, 2012 [1 favorite]


Delmoi, here's a Wolfram demonstration which you can use to play with sample rate vs signal frequency.
posted by unSane at 7:26 PM on March 6, 2012


And, delmoi, PCM measures magnitude at each sample. That is the position. I am unclear why you feel so strongly that we are all fools.
posted by flaterik at 7:31 PM on March 6, 2012


Three dots may define a circle, but if you show someone 3 dots, they are not going to see a circle.

You don't listen to (see) the dots. You listen to (see) the wave (circle) reconstructed from the dots.

You have two conversion operations. In the first operation you pick three dots on the circle. You store that data, which contains 100% of the information about that circle. Then you do a second conversion in which you use the stored data to make a circle. That is the circle you see, and it is identical to the original because the three dots contained 100% of the information about the circle.

The same applies to sound. You take enough analog-to-digital samples to completely describe 100% of the data. Then you play back the data through a digital-to-analog converter that completely reconstructs the original wave. You don't listen to the samples. You listen to a wave constructed from those samples that is identical to the original. The two conversions are complementary.
posted by JackFlash at 7:35 PM on March 6, 2012 [3 favorites]


1. Ear hygiene and health
2. Sound level of listening environment, including isolation


Interesting thread. I remember a jazz fusion concert I went to with rock friends (I've been to very few large concerts) where I asked folks what the loud buzz was. No one else could hear it. There are people with incredible talent and 'ear', but I do think almost everyone has to some degree ruined their hearing just from being in our environment, let alone over-cranked earbuds and sounds that are just too loud.

Compare the best encoding/reproduction to a fine live performance without amps. I've never been able to put my finger on it, that difference.

I'd love to be in a blind test: a great chamber quartet; the same piece recorded immediately beforehand and played back from the master; a good lossy compression; the music played live but output via speakers. Not to find out which is 'best' but to understand just what the difference is.
posted by sammyo at 7:48 PM on March 6, 2012


delmoi: "I think if we want to improve the overall quality of recorded sound what we really need to do is files with separate tracks so users can change the levels and EQ of individual channels to suit their own preferences. "

Not so much with playback tech, but I keep hoping for a multichannel audio standard interface. I mean, I'm not all up on tech these days so maybe it's there, but...

I hate that I have to take something like my Korg EM1, and I can take the MIDI data in each channel and send it to the computer, but each part is mixed into the stereo out if I want to work with the audio. If I DO want to work with the separate audio from a single device like that, I have to go into each track and record it separately. I can't just hook up the line outs and record each track individually.

So... Is there any sort of tech that allows you to do that? Like from a single electronic device out. It would almost be like if you could take the sound of each string of a guitar as it's played and pass it through - of course, the issue there is much more complex, because the sound is greater than the sum of its parts on an analog instrument. But when it comes to pure digital, there's no reason you shouldn't be able to output each audio track through something like USB into a software program and have instant access to each track as a waveform.

Or is this just stupid?
posted by symbioid at 8:06 PM on March 6, 2012


The biggest difference would be that the sound sources in the case of a recording are point-source speakers, with crossovers, phase differences between the tweeter and woofer, and cabinet resonances. Moreover, the stereo image is highly dependent on the listener's position.

The closest you can get, probably, is to record a single source and play it back via a mono speaker. I do this all the time with vocals and guitars in my studio and the effect can be extremely realistic. But you have to realize the sound is always going to be colored by the microphone used, the position and so on. Recording is a physical process. There are no ideal frequency responses anywhere.

However, once you get into recording multiple sources, mixing, EQing and so on the goal is not to faithfully reproduce the recorded sound but to create something which sounds 'good' on its own terms. You'd be amazed/horrified at the amount of manipulation that goes into classical recordings.
posted by unSane at 8:10 PM on March 6, 2012


I don't understand what you're asking. Any DAW (Ableton/Cubase/Logic, etc) will be able to split audio output into as many separate channels as you want, mono or stereo, and you can record them or mix them however you like. If you have an audio interface with multiple outs, you can send everything out separate channels.
posted by empath at 8:10 PM on March 6, 2012


So... Is there any sort of tech that allows you to do that?

Software samplers do exactly this. For example, Native Instruments' Kontakt (which I use). You can set up a whole stack of virtual instruments (eg piano, drums, organ, strings, horn section) and route them to separate channels. So you might have stereo pairs for the piano and organ, and separate outputs for cello, violin, sax, trombone, trumpet, kick, snare, individual toms, hihat, ride, crash etc etc etc, all triggered from midi tracks.

Kontakt has its own mixer built in with busses and effects, or you can just route everything out to the DAW. Or you can set up separate instances of Kontakt for each instrument and route things that way.
posted by unSane at 8:19 PM on March 6, 2012 [1 favorite]


TWF pretty much answered my question. I know in software you can do it, just the limited interfacing I've done w/hardware (old mid-90s tech to maybe early-mid 2000s) I've only seen standard stereo out interfaces (1/4 inch or RCA jacks). I'm glad (some) hardware is apparently doing this now.

As for recording at 440 Hz, then comparing an A and a G#...

I set it up, generated a 440 Hz tone, and heard nothing; the waveform is near 0 amplitude (even though I told it 1). I have a feeling that's due to the sample rate. I upped it (I remember reading somewhere once that it's best to have twice the frequency for recording, for whatever reason) to 880 and I still don't really get much of a good example.

I'm curious if someone could upload a sample of what delmoi's talking about (a 1-second sample at a 440 Hz sample rate of generated tones for A and G# -- 440 Hz and 415.3 Hz -- recorded into WAV) so we can see for ourselves. I am clearly not skilled enough in such things.
posted by symbioid at 8:28 PM on March 6, 2012


Delta-sigma modulation is really a beautiful thing, and it's unfortunate that — despite a number of attempts, several of which made it into production devices you could buy — it's never really taken off. I blame Sony.

Back in the mid '80s, there was a device that I'm quite fond of called the dbx 700. It was a big black box that took an analog stereo input and recorded it with 644 kHz, 1-bit delta modulation.

It sounded miles better than PCM recorders of the era. But dbx never really matched Sony in the marketing department, and it never quite caught on the same way the PCM-F1 did. Sad, in a way, because it was a really promising technology and there's no reason why it couldn't have been applied to audio CDs rather than PCM, long before Sony's too-little-too-late attempt with SACD.
posted by Kadin2048 at 8:35 PM on March 6, 2012


Why are you bringing up a situation where the sample rate is insufficient? No one is claiming that you can represent something above the sample rate.
The argument I'm making is that you can't perfectly represent a frequency that's close to ½ the sample rate without having an infinite set of samples. The Nyquist sampling theorem states that if you have an infinite set of samples then you can.

So the two statements aren't mathematically incompatible, and it's not at all clear to me why anyone thinks they are.
And, delmoi, PCM measures magnitude at each sample. That is the position. I am unclear why you feel so strongly that we are all fools.
When you said "position" I thought you meant you thought it was recording the 'position' of the peaks. Which would tell you the exact frequency. The problem with looking at the position to determine frequency is that, because you don't actually have an infinite sample of a continuous tone, you can't tell the difference between magnitude changes caused by interference, and magnitude changes caused by actual changes in the tone. It could be going up and down in volume on its own, or it could be a changing frequency.
The same applies to sound. You take enough analog-to-digital samples to completely describe 100% of the data. Then you play back the data through a digital-to-analog converter that completely reconstructs the original wave. You don't listen to the samples. You listen to a wave constructed from those samples that is identical to the original. The two conversions are complementary.
My understanding is that audio hardware just sets the voltage level to whatever the sampled magnitude was. It doesn't do a Fourier analysis on the signal and produce tones at those levels, or something like that. It might be possible to do so, but it seems like something that would depend on the quality of the DAC.

The other thing is this argument you're making that a finite number of samples can perfectly describe a frequency. It's just an assertion, and it seems to go against the Wikipedia article. Where is the math? The actual Fourier transform is an integral from -∞ to ∞.

The question in my mind is how well your ear can distinguish between high frequencies. It may be that you can't tell the difference between a middle C and a C# if you increase the frequency 70-fold, so it doesn't matter. But you would obviously have to do something about the harmonics when recording, which is why you need higher sample rates when recording.
posted by delmoi at 8:45 PM on March 6, 2012


It seems like the DAC hardware would be simpler, but does the fact that it needs to run at a much higher frequency make it more difficult in other ways?
I think the DAC would be more complicated. You'd need to store the previous level using flip-flops, and then adjust it. With a regular DAC there is no state, just a bunch of switches.
posted by delmoi at 8:47 PM on March 6, 2012


I think the idea is that if you can open a 24/192 file in a real audio program in your studio with badass apogee converters and near field monitors, you are going to notice a difference in fidelity.

On a car stereo, probably not.

Hell, I notice the difference between my original multitracked project files and a PCM stereo mix down.

Personally, I'd like to have access to the project files and raw data. I'd pay good money for that!

But then again, I'm not your typical consumer.
posted by roboton666 at 8:50 PM on March 6, 2012


> Or are these standards only applicable to a few hobbyists with hundreds of dollars to spend on specialty equipment?

BWAHAHAHAHAHAHAHAHAHAHAHAHAHA

Clearly you are unfamiliar with the audiophile community.
posted by contraption at 8:53 PM on March 6, 2012


Hopefully this will clear some stuff up.

Here's a 4 kHz sine wave signal sampled at 8 kHz

As you can see, the PCM series is [0,0,0...]

But this does not represent a silent signal. As you can see, the Fourier analysis below shows that the series uniquely defines a 4 kHz signal (possibly phase-inverted).

The mistake is to think that the DAC simply outputs the samples' PCM values instead of using Fourier analysis to reconstruct the original signal.
posted by unSane at 8:54 PM on March 6, 2012 [1 favorite]


The Nyquist sampling theorem states that if you have an infinite set of samples then you can.

The reason you need an infinite series of samples is that a wave of finite duration does not have a definite frequency. This is not a limitation of sampling. This is a mathematical property of waves. It has nothing to do with sampling. If you can't determine the frequency from some finite set of samples, then your ear can't determine the frequency from listening for a finite duration.
posted by empath at 9:24 PM on March 6, 2012
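
A quick numerical version of that point (a sketch, not from the thread): two tones 2 Hz apart are nearly the same waveform over a 10 ms window and completely different over a 1 s window, and no amount of extra sample rate changes that.

```python
import numpy as np

fs = 44100
f1, f2 = 1000.0, 1002.0          # two tones 2 Hz apart

for dur in (0.01, 1.0):          # 10 ms vs 1 s observation
    t = np.arange(int(fs * dur)) / fs
    x1 = np.sin(2 * np.pi * f1 * t)
    x2 = np.sin(2 * np.pi * f2 * t)
    print(f"{dur * 1000:>6.0f} ms window: max sample difference = "
          f"{np.abs(x1 - x2).max():.3f}")
```

Over the short window the two signals differ by at most about an eighth of full scale; over the long one they drift completely out of phase. The limit is the observation length, not the sampling.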


Unsane, if a PCM series of 0s represents a 4KHz sine wave in that scenario, then what would represent silence? My understanding is that would also be a series of 0s. How would the DAC know the difference?

(not trying to be argumentative, just trying to get my head around it).
posted by 1024x768 at 9:38 PM on March 6, 2012 [1 favorite]


It seems like the DAC hardware would be simpler, but does the fact that it needs to run at a much higher frequency make it more difficult in other ways?

It's not more complicated in theory, but in practice I think the implementations tended to be equally or perhaps a bit more complex than competing PCM-based devices. Delta-sigma modulation is theoretically very simple, but the dbx 700 (the only implementation that I'm particularly familiar with) used a sophisticated variety of it that I'm sure wasn't cheap to implement using mid-80s parts.

The Model 700 was a predictive delta modulator. The simplest kind of 1-bit modulation is to compare the incoming analog signal level to the level that the signal was at the time the last sample was taken, and if the level is lower, mark 0, if it's higher, mark 1. (A constant signal leads to a very small sawtooth wave — this is a form of quantization error.) You can build an encoder like this pretty easily ... I remember doing one in an undergraduate digital signals class with 7400 TTL. The performance of the resulting ADC/DAC depends on the sample rate, but also on the step size (delta) — how much do you let the input signal increase or decrease by, before you mark down a 0 or 1 in the output?

In most trivial implementations, this delta is fixed. But dbx made the delta parameter adaptive, based on the amount of quantization error in the past few samples (or something like that; it's been a while since I read the documentation and they were a bit sketchy on exactly how it worked, probably because it was their 'secret sauce' at the time). This introduces a lot of additional complexity into the encoder and decoder, but makes it much more flexible. It's this expense that I suspect was their undoing.

In fairness though, the old PCM recorders like the PCM-F1 needed a lot of additional circuitry, above and beyond what you'd need just to record y bits of amplitude at x samples/sec, because of how nastily a trivial PCM encoder fails when you feed it an input signal that's above the encoder's Nyquist frequency. For 44.1kHz this is 22.05kHz, well within the range of what you can pick up from a good microphone, necessitating a very complex anti-aliasing filter to cut off anything higher than 22.05kHz while not affecting anything lower. These filters were not great in the PCM-F1, and some people claim that they sound bad. (The dbx 700 doesn't have any anti-aliasing filters, because its behavior when pushed isn't as acoustically ugly. It "saturates" in a way that's similar to analog tape.)

Although I don't have any evidence of this directly, I think that dbx thought that the problems of building the anti-aliasing filters that would allow PCM recording at a reasonable bitrate (small enough that you could cram it into a video signal and record it on video tape, which was the preferred transport at the time) were insurmountable or nearly so, which is why they went with predictive delta modulation. Unfortunately for them, Sony managed to get a "good enough" PCM solution with the PCM-F1, and later PCM recorders improved the anti-aliasing filters.
posted by Kadin2048 at 9:40 PM on March 6, 2012 [1 favorite]
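
For anyone curious, here is a toy version of the "simplest kind of 1-bit modulation" described above (a sketch with a fixed step size; the dbx 700's adaptive delta was far more sophisticated, and the numbers below are only loosely inspired by it):

```python
import numpy as np

def dm_encode(x, delta=0.05):
    """Encode a signal as 1-bit up/down decisions against a running estimate."""
    bits = np.zeros(len(x), dtype=np.uint8)
    estimate = 0.0
    for i, sample in enumerate(x):
        bits[i] = 1 if sample > estimate else 0
        estimate += delta if bits[i] else -delta
    return bits

def dm_decode(bits, delta=0.05):
    """Rebuild the staircase approximation from the bit stream."""
    steps = np.where(bits == 1, delta, -delta)
    return np.cumsum(steps)

fs = 644000                               # roughly the dbx 700's bit rate
t = np.arange(int(fs * 0.002)) / fs       # 2 ms of a 1 kHz tone
x = 0.8 * np.sin(2 * np.pi * 1000 * t)

y = dm_decode(dm_encode(x))
print("max reconstruction error:", round(float(np.abs(x - y).max()), 3))
```

With a fixed delta you trade slope overload (delta too small to keep up with the signal) against granular noise (delta too big for quiet passages), which is exactly the problem the adaptive step size was meant to solve.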


Unsane, if a PCM series of 0s represents a 4KHz sine wave in that scenario, then what would represent silence? My understanding is that would also be a series of 0s. How would the DAC know the difference?

It's an extreme edge case (chosen to make a point), because not only is the signal frequency exactly half the sampling frequency, but the phase is perfectly aligned to make all the values 0. Any sane DAC would indeed interpret this as silence. However, if you shift the phase minutely so that the PCM values are non-zero, it can now be interpreted as a DC signal or as a 4 kHz tone and I guess the DAC would have to describe.

In the real world you always have a noise floor, or dither, and you are generally representing signals below the Nyquist frequency as opposed to sitting right on it.

I was just trying to show in the most dramatic way that the amplitude values of the PCM coding are not what defines the peaks of the derived signal which is output by the DAC. Sorry if it got even more confusing!
posted by unSane at 9:48 PM on March 6, 2012 [2 favorites]
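
The ambiguity is easy to see numerically (a small sketch, not from the thread): a tone at exactly half the sample rate lands on all zeros with one phase and on an alternating constant with another, which is exactly why careful statements of the theorem exclude the equality case.

```python
import numpy as np

fs, f = 8000, 4000            # sample rate and a tone at exactly fs/2
n = np.arange(8)

for phase in (0.0, np.pi / 2):
    x = np.sin(2 * np.pi * f * n / fs + phase)
    print(f"phase {phase:.2f}:", np.round(x, 3))
```

One phase gives a sample series indistinguishable from silence; the other gives +1, -1, +1, -1. Anything strictly below half the sample rate avoids the problem.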


s/describe/decide
posted by unSane at 9:48 PM on March 6, 2012


Delta-sigma modulation is really a beautiful thing, and it's unfortunate that — despite a number of attempts, several of which made it into production devices you could buy — it's never really taken off.

Almost all audio A/D and D/A converters used in audio today are sigma-delta converters. They typically run 16 or 24 times the sample rate. These are converted into standard 16 or 24 bit wide samples used in audio.
posted by JackFlash at 10:35 PM on March 6, 2012 [1 favorite]


I notice that when I rip my piano CD's using iTunes, oftentimes the loud and high registers will have crackle. This is via my PC or my iPod. What gives? I hate it.
posted by polymodus at 10:37 PM on March 6, 2012


But this does not represent a silent signal. As you can see, the fourier analysis below shows that the series uniquely defines a 4 kHz signal (possibly phase-inverted).
Uh... and what was the Amplitude, again?

Also, what does it show if you sample a 3.999kHz signal sampled at 8khz, with, say 10 samples? Or 3.951 kHz (musical note B7)? I didn't say that you couldn't perfectly represent exactly half the sample rate, what I said is that you can't exactly represent differences in frequencies that are close to half the sample rate. Wavelengths are exact multiples should be fine (if the phase is lined up). The problem is when they're close to a multiple.
The reason you need an infinite series of samples is that a wave of finite duration does not have a definite frequency. This is not a limitation of sampling. This is a mathematical property of waves. It has nothing to do with sampling. If you can't determine the frequency from some finite set of samples, then your ear can't determine the frequency from listening for a finite duration.
I didn't say they could, but the question is whether or not what your ear hears is the same. You can obviously tell the difference between 7459hz (b-flat) and 7902 (b). So the question is do you lose the ability to distinguish those notes if you double or triple the frequencies? It seems like that's possible. But if it's not the case then you would lose musical tonality in those frequencies, even if you could use a computer to figure out the exact frequency.
The mistake is to think that the DAC simply outputs the samples PCM values instead of using Fourier analysis to reconstruct the original signal.
Why is that a mistake? It seems like you can make a DAC however you want. Some might use Fourier analysis, others might not.
It's an extreme edge case (chosen to make a point), because not only is the signal frequency exactly half the sampling frequence, but the phase is perfectly aligned to make all the values 0. Any sane DAC would indeed interpret this as silence. However, if you shift the phase minutely so that the PCM values are non-zero, it can now be interpreted as a DC signal or as a 4 kHz tone and I guess the DAC would have to describe.
It's actually not an edge case, but falls outside the theorem. If you read the article on Nyquist Shannon sampling theorem :
More recent statements of the theorem are sometimes careful to exclude the equality condition; that is, the condition is if x(t) contains no frequencies higher than or equal to B; this condition is equivalent to Shannon's except when the function includes a steady sinusoidal component at exactly frequency B.
B being 4kHz in your example (half the sampling rate). In any event, pure silence is obviously not the same thing as a 4kHz tone. Even if the 'information' were there it wouldn't sound the same when played through any type of DAC. The only reason we know it's there is because you've told us it's there. We also have no way of finding out what the amplitude was (which could have been zero, for all we know)
posted by delmoi at 10:39 PM on March 6, 2012


I notice that when I rip my piano CD's using iTunes, oftentimes the loud and high registers will have crackle. This is via my PC or my iPod. What gives? I hate it.


That usually means the CD was recorded either with overs or peaks that hit 0 dB. The mp3/AAC encoding algorithm is quite capable of producing distortion in these circumstances due to intersample interpolation. Many mastering suites master to -0.2 dBFS for just this reason. Unfortunately I don't know of a simple solution to it.
posted by unSane at 10:43 PM on March 6, 2012
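
Intersample overs are easy to demonstrate (a sketch of my own, not anything from the thread): oversample the stored PCM and compare the reconstructed peak against the peak of the samples themselves.

```python
import numpy as np
from scipy.signal import resample_poly

fs = 44100
t = np.arange(256) / fs
# A tone at a quarter of the sample rate, phased so that every stored sample
# sits well below the true peak of the continuous waveform.
x = 0.99 * np.sin(2 * np.pi * 11025 * t + np.pi / 4)

sample_peak = np.abs(x).max()
true_peak = np.abs(resample_poly(x, 4, 1)).max()   # 4x-oversampled estimate

print(f"sample peak:         {20 * np.log10(sample_peak):+.2f} dBFS")
print(f"estimated true peak: {20 * np.log10(true_peak):+.2f} dBFS")
```

The stored samples look like they peak around -3 dBFS while the reconstructed waveform is nearly at full scale, which is the kind of discrepancy that can push a lossy encoder or a cheap DAC into clipping.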


I didn't say that you couldn't perfectly represent exactly half the sample rate, what I said is that you can't exactly represent differences in frequencies that are close to half the sample rate.


Why can't you?
posted by the duck by the oboe at 10:46 PM on March 6, 2012


I would assume you'd get the CD data as a WAV, open it in a simple audio editor (fusion is great for simple tasks on a Mac), and re-normalize. But that may not count as "simple"...
posted by flaterik at 10:48 PM on March 6, 2012


Why is that a mistake? It seems like you can make a DAC however you want. Some might use Fourier analysis, others might not.

Well, if you make them in a stupid way that does not respect the math on which the whole thing is based, they sound like shit. There's certainly that.

The only reason we know it's there is because you've told us it's there.


Sure. As I explicitly pointed out, it's an edge case which (as you correctly point out) is excluded in recent statements of the sampling theorem. The point you are ignoring is that PCM magnitudes do not represent peaks but are used to reconstruct the original sub-Nyquist waveform. I don't know why you have such a huge problem with this -- it's mathematically established and completely uncontroversial.

It seems like that's possible. But if it's not the case then you would lose musical tonality in those frequencies, even if you could use a computer to figure out the exact frequency.

You seem to really not understand the fundamental concept that you can't hear harmonic series above (say) 20 kHz. These frequencies DO NOT MATTER when it comes to human perception of timbre. If a note has an upper harmonic above 20 kHz (say), I can't hear it. It's therefore completely irrelevant if the sampling algorithm doesn't reproduce it. That's the whole point of choosing 44.1/48 kHz as the sampling frequencies.

It's endearing that you think you've figured out a huge flaw in all of this but believe it or not the people who designed standard audio formats did actually know what they were doing.
posted by unSane at 10:50 PM on March 6, 2012 [3 favorites]


"you can't tell the difference between magnitude changes caused by interference, and magnitude changes caused by actual changes in the tone."

That's because there isn't a difference. If you changed the amplitude at a given point, you just changed the frequency space.
posted by flaterik at 10:52 PM on March 6, 2012


I would assume you'd get the CD data as a WAV, open it in a simple audio editor (fusion is great for simple tasks on a Mac), and re-normalize. But that may not count as "simple"...

No, that doesn't work, unfortunately. If you normalize to 0 dBFS, the MP3/AAC encoding process can still produce overs (e.g. you have two full-scale PCM samples next to each other -- the algorithm interpolates the intersample value and comes up with a peak which is even higher).

What you actually have to do is renormalize to -0.2 or -0.3 dBFS, which takes a full on DAW, but guards against intersample overs when the track is encoded to a lossy format.
posted by unSane at 10:55 PM on March 6, 2012


Oh, yeah, I meant to renormalize below 0, but was being insufficiently precise in my language!
posted by flaterik at 11:03 PM on March 6, 2012


I didn't say that you couldn't perfectly represent exactly half the sample rate, what I said is that you can't exactly represent differences in frequencies that are close to half the sample rate.
Because it wouldn't 'line up'.

I guess I can link to this calculator thing I made a while ago to show you what I'm talking about: here are two sine waves. One has a wavelength of one 'screen' (2*pi); the other has a wavelength of 0.4*pi. It shows the phase changing over time.

Click the graph settings tab and select 10 samples (the default is 300). At that setting you have a "sampling rate" (10 per screen) of exactly twice the frequency (5 waves per screen), and the result obviously wouldn't "sound" like the original.

Now, let's look at two waves that are close to each other but not quite the same: a frequency of 5 waves per screen and 4.9 waves per screen. At 300 samples it looks like two waves going slowly in and out of phase. Change it to 10 samples, and you get something different: two waves that have a wavelength of 0.4*pi, but the amplitude of each is going up and down at different frequencies.
posted by delmoi at 11:13 PM on March 6, 2012


sox can do normalization without a full-on DAW. How easy or difficult it is to incorporate into your workflow depends on your workflow, but if you're ripping with something scriptable (like EAC+REACT2) it wouldn't be too difficult to automatically check your peaks and renormalize the source if necessary before converting it to mp3.
posted by Lazlo at 11:15 PM on March 6, 2012


Oops, in my last comment I meant to quote the duck by the oboe.
posted by delmoi at 11:21 PM on March 6, 2012


Change it to 10 samples, and you get something different: two waves that have a wavelength of 0.4*pi, but the amplitude of each is going up and down at different frequencies.

Are you saying that two different frequencies have the same wavelength?

Think about the composite waveform.
posted by the duck by the oboe at 11:48 PM on March 6, 2012


By the way, polymodus, while clipping is more likely, high-end weirdness on rips can also be caused by pre-emphasis, which iTunes doesn't always detect or compensate for. I've heard some classical labels still use it. It would manifest as unusually harsh and "brittle"-sounding highs.
posted by Lazlo at 12:11 AM on March 7, 2012


delmoi: I guess I can link to this calculator thing I made a while ago to show you what I'm talking about

From what I can tell, you are drawing lines between sample points. That isn't how waveforms are reconstructed from samples, at all. In a DAC, the waveform is recreated by impulses at each sample point. It is the mathematical sum of these impulses that recreates the waveform, not by drawing lines from point to point. (In practice, the impulses are actually generated by step-and-hold, but that is an unnecessary detail).

But there is another critical step. The signal is then passed through another anti-aliasing filter, called an anti-imaging or reconstruction filter. This low pass filter converts the string of impulses into a smooth waveform that matches exactly the waveform the samples were taken from. These are inverse mathematical functions -- waveform to sample and impulse to waveform. It is not connect-the-dots.

And no, it doesn't matter if you have two waves of slightly different frequency near the Nyquist limit. They will be reproduced just fine. I know that you can't be convinced of this, but it is true.
posted by JackFlash at 12:35 AM on March 7, 2012 [4 favorites]
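
Here is that reconstruction step in miniature (a sketch, not from the thread): the stored samples of a 19 kHz tone at 44.1 kHz are used to regenerate the waveform on a grid eight times finer and compared against the mathematical original. The test tone is chosen to fit the block exactly so SciPy's FFT-based resampler behaves as an ideal reconstruction filter; a real DAC's filter is a finite but very good approximation of the same thing.

```python
import numpy as np
from scipy.signal import resample

fs, f0, N = 44100, 19000.0, 441            # a 10 ms block containing exactly 190 cycles
n = np.arange(N)
samples = np.sin(2 * np.pi * f0 * n / fs)  # what gets stored

up = 8
rebuilt = resample(samples, N * up)        # band-limited reconstruction on a finer grid
t_fine = np.arange(N * up) / (fs * up)
original = np.sin(2 * np.pi * f0 * t_fine) # the true underlying waveform

print("max deviation from the true waveform:",
      float(np.abs(rebuilt - original).max()))
```

The deviation comes out at floating-point rounding level, even though the stored data has barely more than two samples per cycle: the samples are coefficients for the reconstruction filter, not dots to be joined with straight lines.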


Are you saying that two different frequencies have the same wavelength?

Think about the composite waveform.


I'm not saying they will; I'm showing a graph so you can see it for yourself. If the signal is quantized over time, then the apparent wavelength has to be a multiple of the sample period when you're looking at a small set of samples.

Over a long series of samples, you can 'figure out' that if the original sound level was constant, then the frequency must have been some other number (i.e. the 'information' is still there).

Think about the composite waveform.

Okay... I'm thinking about it. Actually, why not graph it:

Here are the two waves I talked about earlier, with the composite in red, over a short period of time. Of course it gets louder and softer over a longer period of time; here's a graph with the animation speed increased so you can see that happening. As the two waves go in and out of phase, the composite goes from magnitude 2 to magnitude zero.

But what's the point? The frequency of the waves is 5 per screen. If you change the sampling rate from 300 to 10 samples per screen, the waves don't look anything like what they are supposed to. (The sample rate isn't included in the link; you have to click 'graph settings' and then change the samples pull-down.)

It just looks like a bunch of waves with a wavelength of 1/5th of the screen, increasing and decreasing in magnitude over time. You have no way of knowing if that is because it was a recording of a constant tone over time, or if it was caused by a real signal that was 'supposed' to be modulated that way.
posted by delmoi at 12:36 AM on March 7, 2012


From what I can tell, you are drawing lines between sample points. That isn't how waveforms are reconstructed from samples, at all. In a DAC, the waveform is recreated by impulses at each sample point. It is the mathematical sum of these impulses that recreates the waveform, not by drawing lines from point to point. (In practice, the impulses are actually generated by step-and-hold, but that is an unnecessary detail).
Yeah, it just connects the dots. I did read some of the articles (i.e. on the sinc function). I don't really get why that would make a huge difference for sounds that are close to the Nyquist limit. I'm certainly not saying that it would be noticeable at 44.1kHz. But like I said, I think if the sample rate were ~440Hz you would have trouble telling the difference between an A and a G#.
posted by delmoi at 12:51 AM on March 7, 2012


Delmoi,

Let's be clear here: This guy is a contrarian audiophile (he said he could tell by ear which MP3 encoder was used). With a blog.

Oh, mid-90s encoders? You could totally tell them apart. One (from the guys whose name began with X, not Xiph) had a hard cutoff at 16kHz. Others had various recognizable failure modes around cymbals and other high-frequency noise.

He throws the word "perfect" around incorrectly, but it's a pretty good piece. And man, he wrote Vorbis.

There is no reason to suspect he's not a crank, and indeed some of the stuff he's saying seems to be wrong (as I said, his thing about vision, for example, was way off)

From the Wikipedia article on CIE XYZ:

But, in the CIE XYZ color space, the tristimulus values are not the S, M, and L responses of the human eye, even if X and Z are roughly red and blue. Rather, they may be thought of as 'derived' parameters from the red, green, blue colors.

I'm about as deep as an amateur can get into spectral response curves of human opsins, and I'm not seeing anything wrong with Monty's arguments. The brain does all sorts of weird things once it gets signals from the eye.
posted by effugas at 12:59 AM on March 7, 2012


Yeah, it just connects the dots.

You can't just connect the dots; you need to use an anti-aliasing filter.
posted by the duck by the oboe at 1:00 AM on March 7, 2012 [1 favorite]


neckro23: To the contrarian-but-short-on-actual-data audiophiles in this thread: Is there an actual reason why the Nyquist Sampling Theorem isn't "valid" as explained in the article? A non-handwavey explanation with a minimum of made-up words would be preferred.

The most basic reason is that anti-aliasing filters are really, really hard to make. I mean... not so hard if you have 10 or 20 octaves to reach sufficient attenuation, but in the CD audio standard you have to reach sufficient attenuation in a tiny fraction of an octave.

StickyCarpet: This is a very good point. Your brain can reconstruct a signal if it has to, and you don't even know that you're doing it,

Hence the important effect of dither. Listen to a recording with a lot of dither, and you feel like you can hear details down into the noise. It is pretty amazing. Also really noisy (as in the hiss-of-white-noise noisy).

lupus_yonderboy: For example, there's a technique called acoustic heterodyning, where you have two carrier soundwaves, each of which has all its energy in the megahertz range (which makes them extremely directional). You can't hear either carrier, but if the two intersect at or near your ear, you hear the difference between these two waves, which is firmly in the audio range.

That's a really interesting thought. JackFlash's refutation is interesting too, but not complete. There is indeed room there for issues with Nyquist theory. I mean, maybe only issues in highly contrived cases, but still..

delmoi: Well, not exactly. The visual defects in 1080p are still really obvious (as are compression artifacts, if you know what to look for) while the sound stuff pretty much sounds perfect to most people.

Nah, the sound stuff is crap if you know what to listen for too. Like, when somebody smacks cymbals on a real drum kit, the sensation in your ear is almost of being pinged, like if somebody flicked a finger at your eardrum. Only really remarkable acoustic reproduction systems can reproduce that 'sensation'. Similarly the sense of your chest heaving when a bass drum is struck. And it isn't just an issue of volume! A powerful subwoofer can shake your chest, but not that fast-attack sensation of the wind being popped out of you by the pressure. Subwoofers alone don't do that.

delmoi: Why is that a mistake? It seems like you can make a DAC however you want. Some might use Fourier analysis, others might not.

All DACs use a low pass filter as the last element, so they all use Fourier analysis.

To your general point that time-limited implies not band-limited, and what complications that causes: I'll think more on it, but for now I think you are blowing it way out of proportion. It might have something to do with the relative size of a recording to the wavelength of sound, and of an image to the wavelength of a graphic object. I haven't thought it through much, though.


Why are delta-sigma modulators better? Well, if you want to make a good 16-bit DAC, you have to make a circuit that can very reliably output every voltage level in the entire 96dB range of possible output. That is really, really hard. Meanwhile it is really easy to make a 1-bit DAC, because that is just on or off. Making a 4-bit DAC is easier than 16, but harder than 1, and so on.

Of course there is also the problem of jitter in the clock, but I think on balance it is easier to make a good fast clock than 16 bits' worth of faithful analog levels. There is also the relative impact of jitter vs. bias in your output levels, but I can't remember anything that would shed light on that trade-off.
posted by Chuckles at 1:02 AM on March 7, 2012
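
For anyone curious what that "just on or off" trick looks like, here is a toy first-order sigma-delta loop; assumptions: idealized math, a constant input, and none of the higher-order noise shaping real converters use.

// First-order sigma-delta modulator: an integrator accumulates the error between
// the input and the 1-bit feedback; the 1-bit output toggles so that its running
// average tracks the input.
var sigmaDelta = function (input) {            // input values in the range -1..+1
  var integrator = 0, feedback = 0, bits = [];
  for (var i = 0; i < input.length; i++) {
    integrator += input[i] - feedback;         // accumulate the error
    var bit = integrator >= 0 ? 1 : -1;        // the whole "DAC" is one comparator
    bits.push(bit);
    feedback = bit;
  }
  return bits;
};

// Constant input of 0.25: the average of the bit stream comes out near 0.25,
// which is the job the analog low-pass filter does in a real converter.
var bits = sigmaDelta(new Array(10000).fill(0.25));
console.log(bits.reduce(function (a, b) { return a + b; }, 0) / bits.length);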


delmoi: "Yeah, it just connects the dots."

FFS, No it isn't. Having "read some articles" is no substitute for actual facility with the mathematics. If you actually perform the necessary calculations your "problem" won't show up.
posted by Proofs and Refutations at 1:09 AM on March 7, 2012


FFS, No it isn't. Having "read some articles" is no substitute for actual facility with the mathematics. If you actually perform the necessary calculations your "problem" won't show up.

Hmm, read more carefully. I was responding to someone who wrote "From what I can tell, you are drawing lines between sample points" and then mentioned "connect the dots". I was agreeing that, yes, my graph was only connecting dots, unlike what happens in a real system.
posted by delmoi at 1:28 AM on March 7, 2012


Hmm... I tried graphing it using the Shannon interpolation formula with ten samples. Unfortunately there is a bug in my JSON encoder... if you include a sub-function in your function, the } causes a problem. I need to fix that, but right now I don't really have time.

In any event, to show one impulse generated by

return Math.cos((x*5)-t*5);

sampled 10 times and interpolated using the Shannon interpolation formula, I came up with this code:

var T = 1/Math.PI*0.2; //pi*2/10
var sinc = function(x) {return Math.sin(x)/x}
var source = function(x){return Math.cos((x*5)-t*5);}
var sampleAt = function(n){return source (Math.PI*0.2*n);}
var interpolate = function(n,t){return sampleAt (n)*sinc((t - n*T)/T);}
return interpolate (0,x);
where the interpolate(n,t) function generates the pulse for sample n at 'time' t (in this case 'time' is x, since we are graphing x as time; 't' outside of the interpolate function is like the frame number of the animation).

Anyway, to graph the sum of 10 samples, you use this code:

var T = 1/Math.PI*2;
var sinc = function(x) {return Math.sin(Math.PI*x)/Math.PI*x}
var source = function(x){return Math.cos((x*5)-t*5);}
var sampleAt = function(n){return source (Math.PI*0.2*n);}
var interpolate = function(n,v){return sampleAt (n)*sinc((v - n*T)/T);}
var sum = 0;
for(var i = 0; i < 10; i ++) sum += interpolate (i,x);
return sum/10;
Anyway, the result is not at all like the original function.
posted by delmoi at 3:17 AM on March 7, 2012


The Nyquist limit says you can encode frequencies up to but not including sr/2. This is why delmoi's examples are broken. All frequencies encoded are effectively encoded modulo sr/2, so that if you encode a signal of exactly sr/2 you get 0 Hz (silence or DC), and if you are stupid enough to try to encode sr*.75 the output is sr*.25. It turns out that bad enveloping, failure to envelope, and digital clipping artificially introduce frequencies above Nyquist, and this is how they make such nasty ungodly noises (I love nasty noises; they are kind of a specialty of mine). Yes, I know it sounds weird to say you can generate digital artifacts that have a frequency higher than the sr, but seriously, the math works out; it happens. Also remember that frequencies only apply to perfectly sinusoidal waveforms; all other waveforms must be broken down into their component sinusoids.
posted by idiopath at 4:09 AM on March 7, 2012
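
A tiny numerical illustration of that folding, with made-up numbers: the samples of a tone at 0.75 of the sample rate are indistinguishable from the samples of a tone at 0.25 of the sample rate, which is why the "too high" signal lands back in band.

// Sampling a cosine at 0.75*fs produces exactly the samples of a cosine at 0.25*fs.
var fs = 8;                                                 // hypothetical sample rate
for (var n = 0; n < fs; n++) {
  var above = Math.cos(2 * Math.PI * (0.75 * fs) * n / fs); // tone above Nyquist
  var below = Math.cos(2 * Math.PI * (0.25 * fs) * n / fs); // aliased in-band tone
  console.log(n, (above - below).toFixed(12));              // differences are ~0
}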


The obvious next objection is "look at this waveform just .0001 Hz below Nyquist - not at all like the original"!

The input waveform, if sinusoidal, will be the simplest input that generates the bizarre output you see, and is the proper conversion of that digital form.

The next objection after that is likely "look at this combination of two sine waves, both below Nyquist, their samples don't look the same as the original waveform at all".

When you combine two sinusoids, you get four frequencies: the two originals, their sum, and their difference (the sum and difference signals will be stronger with a stable coherent phase between the inputs, IIRC). If you are doing synthesis you will sometimes need to upsample, then mix, then filter the result, then downsample again to prevent this sort of noise. Or just lowpass instead and lose the highs. Luckily, for natural signals energy levels are so low most of the time, and phases never stay coherent, so you don't need to upsample and filter before doing a simple mixing operation.
posted by idiopath at 4:30 AM on March 7, 2012


The Nyquist limit says you can encode frequencies up to but not including sr/2. This is why delmoi's examples are broken. All frequencies encoded are effectively encoded modulo sr/2, so that if you encode a signal of exactly sr/2 you get 0 Hz (silence or DC),
Uh, no. They are right below the Nyquist limit. (Or actually they showed one at half the Nyquist limit, and one with a 2% lower frequency.) This example shows a sine wave with a wavelength of two samples in black, and one with a wavelength of 2.04 samples in cyan.

The claim was that there was accurate representation up to half the limit. In fact unSane posted here an example using exactly half the limit.
The obvious next objection is "look at this waveform just .0001 Hz below Nyquist - not at all like the original"!
Right, exactly. It looks a lot like a graph of a sine wave at the Nyquist limit. The problem is the claim that all frequencies up to the Nyquist limit are represented. I don't think that's the case, even if you use the Shannon interpolation formula (which I can't link to due to the link-encoding bug, *sigh*).
The next objection after that is likely "look at this combination of two sine waves, both below Nyquist, their samples don't look the same as the original waveform at all".
Already did that, except in that case it was one right at the limit and one right below it. Shifting the value down a tiny bit doesn't change anything.

Mathematically, you could still figure out what the original frequency had been if you had enough samples. There is probably some formula you could come up with for the 'margin of error' ef of the best guess at a frequency (f) as a function of the number of samples (s), the ratio between f and the Nyquist limit (r), and the number of quantization levels (q). But I'm pretty sure it's not the case that ef(s,r,q) = 0 for any f below the Nyquist limit.

However, simply using the Shannon interpolation formula doesn't even seem to get you close, from what I can tell.
posted by delmoi at 5:22 AM on March 7, 2012


What are you trying to do, Delmoi? Trying to disprove Nyquist?

It's a fundamental theorem of information theory.

Do you imagine that it's wrong?
posted by unSane at 5:33 AM on March 7, 2012


Mathematically, you could still figure out what the original frequency had been if you had enough samples. There is probably some formula you could come up with for the 'margin of error' ef of the best guess at a frequency (f) as a function of the number of samples (s), the ratio between f and the Nyquist limit (r), and the number of quantization levels (q). But I'm pretty sure it's not the case that ef(s,r,q) = 0 for any f below the Nyquist limit.

There is a formula: delta-f * delta-t >= 1/(4*pi)

where delta-f is the frequency spread and delta-t is the length of the observation window. Again, this has nothing to do with sampling, or the Nyquist limit.
posted by empath at 5:34 AM on March 7, 2012


Do you imagine that it's wrong?

Yeah, I really don't get where Delmoi is going with this. Yes, without a wavelength of infinite duration, you can't pinpoint an exact frequency, but the frequency spread approaches 0 relatively quickly, and your ear has the exact same limitation, because there is no exact frequency, there is only a frequency range.

See bandlimiting.
posted by empath at 5:41 AM on March 7, 2012 [1 favorite]


See also: The Gabor Limit.
posted by empath at 5:42 AM on March 7, 2012
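
To put rough numbers on that bound (back-of-the-envelope only, nothing specific to digital audio): a 100 ms analysis window cannot pin a frequency down tighter than about 0.8 Hz, and a 10 ms window no tighter than about 8 Hz.

// Gabor / time-frequency limit: delta-f * delta-t >= 1/(4*pi)
var minFrequencySpread = function (windowSeconds) {
  return 1 / (4 * Math.PI * windowSeconds);   // Hz
};
console.log(minFrequencySpread(0.1));    // ~0.8 Hz for a 100 ms window
console.log(minFrequencySpread(0.01));   // ~8 Hz for a 10 ms window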


Delmoi, your attempt to use Shannon uses 10 samples. The interpolation algorithm specifies the sum from minus infinity to plus infinity. Obviously you never have infinite samples, but 10?
posted by unSane at 5:45 AM on March 7, 2012


But I'm pretty sure it's not the case that ef(s,r,q) = 0 for any f below the Nyquist limit.

In fact that's exactly what the Nyquist theorem states, given that the input signal is bandlimited to be below the Nyquist threshold (to avoid aliasing). So 'pretty sure' vs. the Nyquist theorem, hm.
posted by unSane at 5:50 AM on March 7, 2012 [1 favorite]


If you are absolutely positive that Shannon and Nyquist are somehow incorrect, and you can work it up into a peer-reviewed paper that is accepted for publication, you might be looking not only at a new career, but at a Nobel prize, because undoubtedly your discovery will open vast new avenues of information theory to explore. Or, maybe, you have a loose cable.

More kindly, if you have a result that argues against the foundational theorem of information encoding that is used for basically everything from microprocessor design to radio astronomy, I suggest you check your methodology and assumptions more thoroughly.
posted by seanmpuckett at 6:01 AM on March 7, 2012 [1 favorite]


Yes, without a wavelength of infinite duration, you can't pinpoint an exact frequency

(I meant to say waveform here.)
posted by empath at 6:41 AM on March 7, 2012


What are you trying to do, Delmoi? Trying to disprove Nyquist?

It's a fundamentall theorem of information theory.

Do you imagine that it's wrong?
From the wikipedia article, again:
The theorem assumes an idealization of any real-world situation, as it only applies to signals that are sampled for infinite time; any time-limited x(t) cannot be perfectly bandlimited. Perfect reconstruction is mathematically possible for the idealized model but only an approximation for real-world signals and sampling techniques, albeit in practice often a very good one.
Sampled. For. Infinite. Time.

That's obviously true. All you would need to do is count the peaks and divide by the amount of time. That would work even if your frequency was 49.999...% of the sampling rate (discounting quantization errors that might cause a peak to appear to be zero). But that only works if you have an infinite number of samples, not a small number of samples.
But I'm pretty sure it's not the case that ef(s,r,q) = 0 for any f below the Nyquist limit.
In fact that's exactly what the Nyquist theorem states, given that the input signal is bandlimited to be below the Nyquist threshold (to avoid aliasing). So 'pretty sure' vs. the Nyquist theorem, hm.
s is the number of samples; the Nyquist theorem applies when s = ∞.

What I don't understand is why people seem to think the Nyquist theorem applies when there are not an infinite number of samples.
posted by delmoi at 7:04 AM on March 7, 2012


If you are absolutely positive that Shannon and Nyquist are somehow incorrect, and you can work it up into a peer-reviewed paper that is accepted for publication, you might be looking not only at a new career, but at a Nobel prize, because undoubtedly your discovery will open vast new avenues of information theory to explore. Or, maybe, you have a loose cable.
OMG. I am not saying the Nyquist theorem is wrong. I am saying you guys are unable to understand that it applies to samples of infinite duration, which is clearly stated in the Wikipedia article!!!
posted by delmoi at 7:06 AM on March 7, 2012


In the context of our discussion, which is about audio files used to play back music, the number of samples is effectively infinite.

For example, a 3-minute song recorded at 44.1 kHz contains almost eight million samples.
posted by unSane at 7:14 AM on March 7, 2012


delmoi: you can loop the sample and everything just works. (It's how you can talk about the frequency domain and do transforms like the FFT with finite sequences. Interestingly, the FFT requires a power-of-2 number of samples to work, so you end up adding a bunch of zeros onto the end of the real samples---it doesn't change the power density spectrum...)
posted by dongolier at 7:25 AM on March 7, 2012 [1 favorite]
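
A crude check of the zero-padding point, using a naive DFT rather than an FFT (so no power-of-two restriction; the signal and bin numbers below are made up purely for illustration):

// Magnitude of bin k of a finite sequence x (naive DFT, O(N) work per bin).
var dftMagnitude = function (x, k) {
  var re = 0, im = 0, N = x.length;
  for (var n = 0; n < N; n++) {
    re += x[n] * Math.cos(2 * Math.PI * k * n / N);
    im -= x[n] * Math.sin(2 * Math.PI * k * n / N);
  }
  return Math.sqrt(re * re + im * im);
};

// 100 samples containing exactly 10 cycles of a sine, then zero-padded to 200.
var x = [];
for (var n = 0; n < 100; n++) x.push(Math.sin(2 * Math.PI * 10 * n / 100));
var padded = x.concat(new Array(100).fill(0));

// The peak sits at 0.1 cycles/sample either way, with the same magnitude:
console.log(dftMagnitude(x, 10));       // ~50  (bin 10 of 100)
console.log(dftMagnitude(padded, 20));  // ~50  (bin 20 of 200, same frequency)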


Further to the "infinite for the purposes of our discussion" point: the sinc term for a 0 dB, 16-bit signal drops below the noise floor (1/2^16 of full scale) within a few tens of thousands of samples, i.e. well under a second of audio.

Anyway, my very basic understanding of audio encoding algorithms is that they use, instead of a true sinc, a "windowed sinc" which is altered slightly on both encoding and decoding such that for most of the important audio spectrum there's no perceptible difference. This also explains the importance of low-pass filtering on encoding; it shrinks the necessary size of the window. MP3 encoding breaks audio up into chunks of samples of a certain length, then encodes them using terms of a windowed sinc function that corresponds well to the chunk size, then throws away the low order bits of those terms to meet the desired bit rate (which bits, and how many, is determined by variable/constant bit rate algorithms, and is one of the important choices made in keeping perceptible differences minimal).

What's beautiful is that JPG works almost exactly the same way.
posted by seanmpuckett at 7:37 AM on March 7, 2012
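
For what it's worth, "windowed sinc" in the resampling sense just means the ideal, infinitely long sinc multiplied by a taper so it can be truncated to a practical length; here is a sketch assuming a Hann window and an arbitrary half-width (whether MP3's filterbank literally looks like this is beyond the sketch; the point is only what "windowed" means):

// Ideal sinc never dies out; a window forces it to zero so it can be cut off.
var sinc = function (x) { return x === 0 ? 1 : Math.sin(Math.PI * x) / (Math.PI * x); };
var hann = function (x, halfWidth) {            // 1 at the center, 0 at +/- halfWidth
  return Math.abs(x) > halfWidth ? 0 : 0.5 * (1 + Math.cos(Math.PI * x / halfWidth));
};
var windowedSinc = function (x, halfWidth) { return sinc(x) * hann(x, halfWidth); };
// windowedSinc(x, 8) is an interpolation kernel that only ever spans 16 samples.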


A suggestion. I think where delmoi is making his error is that his window is too small. The poster who made the comment about 10 samples is alluding to this. For a 400 Hz signal, which has a period of .0025 sec, you want a window of .025 sec to get 10 complete waveform cycles in a crude Fourier transform. (Yes, there are a multitude of ways to do high-resolution transforms where you don't need to window the data out to .025 sec, but if you are doing simple linear math and simple linear interpolation, no fancy sincs, &c, you want .025 sec.) At 40 kHz sampling, a .025 sec window has 1000 samples.

You do not have to record from minus infinity to plus infinity but you definitely want to record for more than .000025 sec.

Now the biggest issue as I see it was only made plain in this entire thread by Chuckles where he wrote:

The most basic reason is that anti-aliasing filters are really, really hard to make. I mean... not so hard if you have 10 or 20 octaves to reach sufficient attenuation, but in the CD audio standard you have to reach sufficient attenuation in a tiny fraction of an octave.

You are trying for full pass at 20 kHz and -100 dB at 22kHz and it's just damn tricky to do that. That's why us dreamers want 400 kHz sampling so we can go full pass at 20 kHz and -100 dB at 400 kHz.
posted by bukvich at 7:37 AM on March 7, 2012
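
To put a number on how tight that is (rough arithmetic only, using 22.05 kHz, the actual CD Nyquist frequency): the transition band from 20 kHz to 22.05 kHz is about a seventh of an octave, so a filter that has to fall ~100 dB in that span needs on the order of 700 dB per octave of rolloff, versus a couple of octaves of breathing room at much higher sample rates.

// Width of the CD anti-alias transition band, in octaves.
var octaves = Math.log(22050 / 20000) / Math.log(2);
console.log(octaves);        // ~0.14 octaves between full pass and full stop
console.log(100 / octaves);  // ~700 dB per octave of required rolloff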


But I'm pretty sure it's not the case that ef(s,r,q) = 0 for any f below the Nyquist limit.


As someone else alluded to upthread, this is just the Heisenberg Uncertainty Principle and has nothing to do with sampling at all. The shorter the section of waveform you examine, the less certain you are of its frequency distribution.

In the audio case, the more precisely you try to locate a sound in time, the less you know about what kind of sound it is.

You can demonstrate this in a DAW very easily by zooming in on, say, a cymbal hit in the audio window. When you have the whole cymbal hit from transient to tail in the window, you can tell a lot about what kind of sound it is -- its frequency distribution etc. But if you zoom right into the transient so you are looking at single oscillations, you can tell very little about what kind of sound it is. The sampling frequency puts a limit on how far you can zoom into the transient, but it's not what makes it hard to know the frequency -- that's just Heisenberg in action and is a function of the window width.
posted by unSane at 7:48 AM on March 7, 2012 [2 favorites]


What I don't understand is why people seem to think the Nyquist theorem applies when there are not an infinite number of samples.

What I don't understand is why you haven't read the multiple posts from people explaining to you the reason why it requires an 'infinite number of samples' and why it's not relevant for signal processing. It has nothing to do with sampling. It has to do with time-frequency uncertainty.
posted by empath at 7:51 AM on March 7, 2012


For example, a 3-minute song recorded at 44.1 kHz contains almost eight million samples.
Right... but a song is not a single continuous tone.

We are talking about what you hear when you listen to the song. Your ears are not performing a Fourier transform over the entire song; rather, you have receptors which are tuned to specific frequencies. There is probably some overlap, but you can obviously tell the difference between frequencies 10% apart (there is a 5.9% difference between middle C and C#).

And the thing is, your ears do not perform a Fourier transform. If you were sampling at 554.36Hz (twice middle C#) a 261.63Hz signal would not sound like C, but rather a pulsating C#. If you counted the peaks, and you knew it was 'supposed' to be a constant signal, then you could tell that the original frequency had been 261.63Hz.

However, without that, you wouldn't be able to tell if it was 'supposed' to be a pulsing C, or a C#.

And in any event, it would sound like a pulsing C#, not a C.
delmoi: you can loop the sample and everything just works. (It's how you can talk about the frequency domain and do transforms like the FFT with finite sequences. Interestingly, the FFT requires a power-of-2 number of samples to work, so you end up adding a bunch of zeros onto the end of the real samples---it doesn't change the power density spectrum...)
I'm not saying it doesn't work in practice. I suspect people probably can't tell the difference between C and C# if you multiply the frequencies by, say, 70 (going from 277Hz to 19,390Hz), but that's a separate question from whether or not the Nyquist theorem "proves" that there is no signal loss below the Nyquist limit, even with a small number of samples. Various posters were saying you could do it even with ten samples, or whatever.
MP3 encoding breaks audio up into chunks of samples of a certain length, then encodes them using terms of a windowed sinc function that corresponds well to the chunk size, then throws away the low order bits of those terms to meet the desired bit rate
It's called a frame. The frame size in MP3 files is 1,152 samples, apparently.
A suggestion. I think where delmoi is making his error is that his window is too small. The poster who made the comment about 10 samples is alluding to this. For a 400 Hz signal, which has a period of .0025 sec, you want a window of .025 sec to get 10 complete waveform cycles in a crude Fourier transform. (Yes, there are a multitude of ways to do high-resolution transforms where you don't need to window the data out to .025 sec, but if you are doing simple linear math and simple linear interpolation, no fancy sincs, &c, you want .025 sec.) At 40 kHz sampling, a .025 sec window has 1000 samples.
Yeah other posters [1,2] were saying you could get the frequency out with just 10 samples. That is what I was saying was incorrect.
posted by delmoi at 7:52 AM on March 7, 2012


Not to go on and on about it, but the real beauty of all of this is where the rubber meets the road.

Of course we can't have an infinite number of samples. Everyone knows that. The point of the research, all of it, and what is most relevant to our discussion, is what the Fraunhofer Institute did that was so important: they systematically studied, with real audio and real ears, exactly which parts of infinity we can get rid of so that we have something practical for compressing audio.

The argument really isn't "is Shannon wrong" it's rather "is it time to revisit those assumptions about which parts of infinity can we get rid of?"

I think there's other problems that are far more important, like the hiss coming out of my MacBook's internal hard drive right now, but the one thing that better encoding would get us is a better absolute best -- assuming everything else is perfect. But you're just not going to hear a difference otherwise.

I do wonder how many people would be so happy about Apple's "new awesome audio format" if they realized in order to hear a difference they'd need to spend $50 more on their Apple gear for a decent audio output chain to be embedded in it (or buy a Headroom USB amp) and $200 on in-ear transducers, or $5K on speakers and an amp and another $50K on an appropriate room treatment.
posted by seanmpuckett at 7:56 AM on March 7, 2012


If you were sampling at 554.36Hz (twice middle C#) a 261.63Hz signal would not sound like C, but rather a pulsating C#. If you counted the peaks, and you knew it was 'supposed' to be a constant signal, then you could tell that the original frequency had been 261.63Hz.

Dear God, delmoi, why do you KEEP doing this? The PCM samples are not just played back in a join-the-dots fashion. This is pathological. I'm out.
posted by unSane at 7:57 AM on March 7, 2012


Dear God, delmoi, why do you KEEP doing this? The PCM samples are not just played back in a join-the-dots fashion. This is pathological. I'm out.
Keep doing what? I understand the Whittaker-Shannon interpolation formula, which uses the normalized sinc function to generate pulses. I graphed it. It doesn't change anything about what I'm saying.

Here's the code I used to graph it:
var T = 1/Math.PI*2;
var sinc = function(x) {return Math.sin(Math.PI*x)/Math.PI*x}
var source = function(x){return Math.cos((x*4.9)-t*5);}
var sampleAt = function(n){return source (Math.PI*0.2*n);}
var interpolate = function(n,v){return sampleAt (n)*sinc((v - n*T)/T);}
var sum = 0;
for(var i = 0; i < 10; i ++) sum += interpolate (i,x);
return sum/10;
Like I said, you don't get anything that looks like the original waveform (the function Math.cos((x*4.9)-t*5);)
posted by delmoi at 8:03 AM on March 7, 2012


You are trying for full pass at 20 kHz and -100 dB at 22kHz and it's just damn tricky to do that. That's why us dreamers want 400 kHz sampling so we can go full pass at 20 kHz and -100 dB at 400 kHz.

Do you honestly think your ear has a dynamic range of 100dB at 20kHz?
posted by rocket88 at 8:08 AM on March 7, 2012


1. Ear hygiene and health

It's been my experience, in incrementally improving my playback system, that the attainable upgrades have smaller and smaller perceptible impacts.

There comes a point where Q-Tips are definitely required to hear the differences. These blind tests should require that the subjects undergo routine ear maintenance.
posted by StickyCarpet at 8:11 AM on March 7, 2012


Like I said, you don't get anything that looks like the original waveform (the function Math.cos((x*4.9)-t*5);)

BECAUSE YOU ONLY USED TEN SAMPLES
posted by unSane at 8:20 AM on March 7, 2012


I'd love to be in a blind test: a great chamber quartet; the same piece recorded immediately previously and the master compared; a great compression; the music played live but output via speakers.

Before I had any interest in assembling a high performance sound system, I was at a lunch with Mark Levinson.

I asked him what could possibly justify the kind of $300,000 systems that his components feature in.

He said that for less, you could set up a system where a blindfolded listener could not tell the difference between a live flute player and the playback.

At the $300K level, that same thing could be done with 3 revving Harleys. That's where 24 bits might come in handy.
posted by StickyCarpet at 8:22 AM on March 7, 2012 [1 favorite]


Okay FINALLY I got the JSON encoding to work (manually) here is a direct link to the graph.

In blue, you have a waveform right below the Nyquist limit if you were using 10 samples for every 2*pi. In red, you have the result of performing Whittaker-Shannon interpolation on those ten samples. You can see that the result is a pulsating signal at half the sampling frequency.
BECAUSE YOU ONLY USED TEN SAMPLES
BECAUSE PEOPLE SAID YOU COULD DO IT WITH TEN SAMPLES!!!! AND I WAS SAYING, YOU COULD NOT.
posted by delmoi at 8:25 AM on March 7, 2012


> Do you honestly think your ear has a dynamic range of 100dB at 20kHz?

No I do not. But I am dreaming of a world where my system has enough margin of error that when something pathological comes out of it my first response is "hmm, maybe I should go to the doc and get my hearing checked again". My system now not only doesn't have any margin of error, the sucker's got errors in it that were acceptable to some committee listening to Lady Gaga who have no idea what Cecilia Bartoli is supposed to sound like.
posted by bukvich at 8:27 AM on March 7, 2012


BECAUSE PEOPLE SAID YOU COULD DO IT WITH TEN SAMPLES!!!! AND I WAS SAYING, YOU COULD NOT.

No, people said you didn't need infinite samples, and your counterexample was ten samples, which no-one claimed would work.
posted by unSane at 8:36 AM on March 7, 2012


which no-one claimed would work.
hmm
If you take that second roll of paper, and count the number of dots, you can see that there were originally 9999 peaks on the original roll, and that the wavelength of the original signal was 1.0001cm.

However, if you only have an ordinary sheet of paper, you would never be able to tell that the original wavelength had been 1.0001cm rather than 1cm.
Sorry, but this is just wrong. You would be able to reconstruct the original waveform (at least in the middle of the page; the edges would be high-frequency discontinuities). You don't need to detect the peaks, or for that matter any peaks, to reconstruct the waveform. The phase and amplitude of the samples are sufficient to reconstruct any waveform. Plug your samples into a Fourier analysis tool and out would pop a 1.0001 cm wavelength.
posted by delmoi at 8:43 AM on March 7, 2012


where is the claim that ten samples would work?
posted by unSane at 9:17 AM on March 7, 2012


Relevant.
posted by seanmpuckett at 9:43 AM on March 7, 2012


You don't need lots of samples to do a reconstruction because most of the sinc pulse energy is confined within one cycle. A few samples should be sufficient to give you reasonably good results. I'm not going to take the time to debug your code but all I can conclude is that you are doing it wrong. But keep at it. When you get it right you may finally be convinced.

Maybe this will help you: http://demonstrations.wolfram.com/SincInterpolationForSignalReconstruction/
posted by JackFlash at 10:07 AM on March 7, 2012




for(var i = 0; i < 10; i ++) sum += interpolate (i,x);

return sum/10;
I don't understand why you're dividing the sum by 10 here. Doesn't that make it an average, rather than a sum? Don't you want the sum? I think you want the sum.

Also, shouldn't you be simulating the range (-infinity) to (+infinity) with at least, say, -10 to +10, rather than with positive values only (0 to 9)? How can you get the reconstruction if you've left out half of the sinc curve?

My (possibly fuzzy) memory of the way real (early, multi-bit) DACs worked was that they did the summing over a range of something like 200 samples.
posted by Western Infidels at 10:14 AM on March 7, 2012 [1 favorite]


That page that JackFlash linked to above illustrates this perfectly.

This screenshot shows a 1Hz sine wave sampled at 2.1Hz., along with the interpolated reconstruction. Within nine samples the interpolation algorithm has provided a pretty good fit.

Sampled at 3Hz, with 13 samples, the fit is remarkably good.
posted by unSane at 11:17 AM on March 7, 2012 [2 favorites]


Do you honestly think your ear has a dynamic range of 100dB at 20kHz?

Aliasing! It doesn't matter what you can hear at 20kHz. Signal above the Nyquist frequency that manages to come through the anti-aliasing filter with enough magnitude to be audible could show up anywhere in the passband (well, not literally anywhere, it lands somewhere very predictable, but for practical purposes that doesn't matter).

Here is an article with a real life anti-aliasing filter for a CD audio system. Note that they make no attempt to show you the phase diagram. While I'm sure it meets specifications, I'm equally certain that it meets them exactly similarly to the way the stop band of that filter meets the -96dB requirement. That is, in a really messy way that might have really ugly effects on real world signals.

One thing this illustrates that escaped me until just a minute ago.. As you go to higher bit depth, you can't just keep the same sampling rate. The anti-aliasing filter required to meet the specs of the bit depth just keeps growing and growing, and becoming more and more degenerate (component tolerances become even more ridiculously tight, and ripple in the stop band continues even further into the very high frequencies, etc. etc.).
posted by Chuckles at 11:49 AM on March 7, 2012


I'm curious why they decided to use such a steep LPF instead of just raising the sampling frequency slightly and using a more relaxed (better behaved) filter over a wider frequency range. Presumably it's so you get the widest/flattest possible frequency response for a given sample rate, but I wonder how severe the trade-offs are.
posted by unSane at 12:09 PM on March 7, 2012


Don't people use 24/192 to record sound?

No. If this discussion is still about distribution, this is why it's always funny to me: people are arguing about distributing something at a higher rate than it was recorded at.

Some people record at 24/192. I, and many people I know, record at 24/44.1. The 24 bits are for the headroom; they make it easier to work, not necessarily because it sounds better. I know some who work at 24/96: some because they are unsure of themselves, some because they think it sounds better, but most just say "eh, why not, maybe someone will want that format later". I personally don't know anyone who works at 24/192, and it's treated as a joke among most recording people I have ever talked to.

The "OMG I’ve got to have the bestest" thing is silly.
posted by bongo_x at 12:14 PM on March 7, 2012


This thread is pretty depressing. On one side you have basic facts of information theory that have been unquestioned for years and are used (successfully) in a wide range of fields. Plus you have an excellent explanation from a guy who developed Ogg, which required him to have a deep understanding of the mathematics and the human psychoacoustic system.

On the other you've basically got, "Hey, I've got an opinion and a Wikipedia link! Therefore this guy is a crank! You're all mindless ideologues!"

And we're surprised that a large percentage of the public doesn't accept global warming? All we're lacking now is accusations of an encoder conspiracy.
posted by bitmage at 12:16 PM on March 7, 2012 [9 favorites]


bitmage: also, almost everyone supporting the article is a musician or a music producer.
posted by empath at 12:34 PM on March 7, 2012


Ah, so you admit the conspiracy then!
posted by bitmage at 12:35 PM on March 7, 2012 [2 favorites]


Aliasing! It doesn't matter what you can hear at 20kHz. Signal above the Nyquist rate that manages to come through the anti aliasing filter with enough magnitude to be audible could show up anywhere in the pass band

Most audio A/D converters today use sigma-delta one-bit modulators running at several megahertz. They are effectively "over-sampling," which reduces the need for complicated anti-aliasing filters. This is because you are only worried about aliasing of noise that is above the megahertz sampling frequency. Everything between the baseband and the megahertz sampling rate is removed by a digital decimation filter following the modulator, so you only have to take care of noise that is above the megahertz sampling rate. For a sigma-delta converter, a simple RC anti-aliasing input filter for megahertz noise is sufficient.

After the decimation filter, the data is down-converted to 16-bit values at the familiar 44.1 kHz. Magic, eh?
posted by JackFlash at 12:38 PM on March 7, 2012


Aliasing! It doesn't matter what you can hear at 20kHz. Signal above the Nyquist frequency that manages to come through the anti-aliasing filter with enough magnitude to be audible could show up anywhere in the passband (well, not literally anywhere, it lands somewhere very predictable, but for practical purposes that doesn't matter).

Actually it does matter. If fs is 44.1 then your Nyquist frequency fs/2 is 22.05. That means any unfiltered signal at 22.05 + x will alias to 22.05 - x. You know, where your ear can't hear it, for small values of x. And if your filter doesn't pass 20kHz at 100% it doesn't really matter either, because your ears have limited dynamic range at those frequencies.
You can show me all the math and Bode plots and theorems you want, and you may even be technically correct, but you can't divorce this stuff from the physiology of the human ear and brain if you want to keep the conversation in the practical real world.
posted by rocket88 at 12:51 PM on March 7, 2012


As I pointed out above, most audio equipment today uses sigma-delta modulators running at several megahertz, so the only aliasing you might have is from megahertz system noise at the input to the converter. This is easily removed with a single-pole RC filter (10 K-ohm and 100 pF, for example), because you now have a filter transition band from about 100 kHz to several MHz. This is much easier to achieve than the sharp cutoff between 20 kHz and 24 kHz needed for other types of converters. No need for complicated 13-pole brick-wall filters.
posted by JackFlash at 1:27 PM on March 7, 2012
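
For reference, the corner frequency of that example single-pole RC works out to roughly 159 kHz, comfortably above the audio band and far below the megahertz modulator rate:

// f_c = 1 / (2 * pi * R * C) for a single-pole RC low-pass filter.
var R = 10e3;      // 10 K-ohm
var C = 100e-12;   // 100 pF
console.log(1 / (2 * Math.PI * R * C));  // ~159,155 Hz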


Not magic, JackFlash: brilliant. I get it. It means that the dude who wrote the article is probably right about distribution of 16-bit/44kHz; it probably is good enough. Only marginally, and assuming everybody in the production and distribution process got everything exactly right, but still good enough.
(The more practically minded audiophiles will even admit that--wooden volume knobs aside--CD-based systems started to sound about as good as vinyl in the very late 90s. Different strengths and weaknesses, but overall similar quality.)

On the other hand, we have bongo_x saying he does 24bit/44.1kHz. From my comment above you can see why I think that may be a mistake. I mean, I suppose it is possible that oversampling combined with 'perfect digital filters' means that you can do 24bit/44.1kHz, but it makes me nervous. Perhaps you can comment on that? It seems to me that no filter is ever perfect (an inherent property of time-bandwidth duality, which we really don't need to debate, right?), and that 24/96 is probably a much safer place to be when doing recordings.
posted by Chuckles at 1:44 PM on March 7, 2012


Quoting from That page that JackFlash linked to above:

Because the number of samples is limited, there will be some error in the reconstruction, even at a higher sampling frequency than the Nyquist frequency.

Hmm. Error. So the reconstructed signal doesn't exactly match the original. The magnitude of the error will be related to the frequencies we're reproducing and the sampling rate. Fascinating.

Imagine what such error looks like at 10 kHz. The errors are substantial.

Imagine what such error looks like when we increase the sampling rate. Interesting, the errors at 10 kHz decrease in amplitude substantially.

Thanks for the evidence and editing, Señor Jacque.
posted by sydnius at 1:51 PM on March 7, 2012


Imagine what such error looks like at 10 kHz. The errors are substantial.

No, they're not, except in your imagination. You've completely misunderstood the page. The number of samples refers to the absolute number of samples, not the sampling rate. As I pointed out upthread, a three minute track at 44.1 kHz has about 8 million samples. My graphs show that at a sampling rate of 3x your target frequency, you get a fairly good approximation of the wave form within 13 samples. 44.1 is more than 4x the 10kHz frequency you are discussing, and we are talking about 8 million samples.

Why is it so hard to believe that people have actually done the math on this?
posted by unSane at 2:00 PM on March 7, 2012 [2 favorites]


I suppose it is possible that oversampling combined with 'perfect digital filters' means that you can do 24bit/44.1kHz, but it makes me nervous.

It is correct that to get a true 24 bits with a high signal-to-noise ratio you have to be more careful, using, for example, a higher-order sigma-delta converter with more stages. But keep in mind that for most purposes, you are recording at 24 bits only to get more dynamic range, not to increase signal-to-noise. 16-bit signal-to-noise is just fine. But higher dynamic range is useful because you don't have to be so careful in setting your recording levels to avoid clipping. It gives you lots of headroom. You eventually want only 16 bits of dynamic range, but you don't have to be so careful where you record that within the 24-bit dynamic range. When all is said and done, you are going to throw away the lower 8 bits (along with their noise) and distribute 16-bit data because that is all you need for quality playback. The final signal-to-noise is the same as if you had recorded at 16 bits originally. The 24-bit recording is just a convenience that makes the recording engineer's life easier by giving them more margin of error when setting record levels. You don't care that the lower bits might carry more noise because they get thrown away. So the strategy is to record at 24 bits and play back at 16.
posted by JackFlash at 2:28 PM on March 7, 2012
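
A toy sketch of that final word-length reduction, assuming plain TPDF dither ahead of the truncation (real mastering tools use noise-shaped dither, and this ignores clipping at the extremes):

// Reduce one 24-bit sample to 16 bits: add about one 16-bit LSB of triangular
// (TPDF) dither, then drop the bottom 8 bits; the dither decorrelates the
// rounding error instead of letting it turn into distortion.
var to16bit = function (sample24) {
  var lsb16 = 256;                                       // one 16-bit step, in 24-bit units
  var dither = (Math.random() - Math.random()) * lsb16;  // triangular noise, +/- 1 LSB
  return Math.round((sample24 + dither) / lsb16);
};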


Why is it so hard to believe that people have actually done the math on this?

And moreover, if what they were saying were true, it wouldn't just be a problem with playing back mp3s. Every major digital telecommunications technology depends on being able to perfectly recover signals from sampled waves.
posted by empath at 2:32 PM on March 7, 2012 [1 favorite]


unSane sez:

No, they're not, except in your imagination. You've completely misunderstood the page.

you get a fairly good approximation of the wave form within 13 samples.


Referring to your own diagram, I respectfully disagree. The errors are 25% of scale for your "fairly good approximation". If I don't understand the bouncing "absolute error in sinc interpolation" error graph, please feel free to interject with what the error graph denotes. Is it error or not? Is the scale really 25% of the signal for this case? Remember that this is three times more samples than the representation 10 kHz receives in CD-quality audio.

44 kHz sampling does not represent the humanly perceptible audio signal accurately enough to avoid introducing real, measurable error. This is indisputable. The magnitude and consequences of that error for perception have been in full-fledged debate for a while, and will continue to be.
posted by sydnius at 2:33 PM on March 7, 2012


jumpinjackflash has been summoned in askmetafilter.

Why is it so hard to believe that people have actually done the math on this?

I trust the engineers know what they are doing. What I distrust is Sony and Panasonic and Harman Kardon and Bang & Olufsen managers calculating the precise-to-the-penny tradeoffs so that they don't lose any business selling medium-fidelity merchandise with high-fidelity labels and prices. Did you see Fight Club? The scene where the automobile guy explains "if we aren't going to get sued for this amount or more, we don't do the recall"? I can't speak for anybody else, but that is what drives my own skepticism and distrust when I read something like Monty's piece. I don't think there is anything wrong in what he wrote.

I still want 200kHz samples. I want a margin of safety, just like Warren Buffett evaluating a share of stock. Well, maybe 100kHz. I ain't buying the "44kHz is good enough for anybody" argument yet.

If somebody wants to go into that askmetafilter question and recommend audio engineer textbooks I would probably buy one.
posted by bukvich at 2:35 PM on March 7, 2012


bukvich, there are vastly better places to spend money than on the sample rate, as has been discussed at length in this thread.
posted by flaterik at 2:40 PM on March 7, 2012


Remember that this is three times more samples than the representation 10 kHz receives in CD-quality audio.

No, it isn't. Please stop digging. You are yet again confusing sampling frequency with the number of samples. My example reconstructs a waveform from a 3-cycle chunk containing a dozen samples. The errors are large because there are only a dozen samples, not because the sampling frequency isn't high enough. The 8 million samples of a three-minute song are effectively infinite as far as the interpolation algorithm is concerned.
posted by unSane at 2:56 PM on March 7, 2012


Ok unSane favorited this comment in the askme thread.

The rec is for Mastering Audio by Bob Katz.

From p. 226 of my copy:

"I firmly believe that some minimal sample rate (perhaps 96 kHz) will be all that is necessary if PCM-converters are redesigned with psychoacoustically-correct filters (hopefully inexpensively). For the benefits of the myriads of consumers and professionals, we need to make a cost-analysis of the whole picture instead of racing towards bankruptcy."

I realize he isn't necessarily talking about the box on my shelf that I bought from Circuit City, but there is one of your experts advocating for 96kHz.
posted by bukvich at 3:14 PM on March 7, 2012


you get a fairly good approximation of the wave form within 13 samples.

Referring to your own diagram, I respectfully disagree. The errors are 25% of scale for your "fairly good approximation".


The reason there are errors are because in this approximation you are missing all the samples that come before and after. This gets into why the Shannon theorem talks about an infinite sequence of samples.

Look at that picture of the sinc function. Just look at the red line for now. When you reconstruct a waveform from samples, you overlay and sum up a bunch of these sinc pulses, one corresponding to each sample and with amplitude corresponding to the sample's magnitude. So here you see one sinc pulse corresponding to one sample. Then move left or right by 2*pi as shown on the top scale. There you would have another sinc pulse with perhaps a different amplitude corresponding to another sample. Then go to 4*pi on either side and add two more sinc pulses. Sum up all of these sinc pulses and you get the original waveform.

But notice that the sinc pulse propagates infinitely in time in both directions with decaying ripples. This is where the idea of infinite samples comes in because at each point in time, the original waveform is the sum of all the sinc pulses that ever occurred before and after.

But in practical terms, notice that the farther you get from the center of the pulse, the smaller the amplitude of the sinc pulse. This means that once you get more than 5 or 6 pulses away (10*pi or 12*pi), their contribution to the sum at any particular point becomes vanishingly small. You can increase your reconstruction to any arbitrary accuracy by just including more of the surrounding samples in your sum. In practical terms, once you get a few samples away, the differences become tiny.

So the error you are seeing in the simulation is because with just a few samples, you are missing part of the history that came before and their contribution to the sum of those samples. If you were to include more of those samples on either end, that is, use a wider window that includes more samples, then the errors would vanish. More samples doesn't mean a faster sample rate. It just means that you need to use a wider window in your calculations so that you include more distant pulses in your sum.
posted by JackFlash at 3:16 PM on March 7, 2012 [2 favorites]
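
The same point in a few lines of code: reconstruct the midpoint between two samples of a sine wave using wider and wider windows of neighboring samples, and watch the truncation error shrink (made-up tone, ideal sinc, nothing tuned):

var sinc = function (x) { return x === 0 ? 1 : Math.sin(Math.PI * x) / (Math.PI * x); };
var f = 0.2;                                       // tone at 0.2 cycles per sample
var sample = function (n) { return Math.sin(2 * Math.PI * f * n); };
var target = Math.sin(2 * Math.PI * f * 0.5);      // true value halfway between samples 0 and 1

[2, 4, 8, 16, 32].forEach(function (halfWidth) {   // neighbors included on each side
  var sum = 0;
  for (var n = -halfWidth; n <= halfWidth; n++) sum += sample(n) * sinc(0.5 - n);
  console.log(halfWidth, Math.abs(sum - target));  // error generally falls as the window widens
});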


Also, in the simulation, note that the errors are smaller in the middle and get big on both ends. That is because the samples in the middle have a lot of neighboring samples that contribute to a more accurate sum, while the samples on the ends have big errors because they are missing the contributions to the sum of all their neighbors to the left and right.
posted by JackFlash at 3:26 PM on March 7, 2012


I realize he isn't necessarily talking about the box on my shelf that I bought from Circuit City, but there is one of your experts advocating for 96kHz.

Arguments based on math, logic and physics are generally better than arguments based on authority. JackFlash, etc, aren't just quoting selectively from text books to prove a point.
posted by empath at 3:36 PM on March 7, 2012


empath: bitmage: also, almost everyone supporting the article is a musician or a music producer.

empath: Arguments based on math, logic and physics are generally better than arguments based on authority.

Hmmmmm.
posted by bukvich at 3:45 PM on March 7, 2012


Katz is talking about the sampling rate used in music production, which is quite a different matter, as audio can be slowed down and manipulated in all sorts of ways. He isn't talking about post-mastering consumer audio formats.
posted by unSane at 4:20 PM on March 7, 2012


This has been one of the most fascinating threads in recent metafilter memory. Depressing too, or at least revealing. Well, more confirmingly illustrative of typical behavior.
posted by OmieWise at 4:26 PM on March 7, 2012 [1 favorite]


I found this comment which has more from Bob Katz on 96KHz audio.

This led to the following (preliminary) conclusions:

1. A properly-designed 20kHz digital filter can be sonically invisible in a 96kHz sampled environment.

2. Experience and this experiment suggests that 44.1kHz sampling digital systems can sound much better simply by use of better digital filters. This includes all the filters in compact disc players, A/Ds, etc. The effects of cumulative filters must also be considered — a situation similar to the familiar effects of group delay in successive bandpass limited analogue circuits.

3. 96kHz sampling systems do not sound better because of increased bandwidth. The ear does not use information above 20kHz to evaluate sound.

posted by TwoWordReview at 4:59 PM on March 7, 2012


I don't understand why you're dividing the sum by 10 here. Doesn't that make it an average, rather than a sum? Don't you want the sum? I think you want the sum.
Actually, that's due to a mistake I made. *sigh* Apparently I screwed up the sinc function at the last minute. I'd initially used the non-normalized version, and when I converted it to normalized form I wrote sin(πx)/πx instead of sin(πx)/(πx), which resulted in an incorrect function due to the order of operations. The result didn't fit on the graph so I just averaged it. :P I probably should have paid closer attention.

Anyway, here are some corrected graphs. First, here's the source wave and the waveform generated by sampling at position four, and here is position seven. As you can see I'm using the normal sinc function and calculating the value of sinc for the sample over the entire graph.
Here is a corrected graph of the sum over all ten samples.
Also, shouldn't you be simulating the range (-infinity) to (+infinity) with at least, say, -10 to +10, rather than with positive values only (0 to 9)? How can you get the reconstruction if you've left out half of the sinc curve?
The positive values are just the sample numbers. The negative sample numbers would only have a large impact 'offscreen'. Here is the graph using samples from –10 to +10. The graph looks the same, just runs more slowly.

The result is actually pretty interesting. It does match the curve better this time, but you still have an obvious pulsing effect, and then a fairly obvious 'step' at every pulse.
--
This screenshot shows a 1Hz sine wave sampled at 2.1Hz., along with the interpolated reconstruction. Within nine samples the interpolation algorithm has provided a pretty good fit.
As you can see, there is still a pretty big change in the amplitude of the reconstructed wave: from a bit less than 5 at the start of the graph to around 7 or so by the end.

So your graph illustrates exactly what I've been saying: that you will have a pulsation effect.
where is the claim that ten samples would work?
...
As I pointed out upthread, a three minute track at 44.1 kHz has about 8 million samples. My graphs show that at a sampling rate of 3x your target frequency, you get a fairly good approximation of the wave form within 13 samples. 44.1 is more than 4x the 10kHz frequency you are discussing, and we are talking about 8 million samples.
-- unSane
Uh, no. This entire thread I have been talking about a small number of samples, because people would say it would work with a small number of samples. If you think that having a large number of samples will make it work, then you agree with what I have been trying to argue this entire thread, right from the beginning. In fact, I used 10,000 samples as a specific example that would work.
Here's one of my early comments:
If you take that second roll of paper, and count the number of dots, you can see that there were originally 9999 peaks on the original roll, and that the wavelength of the original signal was 1.0001cm.

However, if you only have an ordinary sheet of paper, you would never be able to tell that the original wavelength had been 1.0001cm rather than 1cm.
The response was:
Sorry, but this is just wrong. You would be able to reconstruct the original waveform (at least in the middle of the page; the edges would be high-frequency discontinuities). You don't need to detect the peaks, or for that matter any peaks, to reconstruct the waveform. The phase and amplitude of the samples are sufficient to reconstruct any waveform. Plug your samples into a Fourier analysis tool and out would pop a 1.0001 cm wavelength.
One "normal" sheet of paper (say, 8.5 x 11") is 28cm long. So we would be talking about 28 samples here. That's it. That's what I have been saying is not true. That's why I used 10 samples as an example: measuring in inches instead of centimeters would have made it about ten samples per page.

Again, I don't know how many times I can state this: I have been saying this whole time that using a handful (10 or so) of samples won't get you the exact result.

And, instead of people saying "well, of course it won't work with 10 samples," a bunch of people said the impulse function would solve all the problems, and that the problem with the graphs I was showing at the time was that they were just 'connecting the dots' rather than using the sinc function.

Then I re-do the graphs using the sinc function, and you say I'm only using 10 samples and that's why it's not working. But that is what I have been saying this entire time, and what people have been saying was not true.

The claim specifically was that 1cm waves on an 'ordinary piece of paper' would work. With an 8.5" x 11" sheet you could only have 27.94 waves — that's the same order of magnitude as 10.

And after I posted that, no one said (until you, way down in the thread) that the problem was too few samples. The entire point I was making was that it was too few samples. I got a bunch of comments saying that using the impulse function would fix it.
This thread is pretty depressing. On one side you have basic facts of information theory that have been unquestioned for years and are used (successfully) in a wide range of fields. Plus you have an excellent explanation from a guy who developed Ogg, which required him to have a deep understanding of the mathematics and the human psychoacoustic system.

On the other you've basically got, "Hey, I've got an opinion and a Wikipedia link! Therefore this guy is a crank! You're all mindless ideologues!"
I'm not "questioning information theory" I'm questioning whether or not you can reconstruct a curve with just a few samples. Critically, the unquestioned basic facts of information theory do not say that you can. What's mind boggling is that people seem to be getting upset at the idea that the theory says what it actually says, when it comes to finite sample sets, particularly small ones. It's a reasonable approximation in most cases, but not the specific case I have been talking about this entire thread.

Then, when people actually figure out what I've been saying, they... seem to get upset about it and claim that what I was saying was obviously true, and also that I'm wrong, so I clearly must have been saying something else...? Or something? It's not exactly clear.

All I know is that now people are criticizing me for the exact opposite of what I said from the very beginning.

(I also said right from the beginning that I didn't think it would have a noticeable impact on what anyone would actually hear at those frequencies, but that there would be an effect.)

Also, someone else was calling people mindless ideologues, not me.
posted by delmoi at 5:10 PM on March 7, 2012


This entire thread I have been talking about a small number of samples, because people would say it would work with a small number of samples

We were talking about a finite number of samples, not a small number of samples. A small number of samples never comes up in the real world, so why would that even be part of the conversation?
posted by empath at 5:21 PM on March 7, 2012


We were talking about a finite number of samples, not a small number of samples. A small number of samples never comes up in the real world, so why would that even be part of the conversation?
empath: JackFlash literally said 1cm waves on a sheet of paper would be recoverable, and literally said that I was wrong when I said that while it would be true for a 100m long roll of paper, it would not be true for a single sheet of ordinary paper.

Then, when I tried to show that it wouldn't work with just ten samples, I got a bunch of comments saying the problem was the lack of the sinc function — that I was just 'connecting the dots' — rather than the lack of samples, which is what I said the problem actually was.

I was also saying that if you just played PCM audio without doing any filtering, you would hear the pulsation effect rather than the correct pure tone. On the other hand, if you do filter out the pulsation effects, you'd also be removing pulses that were 'supposed' to be there.

I didn't say that this would have much of a noticeable effect at 44.1kHz, but rather simply that the effect would be there. I said it would have a noticeable effect if you were sampling at a really low rate, like 500Hz or something like that, because I'm assuming that your ears probably can't tell the difference between really high frequencies that are one or two musical notes apart.
posted by delmoi at 5:30 PM on March 7, 2012


Probably because an ordinary sheet of paper would have about 20 dots on it. It's a silly cross to die on, but it's the one presented.
posted by seanmpuckett at 5:31 PM on March 7, 2012


delmoi: Your simulation still does not seem correct. If you set it to 10 samples, you still get a series of line segments, which is wrong. That cannot be the case if you are using the sinc function for your pulses, which are smooth. I'm guessing that you are computing the sum of the sinc for each sample only at each sample point.

Instead, what you should be doing is computing the sum of all sinc pulses at each pixel on screen (every value of x) simultaneously so that you have smooth curves, even when using only 10 samples.
posted by JackFlash at 5:37 PM on March 7, 2012


delmoi: Your simulation still does not seem correct. If you set it to 10 samples,
Ah, sorry... you don't need to set it to 10 samples now. Leave it at 300. The graph is a graph of the mathematical function Σ sample(n) * sinc((t - nT)/T), summed over the samples n = 0..9 — the Whittaker–Shannon interpolation formula applied to 10 evenly spaced samples of the input function.

In other words, I'm creating 'virtual' samples and outputting the 'true' curve of interpolating those samples. Then, the graph system itself samples that n times in order to display it on the graph. The default of 2 pixels per sample (300 per screen) will show you the correct curve.

If you set ten samples in graph settings, you'll still be seeing the same function, but instead of seeing curves, you'll just see straight lines.
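
In case it helps, here's roughly what the graph is computing, as a quick numpy sketch (illustrative only — this isn't the actual graphing code, and the variable names are made up):

    import numpy as np

    def whittaker_shannon(samples, T, t):
        # Reconstruct x(t) from uniformly spaced samples x(nT) by summing one
        # sinc pulse per sample (np.sinc is the normalized sin(pi*x)/(pi*x)).
        n = np.arange(len(samples))
        return np.sum(samples * np.sinc((t - n * T) / T))

    T = 1.0                                        # sample period
    n = np.arange(10)                              # ten samples, like the graph
    x_samples = np.sin(2 * np.pi * 0.49 * n * T)   # a tone just under Nyquist (0.5 cycles/sample)

    t_fine = np.linspace(0, 9, 300)                # ~300 on-screen points
    x_recon = [whittaker_shannon(x_samples, T, t) for t in t_fine]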
posted by delmoi at 5:58 PM on March 7, 2012


I'd like to thank JackFlash for his reasoned analysis. I've learned something today.

Recanting all my previous statements, 44 kHz does fully recreate the full spectrum of sound, higher sampling frequencies are not needed. Nyquist showed this a long time ago.

I've always adhered to the naive interpretations that outsiders to DSP have about PCM-encoded audio. I don't think we fully appreciate the depth of the understanding of the theory that went into the choices made. I also think that the public at large doesn't really understand the math required to fully analyze the signal. It's not as simple as sampling points of sound pressure.

JackFlash has fully explained the basis for the statement that the original source material can be faithfully reproduced with CD sampling rates. I get it now. Thanks for your patience and tenacity. Neil Young is wrong about the bitrates. I'm still going to enjoy "Cinnamon Girl". K?
posted by sydnius at 6:18 PM on March 7, 2012 [5 favorites]


Your simulation still doesn't make sense. Why is your output waveform pulsing up and down in real time? Your output waveform should be moving left to right, just like your input waveform. Once you compute an output value for a time point, that value should be fixed and then pan across the screen. You seem to be doing some weird thing like re-computing output values for outputs that have already occurred.

Take a look at the wolfram simulation to get a better idea.
posted by JackFlash at 6:19 PM on March 7, 2012


Oh by the way, if you tweak this to use values that aren't as close to the Nyquist limit (and you sample a little bit beyond the edge of the graph) you can see that it does actually work really well. Our 'virtual' sampling rate is 10 samples for every 2π, so if we use a frequency of two cycles per screen it's a really close match, and even at 4.5 it's still a good match. At 4.9 (what I linked to) the pulse effect is more apparent, and at 4.99 it's much stronger. So anyway, it is definitely true that the interpolation formula works better than I expected it to.
Your simulation still doesn't make sense. Why is your output waveform pulsing up and down in real time? Your output waveform should be moving left to right, just like your input waveform.
It's because the interpolation formula doesn't work when you're close to the Nyquist limit. Which is the point I'm trying to make. It works as you move farther away from it.
posted by delmoi at 6:23 PM on March 7, 2012


(That isn't to say you couldn't filter out the pulse effect, I'm sure you could, but you would need a more complicated algorithm)
posted by delmoi at 6:30 PM on March 7, 2012


It's because the interpolation formula doesn't work when you're close to the Nyquist limit.

This is Just. Plain. Wrong.

As you approach the Nyquist threshold you require a larger number of samples to reconstruct the waveform to within a particular limit of error. But as I have said over and over and over and over again, in real world audio applications the number of available samples is vastly larger than that required for accurate reconstruction of the waveform.
posted by unSane at 6:40 PM on March 7, 2012


I meant the formula doesn't work with the given number of samples.
posted by delmoi at 7:05 PM on March 7, 2012


But yeah, I guess if you had a thousand or so samples, it would work fine. Interesting.

(And... also somewhat embarrassing :P)
posted by delmoi at 7:10 PM on March 7, 2012


delmoi: The positive values are just the sample numbers. The negative sample values would only have a large impact 'offscreen'. Here is the graph using sample values from –10 to +10. The graph looks the same, just runs more slowly.
I think it's great that you've applied so much effort to this, and the last thing I want to do is to offend you over any of it. I'm not an expert on these things and I'm trying to take this whole discussion as a chance to learn something.

However, I am pretty sure you're mistaken about this specific point. The negative displacement sample numbers (from past samples) absolutely do have an impact on the current output of the interpolation function. Those samples are not "offscreen" in any sense. Take a look at that interpolation formula again; the sample range is from -inf to +inf; even if we're pushing to see how few samples we can get away with, it's not reasonable to think we can eliminate the symmetry of the sum operation and expect accurate results.

That's actually borne out by your animated graphs; the 0-to-9 graph and the -10-to-+10 graph are not the same — the latter is much better. Watch how small the amplitude gets in the 0-to-9 graph, then watch for the same thing on the other.

The results seem pretty good with an interpolation range of -30 to +31 (for symmetry while using the less-than operator). Although they're less good on the right-hand-side than they are on the left-hand-side, which leads me to suppose I don't understand something about how the chart is being computed or displayed; once an output value is calculated, it shouldn't change.
posted by Western Infidels at 7:19 PM on March 7, 2012


delmoi: consider the following signal:
an infinite number of zeros, followed by 10 samples pasted in from a sine wave just under the Nyquist frequency, followed by an infinite number of zeros

The transition from 0 to <Nyquist is very broadband. In fact, if you don't smooth the transition with a filter, it will contain frequencies over Nyquist. Same with the transition from <Nyquist back to 0. The errors you get when you analyze those ten samples are the same ones caused by that cut-and-paste discontinuity.
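
If you want to see this numerically, here's a small sketch (numpy assumed, numbers only illustrative): take ten samples of a near-Nyquist tone surrounded by zeros and look at the spectrum — the energy is smeared over a wide band rather than sitting in one bin.

    import numpy as np

    n = np.arange(10)
    burst = np.zeros(1024)
    burst[507:517] = np.sin(2 * np.pi * 0.45 * n)   # 10-sample tone burst near Nyquist (0.5)

    spectrum = np.abs(np.fft.rfft(burst))
    freqs = np.fft.rfftfreq(1024)                   # in cycles/sample, Nyquist = 0.5

    band = freqs[spectrum >= 0.5 * spectrum.max()]
    print(band.min(), band.max())   # roughly 0.4 to 0.5: a broad band, not a single line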
posted by idiopath at 7:22 PM on March 7, 2012


*analyze those ten samples in isolation

tl;dr: your signal was not properly bandlimited

Also consider the Dirac delta: an infinite number of zeros followed by a lone sample at max amplitude followed by infinite zeros. It literally contains every single frequency below Nyquist at unity amplitude. It is the multiplicative identity for signal vectors.
posted by idiopath at 7:27 PM on March 7, 2012


It's because the interpolation formula doesn't work when you're close to the Nyquist limit. Which is the point I'm trying to make. It works as you move farther away from it.

I think you are close but not quite there yet. You calculate an output value, plot it on the screen, then that output point should pan horizontally, just like your input. Points can't move vertically. That makes no sense. What you should have is an output signal that, as it moves horizontally across, varies in amplitude. What is happening is that since you are very near the Nyquist limit, your sample points move in and out of phase with the zero crossing points. When all of your samples are at the zero crossings, the amplitude is a minimum. When all of your samples are at the peaks, the amplitude is a maximum. Since you are sampling very close to the limit, you move back and forth between these extremes.


However, you seem to be conflating two different phenomena. The first is that, depending on where your samples fall, you can't sample at a rate extremely close to the Nyquist limit. In particular, if you sample only at or near the zero crossings, your wave will disappear. That is no surprise. It is a clearly stated violation of the Shannon theorem.

The second phenomenon is that the number of samples you use in a reconstruction matters to some extent. It only takes a few samples to make a pretty good approximation if you sample at a proper rate, but this is a separate issue from the Nyquist limit.

If you combine these two phenomena — sample at a rate near the limit and also use only a few samples — you are going to get weird stuff because you are violating two rules simultaneously. If you avoid these two issues you get perfect, or nearly perfect, reconstruction. 16-bit, 44.1 kHz audio does not violate either of these, so there should not be an issue.
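
A quick way to see both effects at once (a numpy sketch I threw together — treat the exact numbers as illustrative): measure the worst-case reconstruction error in the middle of the window while varying both how close the tone is to Nyquist and how many samples you keep.

    import numpy as np

    def reconstruct(samples, t):
        # Whittaker-Shannon reconstruction at time t, with sample period = 1
        n = np.arange(len(samples))
        return np.sum(samples * np.sinc(t - n))

    def worst_error(f, n_samples):
        # Peak error for sin(2*pi*f*n), evaluated away from the window edges
        n = np.arange(n_samples)
        samples = np.sin(2 * np.pi * f * n)
        t_mid = np.linspace(0.4 * n_samples, 0.6 * n_samples, 200)
        recon = np.array([reconstruct(samples, t) for t in t_mid])
        return np.max(np.abs(recon - np.sin(2 * np.pi * f * t_mid)))

    for f in (0.25, 0.45, 0.49):             # cycles/sample; Nyquist is 0.5
        for n_samples in (10, 100, 1000):
            print(f, n_samples, worst_error(f, n_samples))
    # The error grows as f approaches Nyquist for a fixed window, and shrinks
    # as the number of samples grows for a fixed f.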
posted by JackFlash at 8:21 PM on March 7, 2012 [1 favorite]


I think it's great that you've applied so much effort to this, and the last thing I want to do is to offend you over any of it. I'm not an expert on these things and I'm trying to take this whole discussion as a chance to learn something.
Nope, you're right I was totally wrong. Sure, each individual pulse won't have much of an effect on ones that are far away, but clearly when you sum over a large range, the far away individual pulses do correct the local errors. Which is really fascinating.

However, my point about it not working with just a few samples was still correct :) I'd always assumed that with a large number of samples you could reconstruct the wave somehow, but I didn't realize the interpolation function by itself actually does it. Which is really cool.

---

However, one question I do still have (and I don't mean to annoy you guys) is, what happens if there are supposed to be pulses in the signal?

Like for example, if I use cos((x*4.9)-t*5), and sum from -50 to 50, it works really well. However, if I use cos((x*4.9)-t*5)*cos((x*0.2)-t*0.2) I end up back with a much faster pulse rate in the reconstructed signal.

If I increase the width of my modulation pulse to 0.05 cycles/screen rather than 0.2, the error goes away again, but then if I adjust the modulated main frequency to 4.97, the slow increase and decrease of the main signal causes a much higher frequency pulse to appear again.

It doesn't seem to be caused by a lack of samples either, going to -100,+200 or even -200 to +200 still shows the same effect. (And the animation speed is really slow, since it has to run through the functions n times for every sample, 300 times per frame, so 6000 computations)

So yeah - it does seem like there could be some artifacts due to the fact that the signals in the musical recording aren't constant for the whole song, but rather rising and falling.

Anyway, something to think about. As I said, I doubt it's anything you could hear at these frequencies, but it might cause effects if you were sampling at a much lower rate. (There's also the issue of what happens if the signal is frequency modulated.)
I think you are close but not quite there yet. You calculate an output value, plot it on the screen, then that output point should pan horizontally, just like your input. Points can't move vertically. That makes no sense.
Er, maybe I wasn't totally clear on what's happening -- each frame of animation is totally independent. I'm adjusting the phase of the source function by 2π/600 every frame. The original idea was that it would be like looking at an oscilloscope image that's been tuned to 2.0033π.

However, with this sample reconstruction thing you're looking at a local feature (and actually you can zoom out and see how the wave is only reconstructed over the area that you're actually sampling, which is again pretty interesting (IMO)) so what the time component is meant to 'represent' here is just different possible phases for the source function, so you can see how various input functions would be affected depending on their phase relative to the sampling frequency.

(I should probably have explained that better earlier)

That said, now that I think about it, human hearing only goes up to 20kHz, but the Nyquist limit for 44.1kHz is 22.05kHz. That means that any sounds that do get close to the Nyquist limit are probably not going to be audible anyway. So, even if there were odd effects as you got close to the Nyquist limit, those effects probably wouldn't be audible.
posted by delmoi at 8:58 PM on March 7, 2012


However, one question I do still have (and I don't mean to annoy you guys) is, what happens if there are supposed to be pulses in the signal?

It really doesn't matter how the signal is generated; you can always do a Fourier transform on it. Even if it was made with an FM synth.
posted by empath at 9:07 PM on March 7, 2012


Like for example, if I use cos((x*4.9)-t*5), and sum from -50 to 50, it works really well. However, if I use cos((x*4.9)-t*5)*cos((x*0.2)-t*0.2) I end up back with a much faster pulse rate in the reconstructed signal.

You are probably getting aliasing because the product of the two cosines has frequency components above the Nyquist limit. You need to LPF the function before you sample it.
posted by unSane at 9:13 PM on March 7, 2012 [1 favorite]


unSane is 100% correct (yet again). The Nyquist limit does not only apply to whole wave cycles; even a partial cycle with a slope steeper than a sine at Nyquist can cause aliasing (this is another way to understand digital clipping or the artifacts created by edits without crossfades: at some region the slope was steeper than Nyquist allows, and the aliasing is the folding over of the frequencies which made up that too-steep slope).
posted by idiopath at 9:24 PM on March 7, 2012


Yep, remember your trig product identity. You get two new waves, one at the sum and one at the difference of the two original frequencies. This is the heterodyne principle for upshifting and downshifting frequencies. You are going to be out of band unless you sample at twice the sum of the two frequencies.

Pulses, frequency modulation, any arbitrary waveform you create is okay as long as it doesn't contain any frequency content that is higher than the Nyquist limit. That is why there is a low pass anti-aliasing filter in front of the original A/D converter -- to make sure there is no high frequency content in the input waveform that violates Nyquist. If it is not in the input, then it also won't be in the output samples when you reconstruct the waveform. Sharp edges like square waves or sawtooths by definition contain infinite bandwidth so can only be approximated by high sample rates.
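
You can see the sum and difference components (and the folding) directly with a few lines of numpy — a toy example with made-up frequencies:

    import numpy as np

    fs = 100.0                        # sample rate, Hz
    t = np.arange(0, 1.0, 1 / fs)
    f1, f2 = 30.0, 25.0               # multiply two tones together

    x = np.cos(2 * np.pi * f1 * t) * np.cos(2 * np.pi * f2 * t)

    spectrum = np.abs(np.fft.rfft(x)) / len(t)
    freqs = np.fft.rfftfreq(len(t), 1 / fs)
    print(freqs[spectrum > 0.1])
    # [ 5. 45.] -- the difference (30-25 = 5 Hz) shows up directly, while the
    # sum (55 Hz) is above the 50 Hz Nyquist limit and folds back to 45 Hz.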
posted by JackFlash at 9:44 PM on March 7, 2012


Also, thinking about this a bit more: what about the effect of quantization and buffer sizes? If you were calculating all of this using 16 bit integers, the contribution of the sinc for a sample more than 2^16 samples away would be less than 1/(2^16), which is too small an effect to be encoded. You could get around this by doing the interpolation at a higher bit depth, though. But the other thing is, in order to reconstruct the signal, you need a decent number of samples, including samples from upcoming points in time. But in order to do that, you would need to buffer incoming samples and then include them in the interpolation.

(Interestingly, this would not apply to MP3 encoded data, since you are using pre-calculated frequency levels, and you have access to the entire buffer as you're encoding)

Anyway, yeah I had no idea that audio DACs were so complex. I always just assumed they just set the level to whatever the input was, and that was it (That's certainly what you want for video, you don't want previous pixels to have any impact on future pixels).

But it seems like if you had a much higher sample rate, you could just skip a lot of this and use much less complex hardware. You wouldn't need to pre-buffer anything. And the quality of the output wouldn't be as dependent on the output hardware (if that's even an issue). These 1-bit, 1MHz sample systems seem like they would simplify everything a great deal.
posted by delmoi at 5:33 AM on March 8, 2012


What an amazing thread. delmoi - I think you owe these guys some beers! Pretty sure you just got at least 2 semester's worth of free education :p
posted by lohmannn at 6:08 AM on March 8, 2012


What an amazing thread. delmoi - I think you owe these guys some beers! Pretty sure you just got at least 2 semester's worth of free education :p
Yeah I definitely learned a ton here. :)
posted by delmoi at 6:15 AM on March 8, 2012 [1 favorite]


That's certainly what you want for video, you don't want previous pixels to have any impact on future pixels

Video uses the same process for sampling light frequency.
posted by empath at 6:24 AM on March 8, 2012


(That's certainly what you want for video, you don't want previous pixels to have any impact on future pixels).

This encoding isn't done in a sequential manner. The waveform is treated as a static object to be analysed all at once. So, yeah, it works exactly the same way in video.

Think of a picture where you do a fourier transform of the RGB components of each line of pixels, then use this to reconstruct the picture. You obviously DO want pixels to both left and right to have an effect on the pixel under consideration. (This is fundamentally what JPEG does, except on smaller blocks of pixels and throwing away frequencies that aren't important). Sound is exactly the same.
posted by unSane at 6:33 AM on March 8, 2012


Well, if we get into the real details, the DAC doesn't actually sum up sinc pulses as in the mathematical reconstruction because these pulses are not easily generated in the real world. The DAC does actually output the classic stairsteps you see in typical DAC diagrams. This is known as zero order hold -- you output a sample value and hold it until the next sample.

Mathematically, the zero order hold is the convolution of a Dirac impulse with a rectangular pulse. In the frequency domain, the transform of a rectangular pulse is the sinc function, so the effect of the zero order hold -- the stairsteps -- is to multiply the spectrum of the input waveform by the sinc function.

So to recover the original waveform you take the stairsteps output of the DAC and pass it through a low pass filter that removes the frequencies above the Nyquist limit and applies the inverse sinc function in the frequency domain.

This output filter has the same extreme brick wall Nyquist requirements as the input filter of an A/D. To get around this you use the same sigma-delta over-sampling trick used for the A/D, but in reverse. So now, instead of the stairsteps coming at 44.1 kHz, you multiply by 16 or 64 so that each step is broken down into tinier little steps to make a smoother curve. This means you can now use a much simpler RC filter at the output to make the original smooth waveform.

So there is no storage of data in the DAC. There is no mathematical summing of samples. Each sample is simply output and thrown away. So why doesn't this violate Shannon's infinite summation? Each new sample moves the output some small delta away from its previous position so essentially the previous position contains a history of all previous samples and you just add to that incrementally. You don't have knowledge of any future samples and that results in what is known as interpolation error, but it turns out that it is inconsequential and causes a miniscule amount of phase delay.

So it turns out that all of the simulations you did aren't actually what occurs in the real world DAC. The DAC is relatively simple. It is just a sample and hold followed by a low pass filter.
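
For anyone who wants to play with this, here's a toy version of the stairstep-then-filter idea in Python (numpy/scipy assumed; a real sigma-delta DAC is far more involved, and I'm skipping the inverse-sinc correction entirely):

    import numpy as np
    from scipy.signal import firwin, lfilter

    fs = 44100
    t = np.arange(0, 0.01, 1 / fs)
    x = np.sin(2 * np.pi * 1000 * t)        # a 1 kHz tone at the CD sample rate

    # Zero-order hold: repeat each sample 16 times (the "stairsteps", 16x oversampled)
    L = 16
    zoh = np.repeat(x, L)

    # A low-pass at the original 22.05 kHz Nyquist smooths the steps back into a sine
    lp = firwin(numtaps=129, cutoff=fs / 2, fs=fs * L)
    smooth = lfilter(lp, 1.0, zoh)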
posted by JackFlash at 8:49 AM on March 8, 2012 [2 favorites]


JackFlash: I could be misremembering, but I have the vague concept that the impulse response of a low pass filter looks very similar to a sinc function - meaning that running through an lpf is pretty much the same as convolving with a sinc function. Like all filters it has a phase shift, and that phase shift allows samples to be affected by values a short distance in the future.
posted by idiopath at 2:02 PM on March 8, 2012


OK I looked it up and now I can be more specific. A true flat-passband brick wall lowpass filter would have a sinc function as its impulse response, and is called a sinc filter. The better your brick wall lowpass filter, the closer its impulse response is to being a sinc function (NB convolving with a filter's impulse response is the same as applying said filter, since the Dirac delta is the identity for convolution).
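
You can check that numerically: build an ideal brick-wall response in the frequency domain, inverse-FFT it, and compare against a sinc (a rough numpy sketch; the finite FFT makes it only an approximation):

    import numpy as np

    n_fft = 4096
    cutoff = 0.25                                  # as a fraction of the sample rate

    freqs = np.fft.rfftfreq(n_fft)
    brick = (freqs <= cutoff).astype(float)        # pass below cutoff, kill everything above

    impulse_response = np.fft.fftshift(np.fft.irfft(brick))
    m = np.arange(n_fft) - n_fft // 2
    ideal_sinc = 2 * cutoff * np.sinc(2 * cutoff * m)

    print(np.max(np.abs(impulse_response - ideal_sinc)))   # small: the two agree closely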
posted by idiopath at 6:02 PM on March 8, 2012


Think of a picture where you do a fourier transform of the RGB components of each line of pixels, then use this to reconstruct the picture. You obviously DO want pixels to both left and right to have an effect on the pixel under consideration. (This is fundamentally what JPEG does, except on smaller blocks of pixels and throwing away frequencies that aren't important).
I'm talking about the DAC used to convert video memory to a video signal. If you had an analog CRT display, it seems like you would want the source to send square waves in order to generate sharp edges around a pixel. It seems like, if you used a sinc function, you would have half the level of pixel n appear in pixels n-1 and n+1. If you had a halftone grid, the grid would be ghosted before and after. (Of course, these days everyone uses a digital display anyway, so it's a non-issue.)

I know how JPEG works, but that's not how other image formats work (e.g. PNG, GIF, or uncompressed formats), and that's different from how a frame buffer gets converted to a video signal.
So there is no storage of data in the DAC. There is no mathematical summing of samples. Each sample is simply output and thrown away. So why doesn't this violate Shannon's infinite summation? Each new sample moves the output some small delta away from its previous position so essentially the previous position contains a history of all previous samples and you just add to that incrementally. You don't have knowledge of any future samples and that results in what is known as interpolation error, but it turns out that it is inconsequential and causes a miniscule amount of phase delay.
Okay, that's what I thought in the beginning. I figured different DAC hardware worked in different ways. It would be possible for them to use the sinc function, but old hardware would certainly have used something like step and hold. That also means the quality of the playback could be dependent on the quality of the DAC.

(And, interestingly, that you could potentially get better quality by converting an mp3 directly into an analog signal, since the MP3 encoder can get a much wider view of the signal when it encodes)

I also figured that, while implementing sinc digitally would take up a lot of transistors, it would probably be possible to do sinc using analog components and, like, some kind of feedback, where you take the current signal and add ½ the prior level 1/3rd the one before that and so on. (because the sinc function includes a 1/x term) But you would still need some kind of 'memory' to store each prior sample level. If you just divided the entire prior signal by half, you would be using 1/2 the prior signal, plus 1/4th the signal before that, and 1/8th the signal before that and so on.

That's not to say it doesn't work out to being a good approximation, and since with 44.1kHz the Nyquist limit is about 10% above the audible range anyway, I'm sure it's a non-issue. But it's interesting to think about.
posted by delmoi at 6:46 PM on March 8, 2012


But you would still need some kind of 'memory' to store each prior sample level.

it's called a capacitor, and it is the most important part of an analog lowpass.

As I mention above, the impulse response of a perfect brickwall filter would be a sinc function.

Notice I did not say "digital". The impulse response of a perfect analog brick wall filter would be a sinc function, and thus applying a perfect analog brick wall would be exactly the same as convolving with a sinc function.
posted by idiopath at 7:15 PM on March 8, 2012


and, delmoi, since you were big enough to admit when you were wrong, I should mention that this is where you show him to be wrong - there is a sort of storage in the DAC, and it is implemented in analog as the capacitors in the lowpass filter. Mind you, the rolloff of the signal's amplitude and the storage are mixed into one property of the capacitor (its discharge time).
posted by idiopath at 8:07 PM on March 8, 2012


and, thank you delmoi for persisting with your argument in good faith, because attempting to address it helped me significantly in solidifying my own knowledge of DSP
posted by idiopath at 8:24 PM on March 8, 2012 [1 favorite]


Ok, this is probably ill advised since I'm way out of my depth here when the discussion digs into the mathiness of information theory and it seems to be winding down but I'm going to jump in.

First I want to shoot down some low hanging fruit. I've seen several arguments here that amounted to "the consumer can't currently make maximum use of the technology so it should not exist." That's unarguably false. The limited capabilities of my iPod should not have anything to do with the higher quality format offerings. It's apples (ha) to oranges. I mean, many popular devices can't even play FLAC. Should they stop offering that?

To those of you who have experience working with 192kHz: is it possible that you are saying that you have sat down and recorded something with a wide dynamic range, say a saxophone or other brass instrument, done an ABX test, and did not hear a difference between something recorded at 192kHz and something at 44.1kHz? I find that extremely hard to believe. I am just a layperson without trained ears myself, but I personally have experienced the difference between the two to be, at minimum, apparent and actually fairly striking.

The other argument I'd like to attack is that 192 khz is overkill, therefore 44.1 is fine. That isn't logical. There's a lot of ground in between the two to be considered. It's likely that 88.2 or 96 kHz provide all of the discernible improvement that I noticed when comparing 192 to 44.1. I don't know and I'd be curious to hear any opinions from those who have tested 192 against 88/96.

I think that most everyone would agree that there are clearly reasons to record/mix/process at higher than 44.1kHz. And it's clear that lots of plugins convert to 192 for processing, then down to whatever the session sample rate is, so observably there is likely a reason for that. And many mastering studios/equipment upconvert to do their processing at 96 or 192 before finally converting down to 44.1 for CD release. Would there be extra effort put into upsampling if 192=44.1? And does the serial upsampling and downsampling that takes place as a part of the mixing and mastering process have a cumulative effect, and will it become discernible? Again, I am super interested in any experiences with this. Has anyone taken a 44.1 kHz file, run it through several conversion cycles using common studio equipment, then subtracted it from the original and observed what was left?

So does the consumer need to be offered that 192khz file? Well, that's obviously complicated. It's much easier to say that raising the highest offering to 88.2 or 96 kHz is enough. But since we're talking about setting standards for something that is both variable and subjective it seems like it is clearly a discussion worth having.
posted by tinamonster at 9:03 PM on March 8, 2012


there is a sort of storage in the DAC, and it is implemented in analog as the capacitors in the lowpass filter

Right. I don't know that much about analog electronics, but I know it should be possible to have some kind of 'storage', but the question in my mind is if you are trying to apply a 'true' sum of sinc functions, you would need to store some value for each sample that you want to include in the sum? The trick is, the value of each prior sample's sinc function is scaled by a linear amount based on the distance (since the function is sin(x) * (1/x)) That means the ratio between the sinc value for two samples changes as their distance from the current sample increases.

So (if I'm doing my math right - which certainly might not be the case!) the ratio between the 'contribution' of the sinc function for the sample one period back and the sample two periods back is 1/2. But the ratio between the samples two and three periods back would be 2/3, and so on. So in order to re-compute that, it seems like you would need to somehow scale those 'contributions' separately. You could add as much parallelism as you wanted, but ultimately you'd need to do at least one addition for each sample for each 'subsample' of output.

If you did a logarithmic approximation of the sinc function, where each contribution was scaled by an amount that stayed constant compared to its neighbors, you would only need to store the previous level. Obviously in 'real' analog low-pass filters, there is some kind of approximation going on.

That isn't to say there would ever be any audible artifacts at 44.1kHz. I'm assuming there wouldn't be. But it also seems very dependent on the hardware, whereas if you had a higher sample rate, differences in the DAC would be less likely to be audible.
posted by delmoi at 9:12 PM on March 8, 2012


To those of you who have experience working with 192kHz: is it possible that you are saying that you have sat down and recorded something with a wide dynamic range, say a saxophone or other brass instrument, done an ABX test, and did not hear a difference between something recorded at 192kHz and something at 44.1kHz? I find that extremely hard to believe. I am just a layperson without trained ears myself, but I personally have experienced the difference between the two to be, at minimum, apparent and actually fairly striking.

I'm afraid you really are out of your depth here: you're confusing sampling rate with bit depth. Sampling rate has no impact on dynamic range. Bit depth is directly related to dynamic range.
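
(For a rough sense of scale: the usual rule of thumb is about 6.02·N + 1.76 dB of dynamic range for N bits, so on the order of 98 dB for 16-bit and 146 dB for 24-bit, while the span from the threshold of hearing to the threshold of pain is roughly 120 dB.)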

If you're actually asking about the difference between 24 bit and 16 bit recording, then in order to hear the difference between them you would need world class equipment, a world class listening environment, and world class ears to hear the difference. It's basically on the level of being able to hear the quietest noise you can imagine, like the noise of yourself blinking. Anyway, some pro engineers do claim to be able to hear a difference; others don't.

If you heard a 'striking' difference either you were fooling yourself or something different was going on: the difference is right at the limits of human perception.

The advantage of 24 bit is that it has much more headroom and latitude for error while audio signals are bouncing around in a workstation, and rounding errors are much less of an issue.

As I said upthread, your listening environment has an effect greater by many, many orders of magnitude on what you hear than any of the other things we have discussed in this thread.
posted by unSane at 9:32 PM on March 8, 2012


(I say this as someone who records everything in 24-bit and only downsamples to 16-bit in mastering. So long as I'm using dither, I can't hear any difference. You also have to bear in mind that there are generally many many other ugly things going on in the audio chain that you never know about and would be horrified about -- EQ which buggers up the phase, limiters, exciters, distortion, dumping stuff out to tape and back, weird phase monkeying to increase the perceived stereo field, saturation plugins or all sorts of analog weirdness from the console. All of this stuff is massively more significant than all of the other stuff we are discussing)
posted by unSane at 9:38 PM on March 8, 2012 [1 favorite]


delmoi: there is both a positive feedback path and a negative feedback path in the analog filter circuit, and if you do the math, I have been told that the cumulative cancellations / summations that are applied to the signal over time get you the desired response. Think of it this way: if a single click is the input, and the output looks something like a sinc function, then if the filter is linear, the output will be the overlap of many copies of that function over time, following the curve of the input. If it isn't linear, then it isn't a brick wall lowpass.
posted by idiopath at 9:44 PM on March 8, 2012


I've got one final point for those who still suspect they're being cheated of true audio quality by the 16-bit/44.1 kHz standard.

If you look at video reproduction, there's been a steady march of progress from VHS>DVD>BLURAY>3D to whatever comes next.

The reason for this is that hardware manufacturers (and Sony in particular, who are vertically integrated and use their movie productions to sell hardware) make a shitload of money by getting people to upgrade their equipment and movie collections.

The hardware manufacturers would LOVE LOVE LOVE to sell you some new shit that made your music sound better. The labels would also LOVE LOVE LOVE to have you pay extra to download something which sounded better. (They tried to do this with SACD of course).

The problem is that the quality difference just isn't there. They can't get people to upgrade the hardware or their media because people can't hear the difference.

The audiophile community has gotten a bad rap among the production community for exactly the same reason: the snake oil factor is out of this world precisely because the sound you're already hearing is really bloody good.

The deficiencies you hear, such as they are, are to do with loudness wars and compression algorithms and the fact that most music is mastered for low-to-medium-end systems, not for esoteric hifis.

Anyway, I'll say it again: if the labels and hardware guys felt there was *any* audible difference that they could leverage to sell a new format, they'd be all over it like a cheap suit.
posted by unSane at 9:58 PM on March 8, 2012 [1 favorite]


Ok, a small correction: I should have specifically said 24/192 vs. 16/44.1, since that's what we're all discussing.

If you heard a 'striking' difference either you were fooling yourself or something different was going on: the difference is right at the limits of human perception.

So this is exactly where my curiosity centers. I can understand that mathematically things work out to the imperceptible to the ear results you describe. But clearly something is going on that makes the 24/192 recording sound better than the 16/44.1. Could it be the converters' "sweet spot" or something bad that is happening in the downsampling process? Something unscientific?
posted by tinamonster at 10:03 PM on March 8, 2012


From Wikipedia on SACD, there's this which basically bears out what I'm saying. Bear in mind that SACD is roughly equivalent to 20 bit / 100 kHz PCM although the comparison is flawed because SACD uses a different process entirely.
In the audiophile community, the sound from the SACD format is thought to be significantly better than older format Red Book CD recordings.[37] However, in September 2007 the Audio Engineering Society published the results of a year-long trial in which a range of subjects including professional recording engineers were asked to discern the difference between SACD and compact disc audio (44.1 kHz/16 bit) under double blind test conditions. Out of 554 trials, there were 276 correct answers, a 49.8% success rate corresponding almost exactly to the 50% that would have been expected by chance guessing alone.[38]

posted by unSane at 10:04 PM on March 8, 2012


Now that I think on it more, I think I can clarify that example. A digital filter can be represented in a number of provably identical ways. Like you did with the sinc function, you can directly convolve with the impulse response. You can fft the signal and the ir and multiply their bins one by one. The most common technique is a summing delay line, with scaled feedback terms. These can all theoretically give identical output, and there are formulas for converting from one of these to the other. The most computationally efficient if you have a short enough ir is the summing delay line. The analog world uses the same math to make an analog summing delay line (the other styles don't quite work in analog iirc).
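
The 'provably identical' part is easy to demo in a couple of lines (numpy sketch; any random signal and short impulse response will do):

    import numpy as np

    rng = np.random.default_rng(1)
    x = rng.standard_normal(64)        # an arbitrary signal
    h = rng.standard_normal(16)        # a (finite) impulse response

    direct = np.convolve(x, h)         # convolve with the impulse response directly

    n = len(direct)                    # zero-pad so circular convolution equals linear
    via_fft = np.fft.irfft(np.fft.rfft(x, n) * np.fft.rfft(h, n), n)

    print(np.allclose(direct, via_fft))   # True: multiplying bins gives the same filter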
posted by idiopath at 10:05 PM on March 8, 2012


Something unscientific?

These kinds of tests are notoriously difficult to do well. So, yep, that's my guess.
posted by unSane at 10:06 PM on March 8, 2012


You also have to bear in mind that there are generally many many other ugly things going on in the audio chain that you never know about and would be horrified about -- EQ which buggers up the phase, limiters, exciters, distortion, dumping stuff out to tape and back, weird phase monkeying to increase the perceived stereo field, saturation plugins or all sorts of analog weirdness from the console. All of this stuff is massively more significant than all of the other stuff we are discussing

YES. I think the layman’s idea of what happens in a studio is part of the problem. There are classical and jazz guys trying to make pristine recordings (probably not all of them) but any kind of pop music is a mash of decidedly non audiophile junk, but junk that sounds good. I personally distort almost everything to some extent.

I always say that if I move any one of the mics a couple inches it will change the sound of the recording way more than the sampling rate or the bit depth. It’s not that it doesn’t matter, it’s just the least important thing I do, and the one people want to get hung up on the most. Gearslutz is full of people who are admittedly beginners and only know the basics of recording who insist on using nothing less than 24/96.

I’m not trying to dismiss the tech talk, it’s completely over my head, but addressing the original question of wether the consumer is being ripped off.
posted by bongo_x at 10:13 PM on March 8, 2012


(To offer one possibility: the usual suspect in these things is a simple volume difference. Louder always sounds better, even when it's as little as 0.2 dB. So when you are doing an ABX you have to be absolutely certain that your volume levels - RMS and not peak - are exactly matched. Unless you explicitly do this, all bets are off. It's quite possible for converters to silently introduce volume changes, and as soon as they do that all bets are off. So you would need to go back and run both the 16-bit and 24-bit file through a plugin and see how the RMS matched up.)
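
(If you want to sanity-check that yourself, the measurement is trivial — e.g. with Python, assuming scipy and two renders of the same material as wav files with made-up names:)

    import numpy as np
    from scipy.io import wavfile

    def rms_db(path):
        rate, data = wavfile.read(path)
        x = data.astype(np.float64)
        if np.issubdtype(data.dtype, np.integer):
            x /= np.iinfo(data.dtype).max      # normalize integer PCM to roughly +/-1
        return 20 * np.log10(np.sqrt(np.mean(x ** 2)))

    print(rms_db("version_a.wav") - rms_db("version_b.wav"))   # hypothetical filenames
    # Anything much beyond ~0.1 dB of mismatch will skew an ABX comparison.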
posted by unSane at 10:15 PM on March 8, 2012


another possibility (however faint): the mix was not properly bandpassed during studio production. There are a bunch of counterintuitive ways to introduce subtle but detectable levels of out-of-band frequencies with operations as simple as punch-in or overdub, not to mention plugins. Releasing at a higher sample rate and/or bit depth means more fudge room to compensate for the accidental introduction of artifacts. Though if they work at a higher resolution in the studio and lowpass when they decimate the sample rate / bit depth, that should be just as good. This is just speculation, mind you.
posted by idiopath at 11:13 PM on March 8, 2012


To offer one possibility: the usual suspect in these things is a simple volume difference.
Ok, makes sense. Unfortunately I'm in no position to re-examine or recreate the test. The benefits/lack thereof is a frustrating idea simply because it's not something the average listener can just ABX on their laptop with their favorite pair of headphones. I can't speculate on the reasons Neil Young would have to advocate that specific format, but I can certainly understand his desire to have the discussion with the goal of pushing the generally accepted quality of music up a few notches.
posted by tinamonster at 11:18 PM on March 8, 2012


If your favorite audiophile can't even pass the Coat Hanger Test then how the hell is MATH going to convince them of anything?
posted by ShutterBun at 12:08 AM on March 9, 2012


have been told that the cumulative cancellations / summations that are applied to the signal over time get you the desired response. Think of it this way: if a single click is the input, and the output looks something like a sinc function, then if the filter is linear, the output will be the overlap of many copies of that function over time, following the curve of the input. If it isn't linear, then it isn't a brick wall lowpass.
Right, but I think that what we've seen is that for the sinc function to work well as you get close to the Nyquist limit, you really need a pretty decent number of summands included in the final output. At least 50 or 100. So the question is, even if you have something that looks like a sinc for that pulse over a short distance, do you still really get the 'full' sinc for each sample over hundreds of samples?

Of course these days it probably wouldn't be out of the question to include 50 or 100 capacitors in order to make sure you get all the information you need. Looking at Wikipedia, this would be a Finite Impulse Response (FIR) filter, while an RC circuit is an Infinite Impulse Response (IIR) filter. The key, though, is that an RC circuit has an exponential decay, rather than the sinc's 1/x falloff. Of course, you could also use these in combination: a sinc-style sum for the first few samples, then an RC circuit for all the rest. After a decent number of samples, you can 'match' 1/x to some exponential curve more easily with less error (I guess).

I guess the question is how close does 'real world' hardware match up with an idealized sinc function? As you get closer to the Nyquist limit, you're more likely to run into some defect. But you should have a decent amount of headroom, since the Nyquist limit for 44.1kHz is about 10% above the limit of human hearing.
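
To put some rough numbers on that question, here's a scipy sketch comparing a one-pole RC-style lowpass against a 101-tap windowed-sinc FIR, both nominally cutting off at 20 kHz (the exact figures are only illustrative; what matters is the shape of the rolloff near Nyquist):

    import numpy as np
    from scipy.signal import firwin, freqz

    fs = 44100.0

    # 101-tap windowed-sinc FIR low-pass at 20 kHz (finite impulse response)
    fir = firwin(101, 20000, fs=fs)
    w, h_fir = freqz(fir, 1.0, worN=2048, fs=fs)

    # One-pole "RC"-style low-pass with the same nominal cutoff (infinite impulse response)
    a = np.exp(-2 * np.pi * 20000 / fs)
    _, h_rc = freqz([1 - a], [1, -a], worN=2048, fs=fs)

    i = np.argmin(np.abs(w - 21500))               # just below the 22.05 kHz Nyquist limit
    print(20 * np.log10(abs(h_fir[i])), 20 * np.log10(abs(h_rc[i])))
    # The windowed-sinc FIR rolls off far more steeply near Nyquist than the single RC pole.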

At the same time though, since everything is a digital download anyway, it doesn't seem like it would be very difficult to let people download 192/24 files if they wanted. You don't need a new physical standard or player, just new software. The output would be less dependent on getting everything perfect, and you could actually design a simpler DAC.

I think it would be more fun if they added more channels, though, so you could get all the tracks used during recording, and mix your own version for your own setup.
posted by delmoi at 3:24 AM on March 9, 2012


Why do you mention "pulse over a short distance" in reference to impulse response? Why would the impulse response get truncated? The definition of a linear filter is that the output for an input of a+b is always the same as the output for a plus the output for b - so, in other words the impulse response describes it completely and unambiguously.

If all you need are the frequencies below 22050 Hz, distributing 192k gains nothing over upsampling from 44.1k to 192k inside the DAC. And many DACs do that (maybe not to 192k, but to some higher rate); it is called "oversampling", and it means your brick wall need not be as accurate (since it gets better in practice as you come down to a smaller % of Nyquist).

More channels as point sources would be great, but few producers would go for it. You'd sooner see George Lucas release all his raw Star Wars takes so every fan could make their own cut of the film.
posted by idiopath at 4:16 AM on March 9, 2012


I'm afraid you really are out of your depth here: you're confusing sampling rate with bit depth. Sampling rate has no impact on dynamic range. Bit depth is directly related to dynamic range.

That is a really naive view of the theory. I mean.. Either we are talking delta-sigma, in which case it is flat out wrong (there is a direct trade off between bit depth and pass band), or we are talking PCM, in which case the anti-aliasing filter specifications are rooted in the combination of sampling rate AND bit depth.

Which goes to a lot of other comments recently in this thread. Basically saying with astonishment "the DAC must matter a lot". All I can say to that is WTF?! Did you seriously think the DAC didn't matter? The format used for storing data on media, as long as it isn't utter garbage (early mp3, especially at low bitrate, was garbage) is of very little importance. What matters is the conversion at either end.
posted by Chuckles at 8:59 AM on March 9, 2012 [1 favorite]


That is a really naive view of the theory. I mean.. Either we are talking delta-sigma, in which case it is flat out wrong (there is a direct trade off between bit depth and pass band), or we are talking PCM, in which case the anti-aliasing filter specifications are rooted in the combination of sampling rate AND bit depth.

The context was obviously PCM and it is true that sampling rate does not affect dynamic range. Nobody was talking about the AA filter. Weird.
posted by unSane at 9:29 AM on March 9, 2012


unSane: no, he's right, boosting SR does increase dynamic range. Bit depth and SR are tied in messy counterintuitive ways in PCM because frequency is just an abstraction for amplitude and time. The DAC is easier/cheaper to do right if you have better bit depth, for example. And I don't have a cite for this yet, but in other ways you can get equivalent gains from increasing either one in isolation. The simplistic/intuitive version is that either way, you are increasing bits/second, so all of timing info, amplitude info and frequency info improve if you increase either SR or bit depth. It can be thought of as the win/win corollary to the fundamental timing vs. frequency uncertainty.

Also, I need to qualify all my statements above regarding out of band frequencies via simple mixing: this is only a concern in the presence of non-linear effects (including but not limited to dynamic range compression, sharp or resonant filters, smoothing to prevent hard clipping (like the classic tanh waveshaping)). These are ubiquitous in studio usage, but without them, digital mixing will not risk frequency artifacts.
posted by idiopath at 10:06 AM on March 9, 2012


The context was obviously PCM and it is true that sampling rate does not affect dynamic range. Nobody was talking about the AA filter. Weird.

In the real world, the filter is everything.
posted by Chuckles at 10:45 AM on March 9, 2012


I should preview! Although I'm not actually sure idiopath and I are saying the same thing... Information theory, and I guess math in general, has this way of circling back on itself :P

In general, it really makes me sad to read all this stuff about how stupid audiophiles are. Personally, I've outgrown it, but.. While their knowledge of how to do good sound reproduction is drowning in snake oil, they do get great results.

Which kind of brings me back to Neil Young's misguided rant. Fundamentally, as we've been discussing, he is not correct to concentrate on the storage format. Practically he is not correct for another reason, that is "why does it matter?":
The ears are the window to the soul, and if you hear all of it ... we feel it
Most of the time most people want candy, not gut wrenching. I have a nice old AM/FM transistor radio that really makes certain old pop music sound great in that candy kind of way.
posted by Chuckles at 11:16 AM on March 9, 2012


Ask any 6 year old who has sneaked to the store with a 5 dollar bill - candy can be gut wrenching :)

I'm not sure we are saying the same thing, either, Chuckles, but then again I never went to college or took a math class beyond algebra 1 (which I flunked). My DSP knowledge is all based on books read, informal conversations with actual experts (I owe so much to a few mathematician friends I have made over the years), and my abortive attempts to design and build analog and digital audio effects.

But less about me and more about the topic: the filter in the DAC is the crucible that proves the value of DSP theory, and at that stage both bit depth and sr contribute to getting accurate frequencies and timings at the output (you can always rob the one to improve the other, so an improvement in sr or data size improves both).
posted by idiopath at 11:30 AM on March 9, 2012


Why do you mention "pulse over a short distance" in reference to impulse response? Why would the impulse response get truncated? The definition of a linear filter is that the output for an input of a+b is always the same as the output for a plus the output for b - so, in other words the impulse response describes it completely and unambiguously.
Right, but a simple RC circuit based lowpass filter is not a linear filter, from what I'm reading. It has an exponential falloff. What I'm trying to say is: sinc works if you have an infinite linear response. But it seems like in the real world you have a choice between infinite response, and linear response.
If all you need are the frequencies below 22050 Hz, distributing 192k gains nothing over upsampling from 44.1k to 192k inside the DAC. And many DACs do that (maybe not to 192k, but to some higher rate); it is called "oversampling", and it means your brick wall need not be as accurate (since it gets better in practice as you come down to a smaller % of Nyquist).
Oh sure, I think this is probably not really relevant to music, where you only need frequencies up to 20k, and anything higher than that would be inaudible. But I'm still wondering about what happens when you play audio that's very close to the Nyquist limit through a "real world" DAC (i.e. a non-idealized one that uses a simple RC circuit as a lowpass filter). From what I understand you could use a true sinc-based linear finite filter for the most recent N samples, and use an RC circuit for the rest, which I think would help you recover frequencies much closer to the Nyquist limit than a simple RC circuit would.

This would be, if I'm correct, because the curve for an exponential falloff gets much closer to the 1/x curve as you get farther away from the really dramatic changes between 1/x and 1/(x-1) when x is small -- that is to say, you can find some k such that the difference between x^-1 - (x-1)^-1 and k*e^-x - k*e^-(x-1) is minimized as x gets larger.

Note that the voltage at time t of an RC circuit is V0·e^(-t/RC), where V0 is the initial voltage.
unSane: no, he's right, boosting SR does increase dynamic range. Bit depth and SR are tied in messy counterintuitive ways in PCM because frequency is just an abstraction for amplitude and time. The DAC is easier/cheaper to do right if you have better bit depth for example.
At the very least, you can 'dither' with more precision. Think about an 8-bits/pixel image made from 256² pixels using dithering, vs one that's 2828².

With the 8-megapixel image, if you look at the area covered by one pixel in the smaller one, you can get a much better idea of what the original color was.
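
Here's the same idea in one dimension, as a little numpy sketch (a ramp instead of an image, and made-up numbers): dithered low-bit samples taken at a high rate can be averaged back down to recover the level far more accurately than a single low-bit sample.

    import numpy as np

    rng = np.random.default_rng(0)
    x = np.linspace(0, 1, 1024)                    # the "true" smooth signal: a ramp

    def quantize(v, bits):
        q = 2 ** bits - 1
        dither = rng.uniform(-0.5, 0.5, size=v.shape)
        return np.round(v * q + dither) / q        # low-bit quantization with dither

    coarse = quantize(x[::16], 3)                              # 64 samples, 3 bits each
    fine = quantize(x, 3).reshape(64, 16).mean(axis=1)         # 16x the rate, averaged down

    print(np.abs(coarse - x[::16]).mean(),
          np.abs(fine - x.reshape(64, 16).mean(axis=1)).mean())
    # The averaged high-rate version tracks the ramp noticeably more closely.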
posted by delmoi at 7:50 AM on March 10, 2012


Actually, let me show an example of what I'm talking about with images. Here is a picture that was shrunk down to 128x128, and re-expanded to 1024x1024, so you can see all the pixels.

Here is the same image converted to 32 colors (5 bits per pixel); obviously, much of the detail is gone. On the other hand, this image was made by re-expanding it before cutting it down to 32 colors. Since the 'sample rate' was much higher, when we converted to five bits you can see much better what the original color was in each pixel.

Also, here's what the full sized image looks like at 5 bits/pixel, and the original cropped image, in case you wanted to see it.

(For some reason Photoshop likes to clamp down on the highlights when converting to low bit depth, so the low bit-depth images are more muted than they could be, probably because the dither pattern would be more obvious with bright colors.)
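(If anyone wants to poke at this without Photoshop, here's a rough Pillow sketch of the same experiment. The file name is a placeholder, and Pillow's palette quantizer and dithering won't exactly match Photoshop's, so treat it as approximate.)

```python
from PIL import Image

img = Image.open("original.png").convert("RGB")    # placeholder file name

# low 'sample rate': shrink to 128x128 first, then cut down to 32 colors
small_32 = img.resize((128, 128)).quantize(colors=32)

# high 'sample rate': quantize at full resolution, so the dithering has
# many pixels per area of the picture to spread the color error over
full_32 = img.quantize(colors=32)

# blow the small one back up with nearest-neighbour so the pixels stay visible
small_32.resize((1024, 1024), Image.NEAREST).save("small_then_32colors.png")
full_32.save("fullres_32colors.png")
```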
posted by delmoi at 8:27 AM on March 10, 2012


Regarding image as an analog for audio: the audio analog of a bitmap display is a vocoder (voice encoder), an analog audio data compression device which predates usable digital audio. It converts the amplitudes of one set of frequencies at the input to the amplitudes of another set at the output; basically just a bank of bandpass filters followed by envelope followers driving a set of oscillators (the same setup can encode or decode, because analog is weird). Vocoders were invented for and extensively used in multiplexing telephone bandwidth (multiple frequency-shifted conversations could travel in parallel on one analog line as long as none of the frequencies overlapped, and the telco inverted the frequency shift at the other end), and became somewhat popular when repurposed as an effect in funk music to make keyboards talk.
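For the curious, here is a bare-bones digital imitation of that bandpass-plus-envelope-follower arrangement using scipy. The band layout, filter orders and test signals are arbitrary choices, nothing like a real telco channel bank:

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def channel_vocoder(modulator, carrier, fs, n_bands=16):
    """Crude channel vocoder: measure the modulator's envelope in each band
    and impose it on the same band of the carrier."""
    edges = np.geomspace(100.0, 0.45 * fs, n_bands + 1)     # log-spaced band edges
    env_sos = butter(2, 30.0, btype="low", fs=fs, output="sos")
    out = np.zeros_like(carrier, dtype=float)
    for lo, hi in zip(edges[:-1], edges[1:]):
        sos = butter(4, [lo, hi], btype="band", fs=fs, output="sos")
        band_mod = sosfiltfilt(sos, modulator)
        band_car = sosfiltfilt(sos, carrier)
        envelope = sosfiltfilt(env_sos, np.abs(band_mod))   # rectify + slow lowpass
        out += band_car * envelope
    return out / np.max(np.abs(out))

# toy usage: impose the envelope of a decaying noise burst onto a sawtooth
fs = 16000
t = np.arange(fs) / fs
burst = np.random.randn(fs) * np.exp(-5 * t)   # stands in for a voice signal
saw = 2 * ((220 * t) % 1.0) - 1                # 220 Hz sawtooth carrier
talking_saw = channel_vocoder(burst, saw, fs)
```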

The analogy is that the data for an image is compressed even in a lossless format because you are not encoding the individual oscillations for each pixel, but instead the amplitudes of three oscillators at fixed frequencies (this works because the eye is not nearly as good at registering variance of frequency data as the ear is, so fewer frequencies are needed for a reasonable facsimile).
posted by idiopath at 9:03 AM on March 10, 2012


(On the other hand, the eye is much better at distinguishing position, so you can imagine each pixel as an analog to a speaker in a ludicrous surround-sound scheme.)
posted by idiopath at 9:08 AM on March 10, 2012


And the reason a vocoder compressed analog data so well is that you can multiplex not only by frequency band but also in time, because the envelope of a known frequency can change at a rate orders of magnitude slower than the oscillation at that frequency and still give a usable result. This was basically mp3 compression without using digital tech or psychoacoustics (at a much reduced quality, but good enough for telephone).

The reason that old telephone lines could handle broadband when it was first introduced, but newer lines needed to be "downgraded", is that the newer lines at that time were vocoder-multiplexed, limiting their data bandwidth per connection so more connections could travel on one analog wire - sensible for voice, a pain in the ass for modems.
posted by idiopath at 9:18 AM on March 10, 2012


Right, but a simple RC circuit based lowpass filter is not a linear filter, from what I'm reading. It has an exponential falloff. What I'm trying to say is: sinc works if you have an infinite linear response. But it seems like in the real world you have a choice between infinite response, and linear response.

Well... I guess you have to think about linear with respect to what? Rs and Ls and Cs are all considered "linears" in EE, and the voltage and current characteristics in and through those fundamental linear components are all linear with respect to one another. Not linear with respect to time, though they are Time Invariant. The infinite response you are looking for, it seems to me, is built into time-bandwidth duality. It is an artifact of the dual nature of frequency and time that is part of our natural world, not of any given component. Or if you don't like that, just spend a bunch of time looking into Linear Time-invariant system theory (state space, second order differential equations, etc. etc.).
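To make the superposition point concrete, here's a quick numpy check with a discretized one-pole RC standing in for the analog circuit (cutoff and test signals picked arbitrarily):

```python
import numpy as np

def rc_lowpass(x, fs, cutoff):
    """Discrete one-pole approximation of an RC lowpass (exponential smoothing)."""
    rc = 1.0 / (2 * np.pi * cutoff)
    alpha = (1.0 / fs) / (rc + 1.0 / fs)
    y = np.zeros_like(x)
    acc = 0.0
    for i, v in enumerate(x):
        acc += alpha * (v - acc)
        y[i] = acc
    return y

fs = 48000
t = np.arange(2048) / fs
a = np.sin(2 * np.pi * 440 * t)
b = np.random.randn(len(t))

# superposition: filtering (a + b) matches filtering a and b separately and summing
lhs = rc_lowpass(a + b, fs, cutoff=1000)
rhs = rc_lowpass(a, fs, cutoff=1000) + rc_lowpass(b, fs, cutoff=1000)
print(np.max(np.abs(lhs - rhs)))   # ~1e-16: the filter is linear, even though its
                                   # impulse response is an infinite exponential tail
```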

For a more practical example, consider Fourier optics. The focal plane of a lens presents a Fourier transform of the image, and a mask in the shape of a circular hole cut in an opaque material, centred on the focal point, functions as a low pass filter. Now obviously the Fourier transform plane (aka focal plane) doesn't have infinite extent in any optical system. It often doesn't extend past the lens in any given direction (in, say, a microscope or telescope), but that doesn't make lenses fail. No doubt it does limit the quality of the devices in some ways, but not in ways that affect the usefulness.
posted by Chuckles at 11:00 AM on March 10, 2012


Yeah, IIRC the level of "nonlinearity" you get with well-designed RC circuits is overshadowed by the actual nonlinear fluid dynamics of the air we compress when we make sound, not to mention the super dirty nonlinearities and turbulences in the complex mechanical systems that we call musical instruments. Get this: it is speculated that the increase in width of the critical band with increased amplitude is a consequence of the nonlinearities in the human ear. Yet somehow we recognize the sound of an actual physical instrument when it reaches our ears, with no electronics present to filter, tweak or error-correct (regarding the L of RLC, those things are hairy as fuck and very nonlinear but also usually not necessary - but then again inductors are the only electronic component in a standard speaker beyond the conductor - the mind boggles).
posted by idiopath at 11:28 AM on March 10, 2012


Idiopath: That doesn't matter though, since live musical instruments aren't sampled and reconstructed. The question is whether or not the sampling process reduces the information that actually makes it to your brain.

Now, I'm not saying that it would be a problem at 44.1kHz. But let's say you were using a 22kHz sample rate. Would sounds with frequencies right up around 11kHz be distorted? If you were using an ideal sinc filter, they wouldn't be (depending on the duration of the signal; a very short burst might have issues). But if you were using just an RC-circuit lowpass filter, they might be. It depends on how well the lowpass filter does at 'recreating' the sinc function using non-linear hardware.
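To put rough numbers on that worry: the textbook magnitude response of a single RC pole, with the corner arbitrarily parked at 11 kHz, barely separates a 10 kHz tone from its nearest spectral image at 12 kHz (this is only the first-order formula, not any particular DAC):

```python
import numpy as np

def one_pole_mag(f, cutoff):
    """Magnitude response of a first-order RC lowpass."""
    return 1.0 / np.sqrt(1.0 + (f / cutoff) ** 2)

fs = 22000.0
tone = 10000.0        # wanted component, just under the 11 kHz Nyquist limit
image = fs - tone     # 12 kHz: the first spectral image the filter has to remove
cutoff = 11000.0      # RC corner parked right at Nyquist

for f in (tone, image):
    print(f"{f:7.0f} Hz: {20 * np.log10(one_pole_mag(f, cutoff)):6.2f} dB")
# the unwanted image ends up less than 1 dB below the wanted tone, so a single
# RC pole can't do the brickwall job; real DACs oversample and/or filter harder
```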

Also, when I say "linear" here, I mean a very specific thing: the fact that the sinc function has a 1/x term. We need the sinc function to restore the signal to its original spectrum. There are lots of other things that can be linear/non-linear in different ways.

Like I said though: it's almost certainly irrelevant for 44.1kHz, because you have the extra 2kHz of 'space' in the spectrum that's inaudible anyway. So if there are glitches as you get to the Nyquist limit, those wouldn't be audible. The other thing is, we don't need to assume DACs only have simple RC circuits; they could do much more complex filtering in order to do a better restoration.

On the other hand, it's interesting to think about.
posted by delmoi at 12:01 PM on March 10, 2012


"(depending on the duration of the signal. A very short burst might have issues)"

as I said before: only because short duration is, by mathematical definition, widened bandwidth; by shortening your excerpt you are literally inserting frequencies above Nyquist. This is what I was getting at regarding the Dirac delta: a signal of zeros that has a nonzero value for a duration of only one sample, and it has the exact same bandwidth as white noise (all representable frequencies at equal level).
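(You can see that in a couple of lines of numpy: the spectrum of a one-sample impulse is perfectly flat.)

```python
import numpy as np

x = np.zeros(1024)
x[0] = 1.0                        # one nonzero sample: a discrete impulse
print(np.abs(np.fft.rfft(x)))     # every bin is 1.0 -- flat, like white noise
```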

"'Also, when I say "linear" here, I mean a very specific thing: the fact that the sinc function has a 1/x term. We need the sinc function to restore the signal to it's original spectrum"

and what I am saying is that the nonlinearity of a good RC circuit is of the same kind observed in physical musical instruments, air, and the human ear. In other words, a piss in the ocean compared to the Platonic ideal of just having Neil Young come over and play Rust Never Sleeps for you on his guitar. Remember, he uses inductive pickups and a tube amp, which insert exactly the kind of nonlinearity the RC circuit does, but at a level that is multiple orders of magnitude higher. And so does the body of his guitar. Your walls and floor and hat can be as important here as the filter is when it comes to altering the sound. And the alteration of an RC circuit is the kind that our ears have evolved to compensate for, because it is the kind that happens everywhere as part of the nature of sound, to the degree that calling it an error is arguably incorrect.

Neil Young would be improving the listening experience much more by lowering his volume (the nonlinearity of the air is more intense at higher compression levels, not to mention the distortion from earplugs).
posted by idiopath at 12:27 PM on March 10, 2012


And this is using your own definition of nonlinear (a cromulent one, but not idiomatic in this context); audio engineers would talk about harmonic vs. nonharmonic distortion. Harmonic distortion is the kind you get in all the examples I list: it introduces only whole-number multiples of the input frequencies, the kind of distortion that plays well with music and is an essential part of music production (it is harmonic distortion which makes a string produce a complex shape with harmonics rather than a sine wave). Nonharmonic distortion introduces unrelated frequencies, the kind that ruin music. Any physical object making sound tends to create harmonic distortion as part of the process, and that means harmonic distortion added afterward is mostly reinforcing and complementing the existing frequency content. Nonharmonic distortion is easy to get in digital, so it is a much bigger deal there. With analog it is almost all harmonic distortion: you suppress it as much as you can, but it is there, because that is how nature is, you know, so you just measure the percentage THD and keep it as low as you know how.
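A quick numpy sanity check of that distinction (the tone frequency and the tanh clipping curve are arbitrary stand-ins for "any physical nonlinearity"):

```python
import numpy as np

fs = 48000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 1000 * t)        # clean 1 kHz sine
y = np.tanh(3 * x + 0.5)                # asymmetric soft clipping, tube-ish

spectrum = np.abs(np.fft.rfft(y * np.hanning(len(y))))
ref = spectrum[1000]                    # 1 Hz per bin, so bin index == frequency
for f in (1000, 2000, 3000, 4000, 1500, 2700):
    print(f, "Hz:", round(20 * np.log10(spectrum[f] / ref), 1), "dB")
# energy shows up at whole-number multiples of 1 kHz; the in-between
# frequencies (1500, 2700 Hz) stay down at the window's leakage floor
```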
posted by idiopath at 12:57 PM on March 10, 2012


The question is whether or not the sampling process reduces the information that actually makes it to your brain.

There are too many things being conflated here... I mean, it should be very obvious to anyone that the recording and playback process as a whole does change the information that actually makes it to your brain.

Our real question: does the A/D-to-storage-to-DAC process change the information content of a signal?

Of course the answer is yes, it does, a lot. But, re Nyquist-Shannon, with appropriate constraints certain aspects can be maintained. Using our understanding of how the ear works, we then choose constraints and proceed to build systems.

We need the sinc function to restore the signal to its original spectrum.

You don't need a literal sinc function; what the real DAC puts out is a zero order hold (a rect pulse in the time domain), whose spectrum is the sinc.

JackFlash did already cover this:
Well, if we get into the real details, the DAC doesn't actually sum up sinc pulses as in the mathematical reconstruction because these pulses are not easily generated in the real world. The DAC does actually output the classic stairsteps you see in typical DAC diagrams. This is known as zero order hold -- you output a sample value and hold it until the next sample.

Mathematically, the zero order hold is the convolution of a Dirac impulse with a rectangular pulse. In the frequency domain the transform of a rectangular pulse is the sinc function, so the effect of the zero order hold -- the stairsteps -- is to multiply the spectrum of the input waveform by the sinc function.
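The practical upshot of that sinc multiplication is the familiar zero-order-hold "droop" toward the top of the band; a quick numpy estimate (sample rate and test frequencies picked for illustration):

```python
import numpy as np

fs = 44100.0
freqs = np.array([1000.0, 10000.0, 20000.0])
# zero-order hold magnitude response: |sinc(f / fs)| (normalized sinc)
droop_db = 20 * np.log10(np.abs(np.sinc(freqs / fs)))
for f, db in zip(freqs, droop_db):
    print(f"{f:7.0f} Hz: {db:5.2f} dB")
# roughly 0 dB at 1 kHz, about -0.7 dB at 10 kHz and -3.2 dB at 20 kHz,
# which is why DACs follow the hold with a compensating reconstruction filter
```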
posted by Chuckles at 1:55 PM on March 10, 2012


Chuckles: that is one kind of DAC, the kind with the best time-domain response and worst frequency-domain response. That the frequency transfer function of a sinc filter is a rectangle is a red herring here. Really, the staircase will, ideally, in a linear system introduce distortion only above Nyquist. In the analog world, where *everything* distorts, you want some lowpass present to get the frequencies right. And the flatter the pass band and the sharper the cutoff of the lowpass, the closer it comes to being a sinc filter. But it does mess up phase relations between frequencies to some degree. So as always with audio, you choose between frequency accuracy and time accuracy and probably go with something in between. And keep in mind that even the amp and speaker will act as lowpass filters.
posted by idiopath at 2:50 PM on March 10, 2012


Here’s a page from AAS, although the conversation may have moved beyond this point.
posted by bongo_x at 6:36 PM on March 10, 2012 [1 favorite]


This was basically mp3 compression without using digital tech or psychoacoustics (at a much reduced quality but good enough for telephone).

From what I understand, G.729a still uses vocoder-style compression, and a lot of the audiobooks on iTunes use that codec too, which is why they sound weird sometimes...
posted by empath at 7:43 PM on March 10, 2012


Neil Young would be improving the listening experience much more by lowering his volume

he gets his main tone through a 15 watt fender deluxe - which then gets sent into the bigger amps to become real loud - so i suspect he understands your viewpoint and follows it
posted by pyramid termite at 7:47 PM on March 10, 2012 [1 favorite]


There were some interesting comments on xiphmont's livejournal post.
posted by bukvich at 8:26 PM on March 12, 2012 [1 favorite]




This thread has been archived and is closed to new comments