Explaining the JPEG Algorithm
September 11, 2007 6:35 PM

Algorithm. JPEG compression explained.
posted by cgc373 (32 comments total) 36 users marked this as a favorite

 
This article doesn't really show you what the DCT coefficients actually look like. Imagine this image laid over the matrix right before the "Quantization" section. That image is what made me "get" how JPEG worked: the DCT "figures out" how each block can be expressed by adding up each of those blocks in varying amounts. If you're familiar with JPEG artifacts, you'll definitely recognize those patterns.
posted by zsazsa at 6:45 PM on September 11, 2007 [7 favorites]
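
For anyone who wants to see the "adding up each of those blocks in varying amounts" idea in action, here is a minimal sketch in plain Python. It assumes the usual 8x8 block size and the textbook scaling (1/4 in front, with C0 = 1/sqrt(2)); the function names are made up for illustration and none of this comes from the linked article's code.

```python
import math

N = 8  # JPEG works on 8x8 blocks


def c(w):
    # Scaling constant: 1/sqrt(2) for frequency 0, 1 for every other frequency
    return 1 / math.sqrt(2) if w == 0 else 1.0


def basis(u, v):
    """The 8x8 cosine pattern for horizontal frequency u and vertical frequency v."""
    return [[math.cos((2 * x + 1) * u * math.pi / (2 * N)) *
             math.cos((2 * y + 1) * v * math.pi / (2 * N))
             for x in range(N)] for y in range(N)]


def dct2(block):
    """Forward 2D DCT: how much of each basis pattern the block contains."""
    coeffs = [[0.0] * N for _ in range(N)]
    for v in range(N):
        for u in range(N):
            b = basis(u, v)
            s = sum(block[y][x] * b[y][x] for y in range(N) for x in range(N))
            coeffs[v][u] = 0.25 * c(u) * c(v) * s
    return coeffs


def idct2(coeffs):
    """Rebuild the block by adding up the 64 patterns in the stored amounts."""
    out = [[0.0] * N for _ in range(N)]
    for v in range(N):
        for u in range(N):
            b = basis(u, v)
            k = 0.25 * c(u) * c(v) * coeffs[v][u]
            for y in range(N):
                for x in range(N):
                    out[y][x] += k * b[y][x]
    return out


# A smooth gradient block: most of its energy lands in the low-frequency corner.
block = [[4 * x + 2 * y for x in range(N)] for y in range(N)]
coeffs = dct2(block)
rebuilt = idct2(coeffs)
worst = max(abs(rebuilt[y][x] - block[y][x]) for y in range(N) for x in range(N))
print("largest round-trip error:", worst)
```

The round trip is lossless up to floating-point error; JPEG only loses information later, when the coefficients are quantized.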


wow
posted by growabrain at 6:47 PM on September 11, 2007


It was interesting and then there was math and I ran away

ps: I'm pretty sure that's not a 256 color image, so shouldn't one pixel be represented by 2 or 4 bytes?
posted by nasreddin at 6:52 PM on September 11, 2007 [2 favorites]


There's still a few bugs to be worked out.
posted by StickyCarpet at 6:56 PM on September 11, 2007


nasreddin, each colour of every pixel has 255 possible variations. that's 16581375 colours... or something.
posted by klanawa at 6:59 PM on September 11, 2007


It was interesting and then there was math and I ran away

That's why image compression articles are supposed to use a picture of Lena to keep everyone interested (previously discussed here).
posted by Gary at 7:18 PM on September 11, 2007


Good post, and zsazsa's postscript is nice too.
posted by tss at 7:18 PM on September 11, 2007


From StickyCarpet's link:
These noises are desirable for us to remove.

Best deletion reason ever.
posted by weapons-grade pandemonium at 7:20 PM on September 11, 2007


Sigh.

jpeg, j2k, gif -- and not an img tag to be seen.
posted by eriko at 7:41 PM on September 11, 2007


Don't worry about the factor of 1/2 in front or the constants Cw (Cw = 1 for all w except C0 = 1/Sqrt(2)).

Oh, I won't.
posted by odinsdream at 7:41 PM on September 11, 2007 [3 favorites]


Sigh. An article about jpeg and why it has artifacts that itself includes a screenshot of a program in jpeg format -- the very thing that jpeg should not be used for.
posted by Rhomboid at 7:48 PM on September 11, 2007 [3 favorites]


16,777,216, actually.
posted by b1tr0t at 8:02 PM on September 11, 2007
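
The arithmetic behind the two figures, as a quick sketch. It assumes the common 24-bit layout of three channels at 8 bits each, which nobody in the thread actually specified.

```python
BITS_PER_CHANNEL = 8   # assumption: ordinary 24-bit RGB
CHANNELS = 3

levels = 2 ** BITS_PER_CHANNEL          # values 0..255, which is 256 levels, not 255
colours = levels ** CHANNELS
bytes_per_pixel = BITS_PER_CHANNEL * CHANNELS // 8

print(levels, "levels per channel")
print(colours, "colours with 256 levels")   # 16777216
print(255 ** CHANNELS, "if you count only 255 levels")  # 16581375
print(bytes_per_pixel, "bytes per pixel")
```

So an uncompressed full-colour pixel under that assumption is 3 bytes, and the 16,581,375 figure comes from counting 255 levels instead of 256.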


Similarly, AES block cipher flash animated.
posted by about_time at 8:04 PM on September 11, 2007


Don't worry about the factor of 1/2 in front or the constants Cw (Cw = 1 for all w except C0 = 1/Sqrt(2)).

Oh, I won't.


As someone who maybe actually slightly knows what's going on in this area, those numbers actually are not all that important. Just scaling. Really, the discrete transforms like this always have a ton of mess in the actual formulas that's just bookkeeping and obscures the actual meaning. Basically, to consider the one dimensional case you multiply your data function d(x) by a cos(w*x) function* that has a certain frequency w. You do this multiplication at each point x and add all the results: so d(0)cos(w*0) + d(1)cos(w*1) and so on. The result is that the more your data resembles the cosine function with that frequency w the larger the frequency domain coefficient D(w) will come out. You do this for several frequencies and get the whole D(w) function. All the other numbers are just there to make it actually work right. Sometimes some of those numbers are seriously called "twiddle factors."

This technique and similar ones are extremely important to all the electronic stuff you enjoy every day. This one is actually on the easier side because it doesn't deal with complex numbers and complex exponentials, which are like a super version of the cosine function.

(*a cosine function is basically the shape of ripples in water or holding the end of a rope and shaking it up and down)
posted by TheOnlyCoolTim at 8:42 PM on September 11, 2007 [1 favorite]
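
The "multiply by a cosine at each point and add it all up" description can be sketched in a few lines of plain Python. This is a naive version, not an optimized one, and the function name is made up. The cosine argument below carries a half-sample shift, (2x + 1), which is part of the bookkeeping the comment glosses over, and for 8 samples the sqrt(2/n) in front works out to the factor of 1/2 from the quoted article.

```python
import math


def dct_1d(d):
    """Naive 1D DCT: correlate the data with a cosine of each frequency w."""
    n = len(d)
    out = []
    for w in range(n):
        # Multiply the data by the cosine at every point x and add the results
        total = sum(d[x] * math.cos((2 * x + 1) * w * math.pi / (2 * n))
                    for x in range(n))
        cw = 1 / math.sqrt(2) if w == 0 else 1.0   # the constant Cw
        out.append(math.sqrt(2 / n) * cw * total)  # sqrt(2/8) = 1/2 when n = 8
    return out


# Data that is exactly the frequency-3 cosine: only D(3) should light up.
n = 8
data = [math.cos((2 * x + 1) * 3 * math.pi / (2 * n)) for x in range(n)]
D = dct_1d(data)
for w, value in enumerate(D):
    print(w, round(value, 6))
```

The more the data resembles the cosine at frequency w, the larger D(w) comes out; here the data is that cosine, so every other coefficient is zero up to rounding.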


Is this something I would have to be a software engineer to be interested in?

Hang on a second. I am a software engineer, and I still don't want to read it. Thanks anyway.
posted by cerebus19 at 8:45 PM on September 11, 2007


TheOnlyCoolTim... Every time I hold the end of a rope and shake it up and down, I get something basically the shape of a sine function...

What am I doing wrong?
posted by jefflowrey at 8:52 PM on September 11, 2007


jefflowrey: Start shaking the rope at eye level, not at navel level. Duh.
posted by Dataphage at 8:57 PM on September 11, 2007 [2 favorites]


Digital cinema projection is standardized on JPEG2000. Note that that is NOT "Motion JPEG". Which means that each frame is an individually decompressed image that stands on its own, with no previous frame info needed unlike in MPEG video encoding (i.e. no temporal encoding).

24 frames of 8 megapixel JPEG2000 images rendered every second? Yeah, that's some seriously expensive computing horsepower.
posted by intermod at 9:22 PM on September 11, 2007
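
A back-of-the-envelope sketch of what "24 frames of 8 megapixel images every second" adds up to. The frame size and bit depth are assumptions chosen for illustration (a 4096 x 2160 frame, three components at 12 bits each); the thread only says "8 megapixel".

```python
# Assumed frame size, not stated in the thread: a 4096 x 2160 "4K" frame
WIDTH, HEIGHT = 4096, 2160
FPS = 24
# Assumed: 3 colour components at 12 bits each
BITS_PER_PIXEL = 3 * 12

pixels_per_frame = WIDTH * HEIGHT
pixels_per_second = pixels_per_frame * FPS
raw_bits_per_second = pixels_per_second * BITS_PER_PIXEL

print(f"{pixels_per_frame / 1e6:.1f} megapixels per frame")
print(f"{pixels_per_second / 1e6:.0f} million pixels decoded per second")
print(f"{raw_bits_per_second / 1e9:.2f} Gbit/s of decoded picture data")
```

Since every frame stands alone, the decoder has to produce all of those pixels from scratch each time, with no help from the previous frame.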


Disregard the anti-nerd comments above -- I thought this was a great link. Thanks.
posted by spiderskull at 1:33 AM on September 12, 2007


intermod: Motion JPEG is the same thing (apart from using JPEG instead of JPEG2000), just individual JPEGs for each frame. Don't confuse MJPEG and MPEG, although MPEG can function in a similar mode by just using I-frames (i.e. no inter-frame prediction).
posted by rpn at 1:38 AM on September 12, 2007


TheOnlyCoolTim... Every time I hold the end of a rope and shake it up and down, I get something basically the shape of a sine function... What am I doing wrong?

Aaaaa! You're out of phase! Quick, lock yourself in a closet before you screw up spacetime.
posted by kid ichorous at 5:39 AM on September 12, 2007


ZOMG MATH = SCARY.

Er, wait...
posted by delmoi at 6:21 AM on September 12, 2007


Oh, yeah, I meant to post a via Danny Yee (who got it from Robot Wisdom).
posted by cgc373 at 7:02 AM on September 12, 2007


Is this something I would have to be a software engineer to be interested in?

Probably, or at least a mathematician of some sort.

10 years ago I had to implement jpeg decoding on a low powered device, and I was using the jpeg group's decoding library and it was slow as crap. So some manager type comes by and tells me I should be able to optimize it. Fuck. This was about a day before a major demo and there was no way I was going to understand the math behind it or write it more efficiently than the jpeg group did. (sorry, just venting some 10 year old steam). Maybe I could have used this article then. I've bookmarked it, but I doubt I'll ever read it.
posted by DarkForest at 7:20 AM on September 12, 2007


intermod: Heh, I wish digital cinema was "8 megapixel". Unless you're looking at one of those new LCOS projectors from Sony or somebody else, the de-facto standard DLP setup is actually around 2 megapixel, or even less if you're talking about the "scope" aspect ratio.
posted by Potsy at 8:12 AM on September 12, 2007


Potsy: I read that as referring to digital projectors in movie theaters, not home stuff.
posted by ROU_Xenophobe at 9:28 AM on September 12, 2007


And, actually, digital cinema (movie theaters) is 12 megapixel - it's 4K across.
posted by MythMaker at 11:45 AM on September 12, 2007


Basically, everyone realizes that the digital cinema consortium is just wanking when they specify 4k resolution. In fact, that's the maximum resolution, and if you supply 4k, according to the standard, you're required to supply 2k as well, since very little hardware will be able to use 4k.

The specification of 4k JPEG2000 is "old cinema" trying to assert itself as still much better quality than "new cinema", the people who are saying that hey, HD resolution is really better than film anyway for delivery, and probably for other stuff too, if you go all-digital, since you're eliminating grain. I think fewer and fewer people believe the old cinema folks now, though. My guess is 2k is going to be the standard and pretty much no one will use 4k.
posted by Joakim Ziegler at 12:22 PM on September 12, 2007


Well, considering that cameras like the RED ONE actually SHOOT in 4.5K, I'm not so sure that's true. As hard drives continue downward in price, and processors and busses continue downward in price, the digital pipeline certainly becomes easy enough for 4K. Right now it's at the limits of technology, but with Moore's Law, what's difficult now is trivial in a decade.
posted by MythMaker at 2:21 PM on September 12, 2007


MythMaker: Yeah, the Red One can shoot at 4.5k, but I'm involved in a couple of feature projects that are considering using the Red One, and most are thinking they'll actually use 2k. 2k is used a lot for scanning 35mm film too, and there, it actually has to reproduce the grain from the film. Pristine 2k without grain is more than enough.

As a comparison, there's the Kodak test where they showed that the resolution of typical distribution copies in 35mm isn't more than 750-800 lines. Given that that's 1.85:1, that's 1480 lines horizontal, quite a bit less than 2k's 2048. The main reason people are using 4k in post today is to extremely faithfully reproduce the grain in 35mm negatives. Once we start capturing on something that has pixels instead of grain, we'll need a lot less for the same subjective viewing experience.
posted by Joakim Ziegler at 3:29 PM on September 12, 2007
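
The conversion from vertical lines to horizontal lines in the comment above, spelled out. The only inputs are the figures quoted there (750 to 800 lines, a 1.85:1 frame, 2048 across for 2k); the helper name is made up.

```python
ASPECT = 1.85        # width : height of the frame, from the comment
TWO_K_WIDTH = 2048   # horizontal pixels in 2k


def horizontal_lines(vertical_lines):
    # Same detail per unit length in both directions, so scale by the aspect ratio
    return vertical_lines * ASPECT


for vertical in (750, 800):
    horizontal = horizontal_lines(vertical)
    share = horizontal / TWO_K_WIDTH
    print(vertical, "lines vertical is about", round(horizontal),
          "horizontal,", f"{share:.0%} of 2k")
```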


the Red One can shoot at 4.5k ... most are thinking they'll actually use 2k

Special effects master shots were often done in 65 millimeter to hide the artifacts, then transferred to 35 for final viewing.
posted by StickyCarpet at 4:56 PM on September 12, 2007


Sure, 2K is good enough. But 4K will blow people's minds. The grain in a 35mm film scan shows that film resolution is effectively less than 4K.

But claiming that the future, say a decade down the road, won't use 4K is like someone in 1987 proclaiming "Oh my God! A 20 Megabyte Hard Drive! I will never be able to fill it all!" It seems a bit shortsighted.

As the technology gets cheaper and hard drives get bigger and cheaper, there will eventually be no reason at all not to work in 4K.

In addition, there are reasons to shoot oversampled, in 4K, and then downconvert at the end to 2K. The downconversion will smooth out small imperfections (say, from pulling keys) in the final master.

Frankly, I think we'll see video and cinema go *way* beyond 4K. UHDV is 7,680 × 4,320, and they've already done demonstrations of the technology.

Do the productions you're working with have definite access to a RED? There are only 25 in circulation at the moment, you know... You'll have to let me know how the RED works out.
posted by MythMaker at 10:16 PM on September 12, 2007
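
On the point about shooting oversampled and downconverting: a toy sketch of why averaging down smooths small imperfections. It assumes the crudest possible filter (a 2x2 box average) and a synthetic grey frame with random noise standing in for the imperfections; real downconversion uses better filters, and the function name is made up.

```python
import random
import statistics


def downsample_2x(img):
    """Halve the resolution by averaging each 2x2 group of pixels (box filter)."""
    h, w = len(img), len(img[0])
    return [[(img[y][x] + img[y][x + 1] + img[y + 1][x] + img[y + 1][x + 1]) / 4
             for x in range(0, w - 1, 2)]
            for y in range(0, h - 1, 2)]


rng = random.Random(0)  # fixed seed so the run is repeatable
SIZE = 64
# Flat mid-grey frame plus random noise standing in for small imperfections
big = [[0.5 + rng.gauss(0, 0.1) for _ in range(SIZE)] for _ in range(SIZE)]
small = downsample_2x(big)

noise_big = statistics.pstdev(v for row in big for v in row)
noise_small = statistics.pstdev(v for row in small for v in row)
print("noise before:", round(noise_big, 4))
print("noise after: ", round(noise_small, 4))
```

Averaging four independent noisy samples roughly halves the noise, which is the smoothing effect described, at the cost of half the resolution in each direction.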



