Behind the Cat Pictures
May 1, 2019 7:44 PM

This article is about how to decode a JPEG image.

In other words, it’s about what it takes to convert the compressed data stored on your computer to the image that appears on the screen. It’s worth learning about not just because it’s important to understand the technology we all use every day, but also because, as we unravel the layers of compression, we learn a bit about perception and vision, and about what details our eyes are most sensitive to.

It’s also just a lot of fun to play with images this way.
posted by zamboni (24 comments total) 73 users marked this as a favorite
 
The fact that I excitedly skipped ahead to the good bits (the stuff about DCTs) suggests I’m maybe not the intended audience for this paper
posted by aubilenon at 7:50 PM on May 1, 2019 [4 favorites]


My name is JPEG, and I'm here to do two things: encode integers and trick humans. And I'm all out of integers.
posted by RobotVoodooPower at 7:55 PM on May 1, 2019 [12 favorites]


Magic. Got it.
posted by runcibleshaw at 8:39 PM on May 1, 2019 [4 favorites]


Amazing find, zamboni, and thanks for showing us Parametric Press!

Very useful; I'm always peering into jpegs, looking for things that might be there below the surface.
posted by unearthed at 8:47 PM on May 1, 2019


I'm just amazed they got through the whole thing without going into a diversion about how gif was copyrighted and we were all going to lose control of our pictures one day soon.

I mean, are you even allowed to talk about the JPEG standard without mentioning that?
posted by Tell Me No Lies at 8:48 PM on May 1, 2019


This is good. I live for this kind of thing. I think it's really hard to write something that describes such a complex process while attempting to keep it mostly accessible for people with different levels of tech knowledge. can we keep posting stuff like this a lot because i love it? thank you. (i will help!)
posted by capnsue at 8:52 PM on May 1, 2019 [3 favorites]


how did these people get their cats wedged into their scanner?
posted by slater at 8:53 PM on May 1, 2019 [5 favorites]


I’m obviously not opposed to 14 copies of the same internet cat picture, but cat pictures in the post title may seem like a bit of the old bait and switch to some of the sterner critics.
posted by ActingTheGoat at 9:18 PM on May 1, 2019


You are invited to use the hexadecimal editors to encode your very own cat pictures if there is not enough variety in the samples provided.
posted by fantabulous timewaster at 10:18 PM on May 1, 2019 [5 favorites]


This is very good, and a good demonstration of this publication's goal of showcasing the expository powers of the combination of audio, visual, and interactive dynamic media (indirect quote because copy-paste not working for me). Seems like a project worth keeping an eye on.
posted by Cozybee at 10:52 PM on May 1, 2019 [3 favorites]


The fact that JPEG is human optimized begs the question about a standard that was cat optimized. Probably an image that stayed still for a while and then gave a sudden twitch.
posted by rongorongo at 2:54 AM on May 2, 2019 [2 favorites]


Here's a tip for getting intuition on the discrete cosine transform: if you go to the first Discrete Cosine Transform sandbox (the one with the cat picture) and change one of the numbers to 1000, the 8x8 block corresponding to that line will turn into a visualization of the basis image corresponding to the number you changed. This works because the DCT represents the image as an overlay of all of these basis images on top of each other, and the numbers simply tell us how bright to make each image (or its inverse, if the number is negative).

Put another way, imagine that we had a convoluted, 8x8 resolution video projector made from 64 different slide projectors all pointed at a screen. Each slide projector corresponds to a single pixel, and its slide is completely black except for a single square corresponding to the pixel that that projector controls. Thus, by adjusting the brightness of a slide projector, we adjust the brightness of its pixel on the screen, so we can make any image we want. What the DCT shows us is that there is an alternative set of slides we can use that will solve the problem just as well, and these are given by the DCT basis images. Any image that you could produce with the pixel-based slides, you could also produce with the DCT basis image slides, and vice versa. All you have to do is figure out how bright to make each projector.
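
The projector picture can be tried out in a few lines of plain Python. This is only a rough sketch under some assumptions: it uses the orthonormal 8x8 DCT-II, and it ignores everything else the sandbox or a real decoder does (level shifting, quantization, color conversion). The function names scale, basis_image, overlay and brightnesses are made up for illustration.

```python
import math

N = 8  # JPEG applies the DCT to 8x8 blocks

def scale(k):
    # Orthonormal DCT-II scaling; for N = 8 this equals JPEG's (1/4)*C(u)*C(v)
    return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

def basis_image(u, v):
    """The 'slide' for coefficient (u, v): an 8x8 grid of cosine products."""
    return [[scale(u) * scale(v)
             * math.cos((2 * x + 1) * u * math.pi / (2 * N))
             * math.cos((2 * y + 1) * v * math.pi / (2 * N))
             for y in range(N)]
            for x in range(N)]

def overlay(coeffs):
    """Inverse DCT: add up all 64 slides, each at its own brightness."""
    block = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            slide = basis_image(u, v)
            for x in range(N):
                for y in range(N):
                    block[x][y] += coeffs[u][v] * slide[x][y]
    return block

def brightnesses(block):
    """Forward DCT: how bright each slide projector must be for this block."""
    coeffs = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            slide = basis_image(u, v)
            coeffs[u][v] = sum(block[x][y] * slide[x][y]
                               for x in range(N) for y in range(N))
    return coeffs

# Turn on a single projector, as in the "change one number to 1000" tip.
coeffs = [[0.0] * N for _ in range(N)]
coeffs[0][1] = 1000.0
block = overlay(coeffs)

# Going back from pixels to brightnesses recovers that single 1000.
recovered = brightnesses(block)
```

With only coeffs[0][1] switched on, every row of block is the same half-cosine, positive on one side and negative on the other, which is the basis image for that coefficient scaled by 1000.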

Now, this is undeniably cool, but it's not actually what makes the DCT special. In fact, if we simply chose the images for our slides completely at random, we would almost certainly still be able to produce all the same images as the pixel-based slides. What's really magical about the DCT is that, for most real-world images, many of our slide projectors will either be off or at very low brightness. This is great news for our convoluted video projector, because it means that we usually only need a small subset of our slide projectors to make an image that looks almost as good as the original.

The DCT is so convenient for representing real-world images because, in the real world, things that are close to each other tend to look similar. 8x8 blocks are usually small enough that you don't get many sharp oscillations from one color to another, so the whole block usually breaks down into a few contiguous regions. If you go into the sandbox and try changing some of the non-zero numbers to 1000, you'll notice that the basis image is usually a few thick stripes or a simple, easily visible checkerboard pattern. What this means is that this 8x8 block can be closely approximated even if we only use basis images with large, simple regions. And since there are far fewer simple basis images than the 64 pixels required to perfectly represent the block, the DCT lets us effectively compress most real-world 8x8 blocks.
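
To see the "most projectors are off" effect in numbers, here is a small sketch in plain Python. It assumes a made-up smooth block (a gentle brightness ramp, not taken from the article's cat picture) and a hypothetical helper keep_largest that zeroes all but the strongest coefficients. Real JPEG does not pick the top k like this; it quantizes coefficients, so treat this only as an illustration of the idea.

```python
import math

N = 8

def scale(k):
    return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

# 1-D cosine table: C[k][n] is basis function k evaluated at sample n.
C = [[scale(k) * math.cos((2 * n + 1) * k * math.pi / (2 * N)) for n in range(N)]
     for k in range(N)]

def dct2(block):
    """Forward 8x8 DCT (orthonormal)."""
    return [[sum(C[u][x] * C[v][y] * block[x][y]
                 for x in range(N) for y in range(N))
             for v in range(N)] for u in range(N)]

def idct2(coeffs):
    """Inverse 8x8 DCT (orthonormal)."""
    return [[sum(C[u][x] * C[v][y] * coeffs[u][v]
                 for u in range(N) for v in range(N))
             for y in range(N)] for x in range(N)]

def keep_largest(coeffs, k):
    """Zero every coefficient except the k with the largest magnitude."""
    ranked = sorted(((abs(coeffs[u][v]), u, v)
                     for u in range(N) for v in range(N)), reverse=True)
    kept = [[0.0] * N for _ in range(N)]
    for _, u, v in ranked[:k]:
        kept[u][v] = coeffs[u][v]
    return kept

def rms_error(a, b):
    """Root-mean-square difference between two 8x8 blocks."""
    return math.sqrt(sum((a[x][y] - b[x][y]) ** 2
                         for x in range(N) for y in range(N)) / (N * N))

# Hypothetical smooth block: brightness rises gently down and across.
smooth = [[100.0 + 10.0 * x + 5.0 * y for y in range(N)] for x in range(N)]

coeffs = dct2(smooth)
approx = idct2(keep_largest(coeffs, 3))  # only 3 of 64 projectors left on
error = rms_error(smooth, approx)
```

On this ramp, three coefficients out of 64 are enough to keep the RMS error small compared with the block's brightness range of roughly 100 levels. A block of random noise would not compress nearly as well.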
posted by J.K. Seazer at 3:01 AM on May 2, 2019 [6 favorites]


Good post. Thank you.
posted by Wolfdog at 4:57 AM on May 2, 2019


One really cool thing you can do with this technique is progressively stream pictures. Imagine seeing a blurry version of the whole image and slowly seeing it become more and more detailed as the download progresses and more DCT coefficients are available.
I needn't imagine this, young man - I had dialup!
posted by clawsoon at 5:38 AM on May 2, 2019 [11 favorites]
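
For anyone who did not have dialup, the blurry-then-sharp effect can be sketched for a single 8x8 block in plain Python. This is a simplification under stated assumptions: real progressive JPEG sends coefficients for the whole image in several scans (spectral selection and successive approximation), while this sketch just reveals one block's coefficients in zigzag order with no quantization or entropy coding. The sample block and the helper partial_image are invented for illustration.

```python
import math

N = 8

def scale(k):
    return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

# 1-D cosine table: C[k][n] is basis function k evaluated at sample n.
C = [[scale(k) * math.cos((2 * n + 1) * k * math.pi / (2 * N)) for n in range(N)]
     for k in range(N)]

def dct2(block):
    return [[sum(C[u][x] * C[v][y] * block[x][y]
                 for x in range(N) for y in range(N))
             for v in range(N)] for u in range(N)]

def idct2(coeffs):
    return [[sum(C[u][x] * C[v][y] * coeffs[u][v]
                 for u in range(N) for v in range(N))
             for y in range(N)] for x in range(N)]

# Zigzag order: walk the anti-diagonals so low frequencies come first.
zigzag = sorted(((u, v) for u in range(N) for v in range(N)),
                key=lambda p: (p[0] + p[1],
                               p[0] if (p[0] + p[1]) % 2 else -p[0]))

def partial_image(coeffs, count):
    """Reconstruct from only the first `count` coefficients in zigzag order."""
    received = [[0.0] * N for _ in range(N)]
    for u, v in zigzag[:count]:
        received[u][v] = coeffs[u][v]
    return idct2(received)

def rms_error(a, b):
    return math.sqrt(sum((a[x][y] - b[x][y]) ** 2
                         for x in range(N) for y in range(N)) / (N * N))

# Hypothetical block with some smooth variation in both directions.
block = [[128.0 + 60.0 * math.sin(x / 3.0) + 40.0 * math.cos(y / 2.0)
          for y in range(N)] for x in range(N)]
coeffs = dct2(block)

# Error after 1, 4, 16 and all 64 coefficients have "arrived".
counts = [1, 4, 16, 64]
errors = [rms_error(block, partial_image(coeffs, c)) for c in counts]
```

With one coefficient the block is a flat patch at its average brightness. Each further batch can only lower the error, and with all 64 the original block comes back up to rounding.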


The fact that JPEG is human optimized begs the question about a standard that was cat optimized. Probably an image that stayed still for a while and then gave a sudden twitch.

High-resolution greyscale with a very deep dynamic range and high contrast ratios.
posted by acb at 5:45 AM on May 2, 2019 [1 favorite]


More info on the Fourier transforms inside JPEG here (That's the cosine bit the linked article talks about). It's not an immediately obvious thing that Fourier transforms can be used for lossy compression, but it works! Maths!
posted by BigCalm at 5:55 AM on May 2, 2019 [1 favorite]


I mean, are you even allowed to talk about the JPEG standard without mentioning that?

Well, if you are going to bring up the GIF, and worry about copyright, the history wouldn't be complete without PNG.
posted by jkaczor at 6:43 AM on May 2, 2019 [1 favorite]


This was very interesting, thanks. The whole site/publication looks like it will be excellent. All the articles have precisely designed interactive elements.


(And by the way it's actually pronounced Jay-Pejj)
posted by sylvanshine at 7:53 PM on May 2, 2019


This was good! I really like that incorporation of live-editable file contents as a means for self-directed experimentation, even if it managed to give me a little bit of agita about breaking something in production while I fooled around. I've always sort of vaguely understood JPEG compression and now I understand it slightly less vaguely, which is nice.

Also the DCT waveforms struck me as immediately familiar, and it took me a second to realize that it's because it's the same basic technique used for part of the wavelet representations of the old Echo Nest Remix API for characterizing analyzed sound files. I didn't reaaaaally get that back when I was playing with it years ago and I still don't really have a comfortable grasp, but it's helpful to see a little bit of shared mathematical heritage there in two nominally unrelated encoding schemes.

I'm just amazed they got through the whole thing without going into a diversion about how gif was copyrighted and we were all going to lose control of our pictures one day soon.

The main Unisys patents on GIF expired something like fifteen years ago, right around the seventh or eighth Year Of Linux On The Desktop. And like jkaczor notes, we all started stanning PNG in the meantime.
posted by cortex at 11:05 PM on May 2, 2019


it's pronounced Jay-peg
posted by lescour at 1:57 AM on May 3, 2019


Just in time for the new jpeg (not kidding): JPEG XL
posted by sammyo at 4:31 AM on May 3, 2019


Will JPEG XL get any more traction than JPEG 2000 did, I wonder?
posted by clawsoon at 10:02 AM on May 3, 2019 [1 favorite]


I had no idea that the JPEG metadata situation was such a mess. Summary:
In practice, a portable JPEG file is pretty much "whatever the libjpeg software supports".
posted by clawsoon at 10:11 AM on May 3, 2019


In practice, a portable JPEG file is pretty much "whatever the libjpeg software supports".

This situation is a lot more common than you might expect. Specifications are suggestions, code is definitive.
posted by Tell Me No Lies at 6:15 AM on May 5, 2019



