"At this point, I started banging my head against my desk..."
June 18, 2015 5:04 PM

"What’s compression in the first place? At its most basic, compression is a way of representing data using less space. An emoji is a good metaphor: it represents an entire word or even several words using a single character. Our minds then 'decompress' the character back into the word it represents.

"When hackers see a magical plot-driving compression algorithm, it’s hard to chalk it up as simply a narrative device. After all, universal lossless compression sounds pretty sweet. So, at a recent hackathon, I decided to get to the bottom of middle-out compression."
I Hacked the Middle-Out Compression from 'Silicon Valley' - Alexander Gould, Major League Hacking (Silicon Valley is on FanFare)
posted by joseph conrad is fully awesome (46 comments total) 11 users marked this as a favorite
 
I'll be the spoiler king, but the article is interesting...and I love the show.

The disheartening conclusion: Richard’s magical compression performed no better than existing compression schemes, and in some cases performed worse.

posted by Benway at 5:55 PM on June 18, 2015


Unfortunately for the writers, there is no universal, lossless compressor. If there were such a thing, two different files could be squeezed down to the same, solitary, compressed file — and there would be no way to get back to either of the two original files from there.
posted by a lungful of dragon at 5:57 PM on June 18, 2015 [9 favorites]
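The counting behind that argument is easy to check. The sketch below (class and method names are illustrative, not from any library) compares the number of n-bit files with the number of all strictly shorter bit strings, including the empty one. There is always exactly one fewer possible output than input, so a compressor that shortens every n-bit file must map at least two different files to the same output.

```java
import java.math.BigInteger;

// Counts bit strings to show why no lossless compressor can shorten every input.
class PigeonholeCount {
    // Distinct bit strings of exactly n bits: 2^n.
    static BigInteger inputsOfLength(int n) {
        return BigInteger.ONE.shiftLeft(n);
    }

    // Distinct bit strings of length 0 through n - 1: 2^0 + 2^1 + ... + 2^(n-1) = 2^n - 1.
    static BigInteger outputsShorterThan(int n) {
        return BigInteger.ONE.shiftLeft(n).subtract(BigInteger.ONE);
    }

    public static void main(String[] args) {
        for (int n : new int[] {1, 8, 256}) {
            System.out.println(n + "-bit inputs: " + inputsOfLength(n)
                    + ", shorter outputs available: " + outputsShorterThan(n));
        }
    }
}
```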


But, a fun show, nonetheless.
posted by a lungful of dragon at 5:58 PM on June 18, 2015


What ever happened to fractal encryption?
posted by sammyo at 6:00 PM on June 18, 2015


JPEG2000 uses it. Your computer may well support this format, but it isn't terribly popular.
posted by LogicalDash at 6:04 PM on June 18, 2015


Seriously, if you're not watching Silicon Valley and have access to it, it's overcome a somewhat slow start (the first couple of episodes are, I think, good but below the standard of the series) to become one of the funniest shows on TV right now. It's easily one of Mike Judge's best works.
posted by Pope Guilty at 6:09 PM on June 18, 2015 [7 favorites]


Unfortunately for the writers, there is no universal, lossless compressor.

But wait until you see how awesome my universal lossy compressor is.
posted by GuyZero at 6:10 PM on June 18, 2015 [11 favorites]


The last episode of S1 has perhaps the greatest dick joke ever.
posted by brundlefly at 6:11 PM on June 18, 2015 [12 favorites]


don't leave us hanging
posted by standardasparagus at 6:16 PM on June 18, 2015 [6 favorites]


LogicalDash: "JPEG2000 uses it. Your computer may well support this format, but it isn't terribly popular."

What's fractal about JPEG2000? Isn't JPEG2000 mostly just wavelet?
posted by Joakim Ziegler at 6:18 PM on June 18, 2015
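For a feel of what "wavelet" means here, a minimal sketch of one level of an integer Haar transform computed by lifting follows. This is a simplification chosen for brevity: JPEG2000 itself specifies the LeGall 5/3 (lossless) and CDF 9/7 (lossy) wavelets, and the class name is hypothetical. It assumes an even-length input. Smooth stretches of signal produce small detail coefficients, which a later entropy-coding stage can store cheaply.

```java
import java.util.Arrays;

// One level of an integer Haar wavelet transform via lifting (the "S-transform").
// Illustrative only: JPEG2000 uses the 5/3 and 9/7 wavelets, not Haar. Assumes even length.
class HaarLifting {
    // Output: approximation coefficients in the first half, details in the second half.
    static int[] forward(int[] x) {
        int half = x.length / 2;
        int[] out = new int[x.length];
        for (int i = 0; i < half; i++) {
            int a = x[2 * i], b = x[2 * i + 1];
            int d = a - b;          // detail coefficient (small where the signal is smooth)
            int s = b + (d >> 1);   // equals floor((a + b) / 2); >> is a floor division by 2
            out[i] = s;
            out[half + i] = d;
        }
        return out;
    }

    // Exactly undoes forward(), so the transform step itself loses nothing.
    static int[] inverse(int[] y) {
        int half = y.length / 2;
        int[] x = new int[y.length];
        for (int i = 0; i < half; i++) {
            int s = y[i], d = y[half + i];
            int b = s - (d >> 1);
            x[2 * i] = d + b;
            x[2 * i + 1] = b;
        }
        return x;
    }

    public static void main(String[] args) {
        int[] signal = {10, 12, 14, 14, 200, 202, 50, 51};
        int[] coeffs = forward(signal);
        System.out.println(Arrays.toString(coeffs));
        System.out.println(Arrays.toString(inverse(coeffs)));
    }
}
```

On the sample signal, forward() yields [11, 14, 201, 50, -2, 0, -2, -1]: the averages land in the first half, the small differences in the second, and inverse() restores the original values exactly.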




Scary math aside, the equation means that any compression algorithm tested is ranked against a known one, like ZIP or TAR.

Emphasis mine to express raised eyebrow.
posted by edd at 6:29 PM on June 18, 2015 [21 favorites]
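For reference, the ranking the article describes is presumably the show's Weissman score, usually written as W = α × (r / r_ref) × (log T_ref / log T), where r is the compression ratio, T the time taken, the "ref" values come from a standard reference compressor, and α is a scaling constant. A minimal sketch with illustrative names and made-up inputs; note that the formula only behaves sensibly when both times exceed 1 in whatever unit is used, since log T is zero at T = 1 and negative below it.

```java
// Weissman score: W = alpha * (r / rRef) * (log(tRef) / log(t)).
// r = compression ratio, t = time taken; rRef and tRef come from a reference compressor.
class WeissmanScore {
    static double score(double alpha, double r, double t, double rRef, double tRef) {
        if (t <= 1.0 || tRef <= 1.0) {
            // log(t) is zero at t = 1 and negative below it, so the ratio breaks down there.
            throw new IllegalArgumentException("times must exceed 1 in the chosen unit");
        }
        return alpha * (r / rRef) * (Math.log(tRef) / Math.log(t));
    }

    public static void main(String[] args) {
        // Hypothetical numbers: same speed as the reference, twice its compression ratio.
        System.out.println(score(1.0, 6.0, 100.0, 3.0, 100.0)); // 2.0
    }
}
```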


Marcus Hutter, a computer scientist who studies artificial intelligence, offers a cash prize for improvements to compressing a particular 100 MB sample of Wikipedia. So far the best attempt shrinks it to 16 MB, and going by Claude Shannon's estimate that English text contains about 1 bit per character of information, it should be possible to get as low as about 12.5 MB. The current leading algorithm is based on PAQ, which, unlike the fictional algorithm, does compress the file bit by bit from start to finish.
posted by Rangi at 6:57 PM on June 18, 2015 [1 favorite]
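A rough way to see how far simple statistics are from Shannon's estimate: the sketch below (hypothetical class name) computes order-0 entropy, the bits per character needed by a coder that only knows overall byte frequencies and ignores context. For English text this typically comes out at several bits per character; closing the gap toward ~1 bit is what context-modelling compressors such as PAQ attempt. It also spells out the prize arithmetic: 100 MB at 1 bit per 8-bit character is 12.5 MB.

```java
import java.nio.charset.StandardCharsets;

// Order-0 entropy: bits per byte for a coder that only uses overall byte frequencies.
class Order0Entropy {
    static double bitsPerByte(byte[] data) {
        long[] counts = new long[256];
        for (byte b : data) counts[b & 0xFF]++;
        double h = 0.0;
        for (long c : counts) {
            if (c == 0) continue;
            double p = (double) c / data.length;
            h -= p * (Math.log(p) / Math.log(2)); // accumulates -sum p * log2(p)
        }
        return h;
    }

    public static void main(String[] args) {
        byte[] text = "the quick brown fox jumps over the lazy dog"
                .getBytes(StandardCharsets.US_ASCII);
        double h = bitsPerByte(text);
        System.out.printf("%.3f bits/char, about %.1f%% of 8 bits%n", h, 100 * h / 8);
        // Shannon's ~1 bit/char estimate applied to 100 MB of 8-bit characters:
        System.out.println("100 MB * 1/8 = " + (100.0 / 8) + " MB");
    }
}
```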


TAR isn't compression....
posted by schmod at 7:21 PM on June 18, 2015 [1 favorite]




There was an investment scam in the late 80s where someone had claimed to have written software that would compress files down to 1 byte. ISTR a demo version was distributed that could only compress the few demo files that were distributed with it. Then someone noticed that the decompressor software was slightly larger than the sum of all the demo files …
posted by scruss at 7:52 PM on June 18, 2015 [9 favorites]
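The trick scruss describes is easy to reproduce, which is why it fools nobody who checks the decompressor's size. A hypothetical sketch (not a reconstruction of the actual product): the "compressed" output is just an index into a table of files the decompressor already carries, so the information has moved rather than shrunk, and any file outside the table can't be compressed at all.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// A "compressor" that only works on files the decompressor already contains.
// Hypothetical illustration: the information lives in knownFiles, not in the output.
class DemoFileCompressor {
    private final List<byte[]> knownFiles;

    DemoFileCompressor(List<byte[]> knownFiles) {
        if (knownFiles.size() > 256) {
            throw new IllegalArgumentException("a 1-byte index covers at most 256 files");
        }
        this.knownFiles = knownFiles;
    }

    // Returns a one-byte index for a known file, or null for any other input.
    byte[] compress(byte[] input) {
        for (int i = 0; i < knownFiles.size(); i++) {
            if (Arrays.equals(knownFiles.get(i), input)) {
                return new byte[] {(byte) i};
            }
        }
        return null;
    }

    // Looks the index back up; the "decompressed" bytes were stored here all along.
    byte[] decompress(byte[] packed) {
        return knownFiles.get(packed[0] & 0xFF).clone();
    }

    public static void main(String[] args) {
        DemoFileCompressor c = new DemoFileCompressor(List.of(
                "demo file one".getBytes(StandardCharsets.US_ASCII),
                "demo file two".getBytes(StandardCharsets.US_ASCII)));
        byte[] packed = c.compress("demo file two".getBytes(StandardCharsets.US_ASCII));
        System.out.println(packed.length + " byte -> "
                + new String(c.decompress(packed), StandardCharsets.US_ASCII));
        System.out.println(c.compress("any other file".getBytes(StandardCharsets.US_ASCII))); // null
    }
}
```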


Unfortunately for the writers, there is no universal, lossless compressor.

I only acknowledge one KOMPRESSOR.
posted by Pope Guilty at 7:54 PM on June 18, 2015 [10 favorites]




Unfortunately for the writers, there is no universal, lossless compressor. If there were such a thing, two different files could be squeezed down to the same, solitary, compressed file — and there would be no way to get back to either of the two original files from there.

Therefore, somehow, we should be able to use SHA-2 as a universal compression tool down to 256 bits! PATENT PENDING!

I know. I know.
posted by Talez at 8:32 PM on June 18, 2015
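The joke works because a hash is the pigeonhole problem in its purest form. A short sketch using the JDK's standard MessageDigest class (the wrapper class name is illustrative): SHA-256 returns 32 bytes for a 2-byte input and for a 10 MB input alike, so astronomically many inputs share each digest and there is nothing to decompress back to.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// SHA-256 maps inputs of any length to exactly 32 bytes (256 bits).
class HashIsNotCompression {
    static byte[] sha256(byte[] input) {
        try {
            return MessageDigest.getInstance("SHA-256").digest(input);
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide SHA-256, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] tiny = "hi".getBytes(StandardCharsets.UTF_8);
        byte[] big = new byte[10_000_000]; // 10 MB of zero bytes
        System.out.println(sha256(tiny).length + " " + sha256(big).length); // 32 32
        // There are 2^(8 * 10,000,000) possible 10 MB files but at most 2^256 digests,
        // so each digest corresponds to a vast number of inputs: nothing to decompress.
    }
}
```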


Now someone else create a snack dick and we'll be in business.
posted by infinitewindow at 9:01 PM on June 18, 2015


scruss: "There was an investment scam in the late 80s where someone had claimed to have written software that would compress files down to 1 byte."

Not quite the same, but anyone else remember Adam Clark's "Adam's Platform Technology" which had been bouncing around in one form or another since the mid-to-late '90s? The claim was that it could compress "DVD-quality video" into a 56kbps stream.

After buying a defunct gold mining company (plenty of those in Australia at the time) off the shelf, issuing nearly 2 billion shares to himself, renaming it "Media World Communications", selling around 600 million shares to private investors for AU$27 million, floating it for another AU$4.6 million, and funnelling nearly AU$18 million to himself, it ended when MWC announced it didn't actually work and promptly folded...
posted by Pinback at 9:13 PM on June 18, 2015 [1 favorite]


What's the advantage of buying a defunct gold mining company? Why is that better than simply incorporating?
posted by five fresh fish at 10:24 PM on June 18, 2015


What's the advantage of buying a defunct gold mining company? Why is that better than simply incorporating?

Because way back when this was actually going on it took forever to incorporate in Australia. Only recently has it been streamlined.
posted by Talez at 10:41 PM on June 18, 2015


The last episode of season 2 put my heart in my throat, and also filled me with utter, extreme joy at Gilfoyle's face poking out from the hole in the wall.

Too precious.

Brilliant show that yes, overcame a really slow start.

S1 finale had me laughing harder than I think any other TV show has. If you can't watch the show, just watch the best dick joke of all time.
posted by OnTheLastCastle at 11:58 PM on June 18, 2015 [2 favorites]


My favorite data compression scam was a DOS program that purportedly compressed files down to just a few bytes, then deleted the original. To decompress, it would run undelete.
posted by ryanrs at 12:34 AM on June 19, 2015 [14 favorites]


Ah, those heady days when you had to write an article every six months saying $compressionscam couldn't work because x,y and z. Even when the basic idea worked, there was so often something else shady/overblown/unsaid attached (remember Microsoft and Stac, and the DriveSpace court case...). There were so many.

I remember one in particular that promised "compress anything by X%, guaranteed!" that came from a small company in a UK university set up by a professor. I gave him a call (remember when you called people on the phone, rather than 'reaching out'?) and said "This can't work, because, y'know, Shannon and plain logic". "Oh, it does! It does!" "But how?" He explained... pause, puzzled frown at this end and "Isn't that just run-length encoding?" Pause. "Would you like a job?"
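Run-length encoding also neatly shows why "compress anything by X%, guaranteed" can't hold. A minimal byte-oriented sketch, assuming a simple (count, value) pair format with runs capped at 255 (the format and class name are illustrative; real RLE variants differ): data with long runs shrinks, but data with no repeats doubles in size.

```java
import java.io.ByteArrayOutputStream;

// Byte-oriented run-length encoding: each run becomes a (count, value) pair, count 1..255.
class RunLength {
    static byte[] encode(byte[] in) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int i = 0;
        while (i < in.length) {
            byte value = in[i];
            int run = 1;
            while (i + run < in.length && in[i + run] == value && run < 255) run++;
            out.write(run);   // write(int) stores the low 8 bits
            out.write(value);
            i += run;
        }
        return out.toByteArray();
    }

    static byte[] decode(byte[] in) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i + 1 < in.length; i += 2) {
            int run = in[i] & 0xFF;
            for (int k = 0; k < run; k++) out.write(in[i + 1]);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] runs = {7, 7, 7, 7, 7, 7, 0, 0};
        byte[] noRuns = {1, 2, 3, 4};
        System.out.println(runs.length + " -> " + encode(runs).length);     // 8 -> 4
        System.out.println(noRuns.length + " -> " + encode(noRuns).length); // 4 -> 8
    }
}
```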

We didn't write about it. Remember the days when "news" came in a press release through the mail, and only became proper news after a month when it got printed in a magazine? Or didn't, because it wasn't actually news and if everyone realised it was not a story it just never got going?

And then there was the original Silk Road, which wasn't strictly speaking compression but pretended to be a way to use fractional polarization rotation to stuff a zillion streams down a fibre... I think that one taught me that yes, a scam really can attract an awful lot of investment from people who should know better. It wasn't the last compression scam to manage that trick, either.

Iterated Systems' fractal compression (the stuff that became JPEG2000) was different, because it worked and for a while was really quite exciting. Michael Barnsley was (I imagine still is) a really nice chap, and clearly enthusiastic both about the maths and the business potential. Somewhere, I've still got his book and a fern pin he gave me back when fractals were bona fide magic. The only product I remember it being a big part of was Encarta, which I think used fractal compression in the beginning. When a version came out that didn't, we knew that battle was over.

I haven't been watching SV, because I saw the first one and the eye-roll-to-guffaw ratio wasn't promising. I'll have to catch up. Thanks, MeFi!
posted by Devonian at 2:03 AM on June 19, 2015 [7 favorites]
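The fern Devonian mentions is Barnsley's classic iterated function system: four affine maps, each picked with a fixed probability, regenerate a detailed picture from 28 numbers. The sketch below renders it with the "chaos game" using the commonly published coefficients. It only illustrates the decoding idea behind fractal compression, where an image is stored as maps rather than pixels; finding good maps for an arbitrary image (the hard part) is not shown, and the class name is illustrative.

```java
import java.util.Arrays;
import java.util.Random;

// Chaos-game rendering of Barnsley's fern from four affine maps plus their probabilities.
class BarnsleyFern {
    // Each row: a, b, c, d, e, f, probability for x' = a*x + b*y + e, y' = c*x + d*y + f.
    static final double[][] MAPS = {
        { 0.00,  0.00,  0.00, 0.16, 0.0, 0.00, 0.01},
        { 0.85,  0.04, -0.04, 0.85, 0.0, 1.60, 0.85},
        { 0.20, -0.26,  0.23, 0.22, 0.0, 1.60, 0.07},
        {-0.15,  0.28,  0.26, 0.24, 0.0, 0.44, 0.07},
    };

    static double[][] points(int n, long seed) {
        Random rng = new Random(seed);
        double x = 0, y = 0;
        double[][] pts = new double[n][];
        for (int i = 0; i < n; i++) {
            double r = rng.nextDouble(), acc = 0;
            double[] m = MAPS[MAPS.length - 1]; // fallback if rounding leaves acc just below 1
            for (double[] candidate : MAPS) {
                acc += candidate[6];
                if (r < acc) { m = candidate; break; }
            }
            double nx = m[0] * x + m[1] * y + m[4];
            double ny = m[2] * x + m[3] * y + m[5];
            x = nx;
            y = ny;
            pts[i] = new double[] {x, y};
        }
        return pts;
    }

    public static void main(String[] args) {
        int width = 60, height = 30;
        char[][] grid = new char[height][width];
        for (char[] line : grid) Arrays.fill(line, ' ');
        for (double[] p : points(50_000, 42L)) {
            // The attractor lies roughly in x in [-2.2, 2.7], y in [0, 10].
            int col = (int) ((p[0] + 2.2) / 4.9 * (width - 1));
            int rowIdx = (int) ((10.0 - p[1]) / 10.0 * (height - 1));
            col = Math.max(0, Math.min(width - 1, col));
            rowIdx = Math.max(0, Math.min(height - 1, rowIdx));
            grid[rowIdx][col] = '*';
        }
        for (char[] line : grid) System.out.println(new String(line));
    }
}
```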


For many reasons SV is a good show, yes, but the reason why it's a great show, IMO, is because of Kumail Nanjiani and Martin Starr. Without those two guys, the show could easily get annoying, laborious, scolding, etc. They are the rare comic relief that are not just comic relief but the sort of central nexus of humor in the show -- most of the other really good jokes spin off of them and their reactions to things.

Also, SPOILER...

Was anyone else satisfied with the "decision" made at the end of Season 2? It was meant to be this big heartbreaking moment but I was like, "Jesus, yes, do that, that should have been done long ago. Okay, good job, everyone."
posted by (Arsenio) Hall and (Warren) Oates at 4:27 AM on June 19, 2015 [4 favorites]


An emoji is a good metaphor: it represents an entire word or even several words using a single character. Our minds then 'decompress' the character back into the word it represents.

Hate to be pedantic but this is, like, annoyingly not really right. You could make a rebus out of emojis, but we mostly don't do that. Sometimes there's a determinate translation of an emoji into words and sometimes there's not. But if they do represent things that there's also a representation in words for, they don't represent those words themselves. It's like saying "gato" represents the string "cat" rather than the animal, cat.
posted by batfish at 6:36 AM on June 19, 2015 [2 favorites]


There was an investment scam in the late 80s where someone had claimed to have written software that would compress files down to 1 byte.

Meh. I wrote UltiCompressor back in 1983 on the back of a bar napkin. It compressed the total information in the universe down to a single bit. (It was 0, FYI.)

I'm still working on the decompressor, but as soon as I've got the bugs worked out I'll let you all know.
posted by disconnect at 7:51 AM on June 19, 2015 [1 favorite]


scruss: There was an investment scam in the late 80s where someone had claimed to have written software that would compress files down to 1 byte

I'm not sure if it's the same thing, but I remember a scam in either Fall 1991 or Spring 1992. I don't think it made it to investors, but it got a lot of press and a lot of USENET action. It wasn't that it would compress any file down to 1 byte, but that it could be used iteratively on files that had already been compressed.

That was fascinating. I was in college at the time, working on a computing help desk where a lot of the students were either EE or Math majors (to study comp sci at that school* you did one or the other). There was this lovely sort of humming tension that would start to be almost palpable around the subject, as all these geeks (who understood compression math very well, it was a subject they paid a lot of attention to at that school) would call bullshit on the description of capabilities, and then immediately proceed to this very emotionally-charged game of "but if it's not, how could it be done?"

I think I learned more by watching that about how conspiracy theories spread than anything else. These were guys who absolutely knew better -- but the very idea was so cool, and the buttons were being pushed in just the right way, that they got suckered-in to playing along, and the fact that they did that made it seem credible to so many other people.


--
*which happened coincidentally to be Schneier's alma mater, though he wasn't famous yet...
posted by lodurr at 7:53 AM on June 19, 2015 [1 favorite]


Hate to be pedantic but this is, like, annoyingly not really right.

I was actually hoping someone would address this analogy. Not disagreeing with you at all, but this is only scratching the surface of the problems and potential of that analogy.

Language is lossy compression, and emoji are simply a still-more-lossy form of compression by language. Art is also a form of lossy compression, and one of the wonderful things about it is that you can more or less totally lose the keys, and still get something out of it.
posted by lodurr at 7:58 AM on June 19, 2015 [2 favorites]


I just watched that video clip with the dick joke. When Gilfoyle described aligning the dicks tip to tip so Erlich can do four at a time, Jared says, "yes, from the middle out..."

Hence the compression scheme??
posted by njohnson23 at 8:44 AM on June 19, 2015


Yeah, that's the ultimate joke of the season- that what saves their asses is inspiration gleaned from a dick joke.
posted by Pope Guilty at 8:58 AM on June 19, 2015


What if you create an algorithm given nihilist constraints? If you assume everything is pointless, you get a lot more freedom.

/* Function to compress or decompress source data. Context aware, in that it's aware we live in a cold, empty universe filled with this one dirt clod we're overly sentimental about. */
byte[] transpressor(byte[] source)
{
    return null;
}
posted by mccarty.tim at 9:14 AM on June 19, 2015 [2 favorites]


if everything is pointless then you're going to have real trouble trying to pass anything by reference.
posted by xbonesgt at 9:16 AM on June 19, 2015 [7 favorites]


You can also rename it MehReduce()
posted by mccarty.tim at 9:49 AM on June 19, 2015


That's so pessimistic, though. How about a Panglossian compressor that assumes, since we're in the best of all possible worlds, that the data is already as compressed as it needs to be?
byte[] optimisticCompressor(byte[] source) {
    return source;
}
posted by Rangi at 9:50 AM on June 19, 2015 [5 favorites]


>... but the very idea was so cool, and the buttons were being pushed in just the right way, that they got suckered-in to playing along

So, a bit like Madison Priest, then - minus the space aliens and 'special' cables?
posted by scruss at 11:42 AM on June 19, 2015


I remember one in particular that promised "compress anything by X%, guaranteed!" that came from a small company in a UK university set up by a professor. I gave him a call (remember when you called people on the phone, rather than 'reaching out'?) and said "This can't work, because, y'know, Shannon and plain logic". "Oh, it does! It does!"

People with no practical idea how compression works, who just tend to think of it as magic, can maybe be forgiven for such a mistake, but honestly a professor in any related field should be able to understand that if you had a lossless compression algorithm that literally worked on any input, guaranteed, and yielded a resultant file less than 1.0x the size of the original, you would be able, through repeated application of the algorithm, to reduce any file to (virtually) nothing and still restore it.

Your guy didn't even think through the most basic implications of his claim.
posted by Nerd of the North at 1:06 PM on June 19, 2015 [1 favorite]
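The "run it again on its own output" idea is easy to test with a real compressor. The sketch below uses the JDK's java.util.zip.Deflater (DEFLATE, the algorithm used in ZIP and gzip); the class name and sample text are illustrative. The first pass shrinks the highly repetitive input dramatically, but that output already looks close to random, so later passes should stop shrinking and typically grow by a few bytes of header and checksum overhead.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

// Runs DEFLATE (zlib format) over its own output several times and prints the sizes.
class RepeatedDeflate {
    static byte[] deflate(byte[] input) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end(); // release native resources
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] data = "middle out ".repeat(1000).getBytes(StandardCharsets.US_ASCII);
        for (int pass = 0; pass <= 4; pass++) {
            System.out.println("pass " + pass + ": " + data.length + " bytes");
            data = deflate(data);
        }
    }
}
```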


Your guy didn't even think through the most basic implications of his claim.

He might have. I do think we might be thinking of the same story, and in the version I encountered, the ability to make already-compressed files compress even more by running the same algorithm on them was a key part of it.

But that's a logical consequence, not a mathematical one. What I observed was that people who understood that the math was impossible still felt compelled to figure out how it might be possible.

It's hard for me to explain why but I find that utterly fascinating.
posted by lodurr at 1:32 PM on June 19, 2015 [1 favorite]


It makes sense to me. It's the same appeal as discredited scientific notions like vitalism. You know it's false, but it's fun to picture a world where it was true. Except, instead of picturing salamanders crawling out of irradiated fireplace ashes, you imagine the fundamentals of information working differently. That might not sound as amazing, but if you spend all day thinking about how to change, transmit, and interpret information at the most basic level, it's a fun thought to play with.
posted by mccarty.tim at 2:03 PM on June 19, 2015 [1 favorite]


Iterated Systems' fractal compression (the stuff that became JPEG2000) was different, because it worked and for a while was really quite exciting.

Lossy compression has so many more knobs to fiddle with, and so much more room to argue "which is better", especially when talking about audio and video. That's where all the fun stuff has been happening for the last several decades.

I think the most recent practical, general-purpose lossless algorithm to make it into a widely used program was the Burrows–Wheeler transform, which is used in bzip2.

There are also a number of algorithms like LZO that focus on being very, very fast. Fast enough that they can improve both space and time performance for disk storage (not sure if they're fast enough for SSDs though).
posted by ryanrs at 2:58 PM on June 19, 2015
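The Burrows–Wheeler transform doesn't shrink anything by itself: it permutes the input so that characters with similar following contexts end up next to each other, producing runs that later stages (in bzip2, move-to-front and Huffman coding, among others) can exploit. A deliberately naive sketch, assuming a '\0' sentinel that never appears in the input; the class name is illustrative, and real implementations use suffix arrays rather than sorting every rotation.

```java
import java.util.Arrays;

// Naive Burrows-Wheeler transform and inverse, for illustration only (far too slow for real data).
// Assumes the input never contains the sentinel '\0', which sorts before every other char.
class Bwt {
    static final char SENTINEL = '\0';

    static String forward(String s) {
        String t = s + SENTINEL;
        int n = t.length();
        String[] rotations = new String[n];
        for (int i = 0; i < n; i++) rotations[i] = t.substring(i) + t.substring(0, i);
        Arrays.sort(rotations);
        StringBuilder last = new StringBuilder(n);
        for (String r : rotations) last.append(r.charAt(n - 1)); // last column of the sorted table
        return last.toString();
    }

    // Rebuilds the sorted rotation table by repeatedly prepending the BWT column and sorting.
    static String inverse(String bwt) {
        int n = bwt.length();
        String[] table = new String[n];
        Arrays.fill(table, "");
        for (int k = 0; k < n; k++) {
            for (int i = 0; i < n; i++) table[i] = bwt.charAt(i) + table[i];
            Arrays.sort(table);
        }
        // The only rotation ending in the sentinel is the original string plus sentinel.
        for (String row : table) {
            if (row.charAt(n - 1) == SENTINEL) return row.substring(0, n - 1);
        }
        throw new IllegalArgumentException("no sentinel found");
    }

    public static void main(String[] args) {
        String out = forward("banana");
        System.out.println(out.replace(SENTINEL, '$')); // annb$aa
        System.out.println(inverse(out));                 // banana
    }
}
```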


Season 2 hasn't finished here in the UK yet (last week was the courtroom episode), so please don't spoil anything for me.

After every episode ending in another disaster for Pied Piper I would like just one good thing to happen for them before the season's out. Please make your answers appropriately vague, but:

Does it?

[I have a feeling the answer is: "Of course it does, you idiot, otherwise how could there be a season 3?"]
posted by Paul Slade at 3:13 PM on June 19, 2015


That might not sound as amazing, but if you spend all day thinking about how to change, transmit, and interpret information at the most basic level, it's a fun thought to play with.

But, like, you don't even have to bring the notion of information into it. Universal lossless compression is impossible because of the pigeonhole principle. The objection is so basic that I can't really even coherently imagine this contrary universe.
posted by invitapriore at 6:41 PM on June 19, 2015
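The pigeonhole argument is also constructive: given any function that maps every n-bit input to something shorter, an exhaustive search will always turn up two inputs with the same output. A sketch that represents bit strings as Strings of '0' and '1' for readability (a choice made for clarity, not efficiency); the class name and the toy compressor in main() are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Exhaustively searches for two n-bit inputs that a "compressor" maps to the same output.
// If every output is strictly shorter than n bits, the pigeonhole principle guarantees one.
class CollisionFinder {
    static String[] findCollision(int n, Function<String, String> compressor) {
        if (n < 1 || n > 20) throw new IllegalArgumentException("keep n small: 1..20");
        Map<String, String> seen = new HashMap<>();
        for (int v = 0; v < (1 << n); v++) {
            String input = toBits(v, n);
            String output = compressor.apply(input);
            if (output.length() >= n) {
                throw new IllegalArgumentException("output not shorter for input " + input);
            }
            String earlier = seen.putIfAbsent(output, input);
            if (earlier != null) return new String[] {earlier, input};
        }
        // Only 2^n - 1 shorter strings exist, so the loop always returns before this.
        throw new AssertionError("pigeonhole principle violated?!");
    }

    // Fixed-width binary representation of v, most significant bit first.
    static String toBits(int v, int n) {
        StringBuilder sb = new StringBuilder(n);
        for (int i = n - 1; i >= 0; i--) sb.append(((v >> i) & 1) == 1 ? '1' : '0');
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical toy compressor: strip leading zeros if any, otherwise drop the last bit.
        Function<String, String> toy = s -> {
            String t = s.replaceFirst("^0+", "");
            return t.length() < s.length() ? t : s.substring(0, s.length() - 1);
        };
        String[] pair = findCollision(3, toy);
        System.out.println(pair[0] + " and " + pair[1] + " both compress to " + toy.apply(pair[0]));
    }
}
```

For the toy compressor with n = 3, the search reports that 010 and 100 both compress to 10.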


Sure, you can't compress every n-bit file into less than n bits, because of the pigeonhole principle. What you have to do is make the bits smaller.
posted by Rangi at 6:50 PM on June 19, 2015 [3 favorites]


The pigeonhole principle states that you can't remove the holes with pigeons in them, only the empty holes. In information theory, this means that you can compress the zero bits, but not the ones. That's why general purpose lossless compressors get about 2:1 compression, on average (ASCII text does somewhat better because the high bit is always zero).
posted by ryanrs at 2:42 AM on June 20, 2015 [3 favorites]

