“code golfing.”
August 5, 2018 6:17 PM

This Coder Fit a Bootable CD and Video Game Into a Tweet [Motherboard] “A few weeks ago, Alok Menghrajani, a security engineer at Square, set out to challenge himself. He wanted to fit a bootable CD-ROM, and a retro video game inside it, into a tweet.”
perl -E 'say"A"x46422,"BDRDAwMQFFTCBUT1JJVE8gU1BFQ0lGSUNBVElPTg","A"x54,"Ew","A"x2634,"/0NEMDAxAQ","A"x2721,"BAAAAYQ","A"x30,"SVVVqogAAAAAAAEAF","A"x2676,"LMBaACgB76gfbgTAM0Qv8D4uYAI86qqgcc+AXP45GA8SHIRPFB3DTeYSEhyBSwCa8CwicMB3rSGtkDNFSYwJHvc68MA","VapVqlWq"x330'|base64 -D>cd.iso
@alokmenghrajani, 2:45 AM, 15 Jun 2018

“The results are pretty cool. Within 280 characters, Menghrajani crafted code that creates a CD-ROM disk image, which can either be booted up in a virtual machine or burned to a physical disc. Inside that is the video game, which he described as a mixture of Tron and Snake. It took Menghrajani two weekends–between 50 and 100 hours–to complete, he estimated.”
posted by Fizz (29 comments total) 31 users marked this as a favorite
 
Funnily enough, this is pretty readable if you know Perl. For example, the ISO image begins with 46422 "A" characters, which is base64 for a long run of zero bytes, and further on there's what looks like some x86 machine code. (All base64 encoded.)
posted by monotreme at 8:37 PM on August 5, 2018 [2 favorites]
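To check that reading, here is a minimal Python sketch (Python standing in for Perl; PIECES and build_tweet_base64 are illustrative names, not from the tweet) that rebuilds the tweet's base64 string from the same "A"xN repeats and literals, decodes it, and looks at the start of the result:

```python
import base64

# The tweet's `say` list: ("text", n) pairs are Perl's "text"xN repeats,
# plain strings are literals copied as-is.
PIECES = [
    ("A", 46422), "BDRDAwMQFFTCBUT1JJVE8gU1BFQ0lGSUNBVElPTg",
    ("A", 54), "Ew",
    ("A", 2634), "/0NEMDAxAQ",
    ("A", 2721), "BAAAAYQ",
    ("A", 30), "SVVVqogAAAAAAAEAF",
    ("A", 2676),
    "LMBaACgB76gfbgTAM0Qv8D4uYAI86qqgcc+AXP45GA8SHIRPFB3DTeYSEhyBSwCa8CwicMB3rSGtkDNFSYwJHvc68MA",
    ("VapVqlWq", 330),
]

def build_tweet_base64() -> str:
    out = []
    for piece in PIECES:
        if isinstance(piece, tuple):
            text, count = piece
            out.append(text * count)   # Perl's x repetition operator
        else:
            out.append(piece)
    return "".join(out)

image = base64.b64decode(build_tweet_base64())

# Length of the leading run of zero bytes.
leading_zeros = len(image) - len(image.lstrip(b"\x00"))

print(len(image))             # 43008
print(leading_zeros)          # 34817
print(image[34817:34846])     # b'CD001\x01EL TORITO SPECIFICATION'
```

The 46422 "A"s decode to roughly three quarters as many zero bytes. The zero run covers everything up to and including byte 34816, the first byte of CD sector 17 (17 * 2048), which doubles as the boot record's type code 0; the machine code sits near the end of the image.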


Needs a better proofreader; base64 has no -D option.
posted by flabdablet at 10:01 PM on August 5, 2018


Having changed -D to -d, the resulting cd.iso does indeed boot and play a little game. Neat!
posted by flabdablet at 10:05 PM on August 5, 2018 [3 favorites]


As a non-coder (well, I worked as a JavaScript coder for 4 years, but I wouldn't have hired me for that), I can barely understand how someone could come up with this. Do you just need to be really familiar with Perl and machine code?

I get the same sense of awe that I get when I've seen code for Atari and Nintendo games that were written in, I guess, assembly or something. I can't fathom just knowing that a couple hex numbers here and there would tell the machine to do this or that.
posted by shapes that haunt the dusk at 11:34 PM on August 5, 2018


-d or -D is operating system dependent. Linux is -d and OSX is -D.
posted by Revvy at 11:36 PM on August 5, 2018 [8 favorites]


Now watch this drive:
perl '-pes/:([0-9]+):/A x $1/eg'<<.|base64 -d>cd.iso
:46422:BDRDAwMQFFTCBUT1JJVE8gU1BFQ0lGSUNBVElPTg:54:Ew:2634:/0NEMDAxA
Q:2721:B:4:YQ:30:SVVVqog:7:EAF:2676:LMBaACgB76gfbgTAM0Qv8D4uYAI86qqg
cc+AXP45GA8SHIRPFB3DTeYSEhyBSwCa8CwicMB3rSGtkDNFSYwJHvc68M:2638:FWq
.
260 bytes (including line breaks) compared to the original's 280 (not including line breaks, which break it if left in).

> -d or -D is operating system dependent. Linux is -d and OSX is -D.

Ah! Thank you.

Could somebody with a Mac make sure my version still works there as well after subbing a -D into the first line?
posted by flabdablet at 12:08 AM on August 6, 2018 [3 favorites]
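The ":N:" trick above is plain run-length encoding: each marker stands for N copies of "A", the base64 digit for six zero bits. A minimal Python equivalent of that s///e substitution, using a hypothetical helper name expand_runs:

```python
import base64
import re

def expand_runs(packed: str) -> str:
    # Replace each ":N:" marker with N copies of "A", mirroring the
    # Perl substitution s/:([0-9]+):/A x $1/eg.
    return re.sub(r":([0-9]+):", lambda m: "A" * int(m.group(1)), packed)

print(expand_runs("x:2:y:1:z"))              # xAAyAz
print(base64.b64decode(expand_runs(":4:")))  # b'\x00\x00\x00'
print(len(expand_runs(":54:Ew")))            # 56
```

The colons make the counts unambiguous, because ":" never appears in base64 text.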


To me, Perl often looks like an incantation to awaken Cthulhu. This particular example does nothing to assuage that feeling.

Yes, I know it's mostly just blocks of repeated base64-encoded compiled assembly code and pretty damn cool; but still, every time I see it I sense the twitch of a tentacle in deep, dark water.

On preview: flabdablet, you're not making me feel any better, you know.
posted by Absolutely No You-Know-What at 12:17 AM on August 6, 2018 [7 favorites]


I am not a coder, so I don't understand why a coder wouldn't "writ[e] code efficiently enough to do something with the least number of characters." Why isn't that SOP?

Also, bravo for the achievement, but I can't be the only one thinking bad people will use this ability for bad purposes. That's not a reason not to do it; I'm just saying I can foresee people innocently clicking on a tweet like this that will do something terrible to and with their computers and phones.
posted by bryon at 12:36 AM on August 6, 2018 [2 favorites]


perl '-pes/:(\d+):/A x$1/eg'<<.|base64 -d>cd.iso
:46422:BDRDAwMQFFTCBUT1JJVE8gU1BFQ0lGSUNBVElPTg:54:Ew:2634:/0NEMDAxA
Q:2721:B:4:YQ:30:SVVVqog:7:EAF:2676:LMBaACgB76gfbgTAM0Qv8D4uYAI86qqg
cc+AXP45GA8SHIRPFB3DTeYSEhyBSwCa8CwicMB3rSGtkDNFSYwJHvc68M:2638:FWq
.
shaves off another four bytes, making it fit in 256 which is kind of a nice number; and at the cost of some potential robustness,
perl '-pes/\d{3,}/A x$&/eg'<<.|base64 -d>cd.iso
46422BDRDAwMQFFTCBUT1JJVE8gU1BFQ0lGSUNBVElPTg054Ew2634/0NEMDAxAQ
2721B004YQ030SVVVqog007EAF2676LMBaACgB76gfbgTAM0Qv8D4uYAI86qqgcc
+AXP45GA8SHIRPFB3DTeYSEhyBSwCa8CwicMB3rSGtkDNFSYwJHvc68M2638FWq
.
gets it down to 243 bytes.
posted by flabdablet at 12:37 AM on August 6, 2018 [3 favorites]
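The \d{3,} version drops the colons and leans on two assumptions: no run of three or more digits ever occurs in the real base64 data, and short counts can be zero-padded to three digits (Perl numifies "054" as 54). A Python sketch of the same rule, with a hypothetical helper name, shows both the padding and the robustness cost:

```python
import re

def expand_bare_runs(packed: str) -> str:
    # Any run of 3+ digits is a repeat count for "A"; int() ignores leading
    # zeros, so "054" means 54. Shorter digit runs are left alone because
    # base64 data can legitimately contain them (e.g. "B76g").
    return re.sub(r"\d{3,}", lambda m: "A" * int(m.group()), packed)

print(expand_bare_runs("054Ew") == "A" * 54 + "Ew")  # True
print(expand_bare_runs("ACgB76gf"))                  # ACgB76gf (unchanged)
print(len(expand_bare_runs("Xq123Y")))               # 126: data digits "123" misread as a count
```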


> I don't understand why a coder wouldn't "writ[e] code efficiently enough to do something with the least number of characters." Why isn't that SOP?

Because the code resulting from this approach tends very much toward being write-only, as I think the examples above demonstrate pretty clearly. Most of the cost of software development is in extension and maintenance, and if nobody can read your code it gets hella expensive to extend and maintain.
posted by flabdablet at 12:39 AM on August 6, 2018 [20 favorites]


Much code is Write Once, Read Never (WORN), aka the stuff that has comments like 'This is a dirty hack, need to fix later', dated 2006. Real example.

The rest is Write Once, Debug Forever*. The problem is, you often never know which type of code you're actually writing until some time later. For the Debug Forever stuff, easily understandable, well documented code is like gold dust, especially for the poor successor who inherits it.

Storage, which is the main hardware cost of 'bulky' code, is cheap (relatively speaking). Meanwhile, compiler output for understandable code and for incomprehensible Cthulhu-summoning code is pretty similar outside some particular edge cases, and even in those, compilers have gotten so good that human optimisation can actually make things worse.

Writing human-readable code does, however, impose a time cost and a certain amount of discipline on the code writer, both of which are in short supply under management pressure. That's why I have often felt the need to use rusty spoons on the author of something I'm trying to fix (who, in some cases, turns out to be me, several years younger).

* There are rumours of code that is actually rewritten carefully and expanded according to a rational plan, rather than just replaced with fresh WORN code in an attempt to refactor, or buried under piled-up hacks and 'fixes'. I sure hope it exists somewhere, because it's got to be better than the spaghetti crap I live with.
posted by Absolutely No You-Know-What at 1:20 AM on August 6, 2018 [11 favorites]


> I can foresee people innocently clicking on a tweet like this that will do something terrible to and with their computers and phones.

The advertising industry is way ahead of you there.

Code published in a tweet is something you actually have to act on in order to make it run; in this case, paste it into a terminal, then boot the resulting cd.iso in a VM or burn it to disc and boot the disc. That's way too much work for a successful infection vector.

If your aim is to do terrible things to people's computers and phones, writing code that does those terrible things when the user just clicks on a thing, or visits a web page, is much more fruitful; and the easiest way to make that happen is to find or buy a browser and/or OS exploit that most of your targets won't yet have bothered to patch, then package whatever code you want to run in the form required by that exploit, then embed the package inside an online advertisement.

Advertising servers have historically done a simply woeful job at keeping malware out of the ads they serve, and show no real signs of improving, and that's why the best single measure you can take to protect a computer that goes online in 2018 is to install uBlock Origin (available for Firefox, Chrome, Opera, Safari and Edge) and stop your browser even requesting those ads.
posted by flabdablet at 1:40 AM on August 6, 2018 [12 favorites]


> I don't understand why a coder wouldn't "writ[e] code efficiently enough to do something with the least number of characters." Why isn't that SOP?

The tradeoff between efficiency and maintainability has been covered already, but I wanted to add that the most valuable piece of advice I've ever gotten is along the lines of:

Code so that the poor bastard who's having to read and understand your work six months down the line won't have cause to curse your name...because nine times out of ten you will be that poor bastard.
posted by Mr. Bad Example at 2:40 AM on August 6, 2018 [15 favorites]


> Why isn't that SOP?


I kindly point you to case 116 of The Codeless Code:

Several monks of the Laughing Monkey Clan found their brother in a state of great anguish, typing frantically at his workstation.

“What vexes you so?” they asked.

Said the monk: “When new business rules are delivered next year, my code will need to be updated. Today the abbot told me who will be assigned this task, and my heart sank. He is an impatient fool who scorns documentation and breezes by comments, electing instead to guess the purpose of everything by name alone. Thus I must idiot-proof every class and method.”

The monk pointed to his screen. “Here he will be tempted to modify this object’s properties, so I must make it immutable to prevent disaster. Here he will surely mistake the purpose of this parameter, so now I must check for an illegal argument wherever it is used.” The monk collapsed upon his keyboard. “Ten thousand curses upon that imbecile, Taw-Jieh!” he wailed. “That he of all people should be chosen to maintain my code!”

The other monks looked at each other uncomfortably.

“But you are Taw-Jieh,” said one.
posted by DreamerFi at 3:49 AM on August 6, 2018 [41 favorites]


Circa 2003, Slashdot thread on cool snippets of Perl. Inscrutable code draws a picture of something cool. Seemingly random lines doing amazing things. I love Perl. I copy, paste, run.

Hmm. That's odd. This one doesn't seem to be doing anything yet.

Oh, it's removing all the files from my home directory.

~fin.

Except now that's the whole internet.
posted by roue at 5:00 AM on August 6, 2018


There's a whole Stack Exchange site dedicated to code golfing.
posted by Jacob G at 5:47 AM on August 6, 2018


> I don't understand why a coder wouldn't "writ[e] code efficiently enough to do something with the least number of characters." Why isn't that SOP?

What's going on with this example goes well beyond the reasonable limits of "efficient"; folks above have gotten at why having readable, documented code is important, but even if you were going to quibble about where the ideal stopping point between documentation/readability and code concision is, it's not here. It's not anywhere near here.

What's going on here is flirting with what is fondly/bitterly/competitively known as code obfuscation, the act of creating a working program that isn't even readable without a lot of effort. Sometimes obfuscation is the core point of making an unreadable program (maybe it's for a contest; maybe it's to make keywords in your malicious code harder to search for). Sometimes (as in this case) it's a byproduct of not caring that the program be readable; the author used base64 packing not to hide the code from view but to pack it into as space-efficient a package as possible, and the fact that base64 packing by definition renders readable text into unparsable line noise is a side effect.

This isn't efficiency except in the very narrow sense of space efficiency, which almost never matters in practical software development but makes for a very fun challenge if you're goofing around for goofing's sake.

Visual metaphor: let's say you have a stack of file folders full of papers, and you need to organize them. The first question you're going to ask is, "what am I organizing this for?" And practically speaking, there's a couple kinds of answers.

If the answer is "someone needs to be able to get at one or another file very quickly", you might lay the files out on the floor in an organized way, with nice big labels on the front, maybe put up a sign describing the basic organizational structure as well. Someone walks into the room, glances at the files, grabs the one they want, and they're on their way. Downside: takes up an entire room. Upside: this is a metaphor and you have an unlimited number of rooms and everybody's on powered scooters, so the downside isn't a downside. This is your production code. It's well documented, it takes up space, and space is free.

If the answer is "someone needs to be able to get at one or another file reasonably efficiently", you can save some space and just e.g. alphabetize the files in a metal file cabinet. Someone walks into the room, glances at the labels on the drawers, opens the right one, riffles through the files until they hit the one they're looking for, and they're on their way. Downside: a little bit more effort to get out the door with the file. Upside: takes up less space in your infinity of rooms. Maybe you have a lot of files to store. In any case, nobody's going to think a file cabinet's crazy. It's still functional, you can still go get your file reasonably easily. This is your academic, concise code. It's been reduced to a reasonable minimum translation; not as maintainable, but still readable.

But there's another answer: "I want to fit these files into as small a space as I can". And for that, you have to balk at the wastefulness, the lah-dee-dah largesse of file cabinets. You have to get creative. You have to get stupid in a clever way. So you start asking questions and finding space-saving answers. Like:

Do the files really need descriptive labels? What if instead of calling it "Electrodyne Accounting Records" we call this one "A"; and instead of "Insurance Regulation Compliance Documentation" we call this one "B"; and so on for every file. And we can just keep a list somewhere of what A is and what B is instead.

Do the papers in the file really need all this extra white space? Look at these margins! Let's just cut off the left and right and top and bottom inch and a half from every sheet of paper in here, cut down on the size of everything.

Do we really need capital letters? Do we really need the word "the" or "a" in here anywhere? Do we really need vowels? These are all conveniences, niceties for readability; let's get rid of them all and reduce the size of each page by half.

Do we really need all these extra molecules of air between each sheet of paper? Let's use a pneumatic press and some glue and reduce this loose stack of disemvoweled, lower-cased, article-free, marginless papers into a dense, solid brick. Don't worry, the glue is soluble and non-destructive; someone can use tweezers and solvent and get at any sheet they want, in maybe an hour or two tops.

So, there! Now we've managed to get our files down to something that will fit in a shoebox. In an office building where we have infinite rooms' worth of storage.

This is doing something with the least number of characters; this is writing code efficiently when "efficiently" is taken in a pathologically narrow scope of "as small as possible and nothing else matters". It's also wildly inefficient in every other respect; it takes longer to craft in the first place (all that extra effort coming up with, and implementing, your file folder mangling techniques) and takes far longer to even extract meaning from at all let alone modify or update. It's fun as a puzzle—which is exactly the spirit of this tweet we're talking about—but it has nothing to do with coding as a useful professional or avocational craft.
posted by cortex at 8:44 AM on August 6, 2018 [16 favorites]


Believe you me, if you get called out at 04:00 to fix a problem with software on a regular basis, you tend to want the code to be easily understandable. I have often cursed the author of "clever" code, only to realise that it was my own fault for trying to be efficient in the first place.

The biggest cost of programs is the maintenance, and that usually starts before the things are even completed.
posted by Burn_IT at 9:17 AM on August 6, 2018 [2 favorites]


> I have often cursed the author of "clever" code, only to realise that it was my own fault for trying to be efficient in the first place.

OMG, that's the literal real-world realization of Case 116 of the Codeless Code, as linked above.
posted by RedOrGreen at 9:26 AM on August 6, 2018


Heck, I've written code that I couldn't understand the next morning.
posted by lucidium at 9:34 AM on August 6, 2018 [2 favorites]


Anything can be a one-liner in perl if you try hard and believe in yourself.
posted by mhum at 11:27 AM on August 6, 2018 [3 favorites]


> the author used base64 packing not to hide the code from view but to pack it into as space-efficient a package as possible, and the fact that base64 packing by definition renders readable text into unparsable line noise is a side effect.

Pedantry note: In fact base64 expands the encoded data by a factor of 4:3 (every four characters of base64 text encodes three bytes of data).

Base64 encoding is used here not for space efficiency, but for reliably storing arbitrary 8-bit data (mostly 8086 machine code in this instance) in a form that will survive being posted on a service designed for human-readable text. Without the base64 coding it would look even more like line noise.

The real achievement here, to my way of thinking, is not so much the compressing of a 43008-byte CD image into under 280 characters; that's not too hard, given that most of the image is just long runs of zero bytes. What impresses me is the fitting of a little game that's genuinely fun to play into only 67 bytes of machine code. That takes skill.
posted by flabdablet at 3:07 AM on August 7, 2018 [5 favorites]
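The size arithmetic in that comment is easy to check. Assuming standard 2048-byte CD-ROM sectors, a quick Python sketch (b64_len is a hypothetical helper) of the 4:3 base64 expansion:

```python
import base64

SECTOR = 2048  # bytes per CD-ROM sector

def b64_len(n_bytes: int) -> int:
    # Base64 emits 4 characters for every 3 input bytes, rounding up
    # to a whole 4-character group (with "=" padding).
    return (n_bytes + 2) // 3 * 4

image_size = 43008
print(divmod(image_size, SECTOR))                # (21, 0): exactly 21 sectors
print(b64_len(image_size))                       # 57344 base64 characters
print(len(base64.b64encode(bytes(image_size))))  # 57344, same result from the library
print(b64_len(67))                               # 92 characters for 67 bytes of code
```

Base64 makes the image bigger, not smaller; it's the run-length tricks on the long stretches of "A" that get 57344 characters of base64 down to tweet size.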


I can't find anything in any El Torito spec that says that the first sector of El Torito boot loader code needs a 55 AA signature in the last two bytes like a floppy or HD MBR does. Padding the code block completely with zeroes instead makes a CD image that still works for me, and gets the tweet content down to 240 characters:
perl '-pes/^\d+/A x$&/eg'<<.|base64 -d>cd.iso
46422BDRDAwMQFFTCBUT1JJVE8gU1BFQ0lGSUNBVElPTg
54Ew
2634/0NEMDAxAQ
2721B
4YQ
30SVVVqog
7EAF
2676LMBaACgB76gfbgTAM0Qv8D4uYAI86qqgcc+AXP45GA8
SHIRPFB3DTeYSEhyBSwCa8CwicMB3rSGtkDNFSYwJHvc68M
2641
.
posted by flabdablet at 4:53 AM on August 7, 2018
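The El Torito reasoning above can be checked by decoding that payload and walking the structures. A Python sketch, assuming the layout in the El Torito specification: a Boot Record Volume Descriptor in sector 17 whose 32-bit little-endian boot catalog pointer sits at offset 0x47, then a boot catalog whose 32-byte validation entry ends in the key bytes 55 AA, followed by the initial/default entry. The helper names expand and sector are hypothetical:

```python
import base64
import re
import struct

SECTOR = 2048  # bytes per CD-ROM sector

# The payload from the comment above, verbatim.
PAYLOAD = """46422BDRDAwMQFFTCBUT1JJVE8gU1BFQ0lGSUNBVElPTg
54Ew
2634/0NEMDAxAQ
2721B
4YQ
30SVVVqog
7EAF
2676LMBaACgB76gfbgTAM0Qv8D4uYAI86qqgcc+AXP45GA8
SHIRPFB3DTeYSEhyBSwCa8CwicMB3rSGtkDNFSYwJHvc68M
2641"""

def expand(payload: str) -> str:
    # Same rule as s/^\d+/A x$&/eg: a digit run at the start of a line
    # becomes that many "A"s.
    return re.sub(r"^\d+", lambda m: "A" * int(m.group()), payload,
                  flags=re.MULTILINE)

# With the default validate=False, b64decode discards the newlines.
image = base64.b64decode(expand(PAYLOAD))

def sector(lba: int) -> bytes:
    return image[lba * SECTOR:(lba + 1) * SECTOR]

# Sector 17: Boot Record Volume Descriptor (type 0, "CD001", version 1).
brvd = sector(17)
is_boot_record = brvd[0] == 0 and brvd[1:6] == b"CD001" and brvd[6] == 1
system_id = brvd[7:39].rstrip(b"\x00")
catalog_lba = struct.unpack_from("<I", brvd, 0x47)[0]

# Boot catalog: 32-byte validation entry, then the initial/default entry.
catalog = sector(catalog_lba)
validation = catalog[0:32]
key_ok = validation[0] == 1 and validation[30:32] == b"\x55\xaa"
checksum_ok = sum(struct.unpack("<16H", validation)) % 0x10000 == 0

entry = catalog[32:64]
bootable = entry[0] == 0x88
sector_count = struct.unpack_from("<H", entry, 6)[0]  # 512-byte virtual sectors
load_lba = struct.unpack_from("<I", entry, 8)[0]
boot_sector = sector(load_lba)

print(is_boot_record, system_id, catalog_lba)  # True b'EL TORITO SPECIFICATION' 19
print(key_ok, checksum_ok, bootable)           # True True True
print(sector_count, load_lba)                  # 1 20
print(boot_sector[510:512])                    # b'\x00\x00': no 55 AA here
```

In this version the only 55 AA left is the key in the boot catalog's validation entry; the last two bytes of the loaded boot sector are zero, which fits the observation that the image still boots without them.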


237. No point doing a global substitution when the repeat count now has to be the first thing on an input line, and if I just relax a little on maximum line length so that every line can start with a repeat count, non-global substitution means I don't need to anchor the digit string match.
perl '-pes/\d+/A x$&/e'<<.|base64 -d>cd.iso
46422BDRDAwMQFFTCBUT1JJVE8gU1BFQ0lGSUNBVElPTg
54Ew
2634/0NEMDAxAQ
2721B
4YQ
30SVVVqog
7EAF
2676LMBaACgB76gfbgTAM0Qv8D4uYAI86qqgcc+AXP45GA8SHIRPFB3DTeYSEhyBSwCa8CwicMB3rSGtkDNFSYwJHvc68M
2641
.
posted by flabdablet at 5:21 AM on August 7, 2018 [2 favorites]
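Why dropping /g lets the anchor go: without it, only the first digit run on each line is treated as a count, so digits that belong to the base64 data later on the same line are left alone. A Python sketch contrasting the two behaviours (helper names hypothetical):

```python
import re

def expand_first(line: str) -> str:
    # Like s/\d+/A x$&/e (no /g): only the first digit run on the line counts.
    return re.sub(r"\d+", lambda m: "A" * int(m.group()), line, count=1)

def expand_all(line: str) -> str:
    # Like s/\d+/A x$&/eg: every digit run is expanded, even ones that are
    # really part of the base64 data.
    return re.sub(r"\d+", lambda m: "A" * int(m.group()), line)

line = "2676LMBaACgB76gfbgTAM0Qv8D4uYAI86qqgcc"
print(expand_first(line) == "A" * 2676 + line[4:])  # True
print(expand_all(line) == expand_first(line))       # False: "76", "0", "8", ... expanded too
```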


236 if the shell supports here-strings as well as here-documents:
perl '-pes/\d+/A x$&/e'<<<'46422BDRDAwMQFFTCBUT1JJVE8gU1BFQ0lGSUNBVElPTg
54Ew
2634/0NEMDAxAQ
2721B
4YQ
30SVVVqog
7EAF
2676LMBaACgB76gfbgTAM0Qv8D4uYAI86qqgcc+AXP45GA8SHIRPFB3DTeYSEhyBSwCa8CwicMB3rSGtkDNFSYwJHvc68M
2641'|base64 -d>cd.iso
posted by flabdablet at 8:30 AM on August 7, 2018


> I am not a coder, so I don't understand why a coder wouldn't "writ[e] code efficiently enough to do something with the least number of characters." Why isn't that SOP?

People have already mentioned the importance of readable code. But I think a more fundamental reason is that making code optimally compact and efficient is extra work that often has no benefit.

There are some programming situations where every single byte is precious and you need to shave off every single processing cycle you can. But there are lots of other programming situations these days where the hardware is so powerful compared to the task at hand that you're never going to notice any difference from trying to make your code more efficient.
posted by straight at 11:55 AM on August 10, 2018


Just tried writing a version using PostScript instead of Perl, because PostScript has inbuilt support for base-85 string literals which are a smidgen less inefficient than base-64 encoding; in the course of which I found out that a terminating line feed had crept onto the end of the file I was using for byte counts and my "236 byte" solution above is actually 235.

Kind of nice in that it does the whole thing directly instead of needing a decode pipeline, but still couldn't get it below 240 bytes because all the string literals need delimiters and PS's operator names are not as golf-friendly as Perl's s/// thing.

Here's the PostScript code in case anybody still cares, blown out to 243 bytes with a few extra line breaks for the sake of not inflicting quite such long lines on readers. Tested on Debian. I'm assuming the Ghostscript PostScript interpreter will also be accessible on Macs via the gs command, since it's a fundamental part of the CUPS printing infrastructure that Debian and OSX both use.
gs -q ->cd.iso<<\.
1981<~ZN4_>TEi3(ICKpjbn.Y*ppkn\o;\gAa$'NFpuGgF89,;):j3@-Qq0bg"Y9^
j^p&[5!T(2<[Qr(P-86]lh!;i~>2007<~!!#Sj<N:iTz!<<f~>21(a)3<01>2041
(\377CD001\1)1976<13>41(CD001\1EL TORITO SPECIFICATION)34817()
8{print string print}repeat
.
posted by flabdablet at 7:38 AM on August 11, 2018 [1 favorite]
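The base-85 versus base-64 trade-off is easy to see with Python's standard base64 module, which implements both encodings. The sample data is arbitrary, chosen non-zero so that Ascii85's "z" shorthand for all-zero 4-byte groups doesn't kick in:

```python
import base64

data = bytes([1, 2, 3, 4]) * 3               # 12 arbitrary non-zero bytes

b64 = base64.b64encode(data)                 # 4 characters per 3 bytes
a85 = base64.a85encode(data)                 # 5 characters per 4 bytes
framed = base64.a85encode(data, adobe=True)  # with PostScript-style <~ ~> delimiters

print(len(b64), len(a85), len(framed))       # 16 15 19
print(base64.a85decode(framed, adobe=True) == data)  # True
```

The delimiters cost four characters per literal, which is part of why the PostScript version struggles to beat Perl; it sidesteps encoding the zero runs at all by building them with the string operator.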


Thematically related, I am currently playing the heck out of EXAPUNKS, the latest Zachtronics game where you are shaving off every single processing cycle you can for, basically, goofy bragging rights. I'm not enough of a hardcore coder to bother with the kind of actual real-world goofiness that the original tweeter or flabdablet is getting up to here, but I love that there's a genre of games creating a playspace in that same spirit and I am currently parallelizing the hell out of a message-passing process just because I can.
posted by cortex at 7:41 AM on August 11, 2018 [1 favorite]


I cut my code golfing teeth on shoehorning little network boot drivers into the 256-byte EPROMs on Apple II peripheral cards in the early 80s, and still get much enjoyment from it.
posted by flabdablet at 7:47 AM on August 11, 2018



