April 17, 2012 6:28 AM   Subscribe

Jordan Mechner has posted the source code for the original Apple ][ version of Prince of Persia to GitHub. (Previously.)
posted by Cash4Lead (56 comments total) 30 users marked this as a favorite
I just downloaded an Apple ][ emulator. I wonder if I can compile this on it.
posted by clarknova at 6:43 AM on April 17, 2012 [1 favorite]

jsr fadeout ;...and fade to black
jmp * ;and hang (because it's too much
;trouble to restart)

posted by Sutekh at 6:46 AM on April 17, 2012 [2 favorites]

Jordan Mechner: "The initial commit of the source code is missing some files, which will be converted and added in the future. Therefore, it is not likely the code can be compiled into a working model... yet."
posted by James Scott-Brown at 6:59 AM on April 17, 2012

The source code for Karateka:

posted by kmz at 7:03 AM on April 17, 2012 [6 favorites]

Interesting stuff. I look forward to a thorough dissection. Graphics on the Apple ][ could get tricky, and this was best-in-class.

Also interesting to see the copy protection stuff in there. Never seen the source for one of these up close.
posted by RobotVoodooPower at 7:09 AM on April 17, 2012 [1 favorite]

I got this mixed up with the above FPP and thought Mechner had tweeted the source code. I mean, you had to be efficient back then but damn.

Anyone have an idea where the code for the shadow-fight is? I want to see if there's any fun comments on there.
posted by griphus at 7:20 AM on April 17, 2012

This is fascinating. I can't wait to see how it ends!
posted by mazola at 7:23 AM on April 17, 2012

Some of the code I'm most proud of is older stuff; tight little routines that do one thing really, really well. There was an aspect of craft in dealing with limited resources that just doesn't seem to be apparent in today's web/cloud coding. (Sure there's still areas where craft is essential; game development and drivers for example, but general purpose apps these days are no more sensitive to memory and cpu limitations than a pedestrian would be to the pavement on an Interstate highway.)
posted by seanmpuckett at 7:30 AM on April 17, 2012 [5 favorites]

Mobile apps are fairly sensitive to memory.
posted by jeffamaphone at 7:51 AM on April 17, 2012

jeffamaphone: "Mobile apps are fairly sensitive to memory."

But they give you a little more than 48K RAM to play with.
posted by octothorpe at 8:33 AM on April 17, 2012

But they give you a little more than 48K RAM to play with.

People expect to do more than play Pong on their iPhones.
posted by Blazecock Pileon at 8:48 AM on April 17, 2012

You could play a lot more than Pong on a 48K Apple II.
posted by Mitheral at 8:59 AM on April 17, 2012 [5 favorites]

There was an aspect of craft in dealing with limited resources that just doesn't seem to be apparent in today's web/cloud coding.

This is why I changed tracks and went off to do embedded firmware development for a while. "Here's an 8 MHz processor with 2K of RAM and 16K of Flash, and a totally weird memory model - make it do this cool thing". All the old-school tricks are still relevant, but you get to do your coding work on a nice fast modern workstation.
posted by Mars Saxman at 9:44 AM on April 17, 2012 [2 favorites]

Can I admit that I got a little buzz when I clicked into the Source/ directory and saw they were all .s files?
posted by benito.strauss at 9:50 AM on April 17, 2012

kmz: "The source code for Karateka:


THIS GAME HAS AN ENDING?! 8 year old me played this game 50 times on teh x286 through to the point before the endboss when the stupid portcullis crashed down killing me ... EVERY TIME.
posted by stratastar at 10:19 AM on April 17, 2012 [1 favorite]

I'm sad to see hints that my favorite piece of Apple II code of all time - Roland Gustafsson's RW18 fast disk loader - is used by this game but that its source code is missing.

Having spent many happy hours annotating a fanfold printout of an Apple II Monitor disassembly of that code until I understood it, I would love to be able to point to something online with my knobbly shaking arthritic old finger and say to young people "That right there! That's what inspired 6502 code looks like!"

Hopefully the RW18 source will make it into the second commit.
posted by flabdablet at 10:32 AM on April 17, 2012 [4 favorites]

But they give you a little more than 48K RAM to play with.

The display resolution is also really important here. The Apple II high resolution mode was a stunning 280×192x8 colors.
posted by smackfu at 10:42 AM on April 17, 2012

Not really. The Apple II+ was monochrome at that resolution. Colour came via interaction of set/unset bitmap pixels with the NTSC colour carrier. Pixels set at 101010 were green, at 010101 were pink. If you did 01100110 You got sort of a reddish stripe, and 10011001 was sort of a bluish stripe.
posted by seanmpuckett at 11:00 AM on April 17, 2012


Another funny thing about Karateka -- the code that displays the cut-scenes (one of the first games to do these, btw) is the same routine that drives the game mode, drawing the player and the single opponent.

While hacking around one day, I decided to see if I could cheat and make the opponent die immediately. I put a breakpoint on the "YARGH!" sound routine, found the opponent-death routine and hooked it up to the Esc key.

The kill key worked, but unexpectedly it also worked in the cut scenes. Hit Esc ... YARGH! Bad guy dies. Game over, or at least forever stalled.
posted by RobotVoodooPower at 11:04 AM on April 17, 2012 [6 favorites]

Looks suspiciously like the source code for the T-800.
posted by veedubya at 11:07 AM on April 17, 2012 [2 favorites]

More PoP source goodies: the guy behind the Prince of Persia Commodore 64 port describes how the port was made on his development blog.
posted by archagon at 11:17 AM on April 17, 2012

(and this was before the original source code was found, by the way)
posted by archagon at 11:18 AM on April 17, 2012

THIS GAME HAS AN ENDING?! 8 year old me played this game 50 times on teh x286 through to the point before the endboss when the stupid portcullis crashed down killing me ... EVERY TIME.

Heh, I only got to that point once and it was the last time I played Karateka. It had taken days just to figure out what keys did what. (No instruction manual and I barely read English anyway.) And then actually fighting my way through what seemed to be interminable waves of enemies. And then I saw the castle door! Hooray! Let me run- *crash*


Never played it again. I can't imagine how mad I would have been if I'd gotten to the ending.

(BTW, I believe what you're supposed to do at the door is get very close to it, wait for the gate to crash down, then run through as it's raising up.)
posted by kmz at 11:47 AM on April 17, 2012

Yeah, I used to write code like this back on the ol' C64. Most of my stuff were in a mixture of Commodore BASIC with machine code in optimized places though.

ryoshu: For use when the rest of the source is committed.
That emulator won't be too useful -- 8-bit source code is heavily hardware dependent. You don't call API functions to draw things on screen, you directly manipulate hardware registers. You'll need an outright emulator to run this, in addition to a compiler.

jcreigh: I don't understand everything that's going on here, but it's hard to avoid a sense of fanfare. Lightning! Boost meter! (Which I believe increases your max health by 1) Cue the rejoin song!

Here is a brief summary of 6502 machine code. LDA and STA: Load the Accumulator from, or Store the Accumulator in, memory. LDX, STA, LDY and STY are similar, but for the X and Y registers. JMP is JuMP to a location, JSR is Jump to a Sub-Routine, and RTS is ReTurn from Sub-routine. Many values in 6502 assembly are given in hexadecimal, which in the assembly style of the machine are prefixed with dollar-signs. The punctuation signifies addressing mode*. A # after an instruction signifies "immediate mode," meaning the instruction acts on a literal value; no hash-mark means operating on the contents of a memory location. There are other addressing modes that can some degree of redirection to the target location, such as adding in the X or Y values, or even using one of those registers as a pointer modifier.

ADC is AdD with Carry, and SBC is SuBtract with Carry. INX, INY, INC/DEX, DEY, DEC: Increment/Decrement the X register, Y register or memory. Nearly every instruction that begins with a B is a Branch if some sort. ROL/ROR are bitshift operations, that move all the bits of a value left or right one space, useful for quick multiplication. (Just like adding or removing a zero at the beginning of a decimal number multiplies or divides it by ten, doing so with a binary number does so by two. This is important because there is no hardware multiplication or division on the 6502. You have to do such operations "manually," with additions, subtractions and bitshifts.)

The machine contains a scant three generally-useable hardware registers: the Accumulator, and two registers named X and Y. Although there are some operations that directly manipulate memory (like INC and DEC, for Increment/Decrement a memory address), most 6502 code concerns itself with the state of these three values. This results in a lot of temporary storage of values, to make room in the hardware registers for something else. On the plus side, the 6502 is rather speedy: most instructions don't take many cycles to complete, so there's not necessarily a huge performance hit to doing this.

Everything a processor does ultimately boils down to math, and most math is done in the Accumulator. X and Y are mostly (but not entirely) similar to each other. There is another, hidden register that just contains status flags, that are automatically set whenever a math operation is performed, flags like Zero, Carry and Overflow. So for example, if the result of an addition is zero, the Zero flag is set, otherwise it is cleared. These flags are what enable branching to work; there is a set of instructions, like BEQ and BNE, that simply transfer program execution if a specific flag (Zero in this case) is set or not. (BEQ stands for Branch if EQual zero, BNE for Branch if Not Equal zero.) Branch instructions are odd in that they are all relative; instead of branching to a precise address, you always transfer ahead or back a signed one-byte number of locations. This keeps both the size and the execution time of branch instructions down, but it limits the range which you can branch to to -127 to 126 locations away. (If you have to branch further, you have to have a JMP instruction at the target to go father.)

A couple of relatively little-used instructions are BCD and CLD, which set and clear the Binary-Coded Decimal mode. When in this mode, additions and subtractions use an alternate set of rules that treat each byte as two ten-digit values, instead of eight two-digit ones. (Fun fact: the Famicom/NES's version of the chip, the Ricoh 2A03, lacks this mode.) This is useful to game developers because this mode makes it a little easier to display decimal values on-screen.

* On addressing modes. The assembler turns each instruction into a matching 8-bit value, called an opcode. The 6502 has an interesting design in that most instruction codes have a base bit pattern that can be set for different addressing modes by adding or subtracting different bits. So, an LDA in immediate or relative mode is the same bit code, but with some other bits set or cleared to set the mode. In the original chips, internally, the processor doesn't check whether the mode given for the instruction is supported, is legal, or even makes sense, meaning there are a lot of opcodes that do nonsense things. Some of these crash the chip, or produce non-useful results, but some of them are valuable. Some of these opcodes can be used to shave cycles in a highly-optimized loop, and some others, as undocumented instructions, are useful in copy protection schemes. Later revisions of the chip, or those made by manufacturers that second-sourced the 6502's design, might have different behavior for these codes. This is one of those little things that keeps emulator authors on their toes.

Blazecock Pileon: People expect to do more than play Pong on their iPhones.

People didn't expect to do more than play Pong, and Tank, on their Atari 2600. The system was designed around those two games, but that didn't stop developers for making hundreds of other games for that system, including the first action-adventure (Adventure) and the first platformer (Pitfall).

seanmpuckett: Not really. The Apple II+ was monochrome at that resolution. Colour came via interaction of set/unset bitmap pixels with the NTSC colour carrier. Pixels set at 101010 were green, at 010101 were pink. If you did 01100110 You got sort of a reddish stripe, and 10011001 was sort of a bluish stripe.

Which is just another example of Steve Wozniak's genius.
posted by JHarris at 12:02 PM on April 17, 2012 [13 favorites]

seanmpuckett: Some of the code I'm most proud of is older stuff

I coded up Duff's Device a few years ago for some reason, and was disappointed to find that a basic for-loop solution was much faster after the optimizing compiler had its way. So sad.
posted by qxntpqbbbqxl at 12:50 PM on April 17, 2012

RobotVoodooPower: "the code that displays the cut-scenes (one of the first games to do these, btw) is the same routine that drives the game mode"

While I don't disagree with that at all, it got me thinking and reminded me that there were cutscenes in the original Pac Man (see around 4:35), way back in 1980.

I wonder if there are still earlier examples.
posted by phl at 1:00 PM on April 17, 2012

Two things that are nuts:

When Prince of Persia was first released, the Apple ][ was already twelve years old.

The 6502 has only 3510 transistors. Here's a circuit simulation. On my 1.4 billion transistor cpu, it runs at 5Hz. In javascript.
posted by aubilenon at 1:28 PM on April 17, 2012 [7 favorites]

Oh, you can turn off animation - for me that speeds it up by a factor of 18.
posted by aubilenon at 1:50 PM on April 17, 2012

phl: Yes, you are probably right about Pac-Man. Wikipedia lists Space Invaders Part II, but Pac-Man had *three* different scenes :)
posted by RobotVoodooPower at 2:35 PM on April 17, 2012

Never seen the source for one of these up close.

Meh, some of it was a simple loop that xor'ed memory before execution then xor'ed it back.

Things like softcore adventure - written in AppleSoft came pre-broken. If you get the item from gal #2 without actually 'doing' #2 you'd see the AppleSoft would say "you won" but the code would crash before getting there.
posted by rough ashlar at 2:56 PM on April 17, 2012

There was a time I cracked games for a legitimate living. Best work I saw was a VM written in a VM with checksums poked via offsets set in all three execution levels into essential game routines. It also had a decoy checksum routine that it would use instead if it detected SoftICE running.

Good times.
posted by seanmpuckett at 6:21 PM on April 17, 2012 [2 favorites]

The VMs in that system were for protection only. The way it worked was the game code was non functional as loaded from storage. The first VM was trivially encrypted on storage, maybe as a bit of "buffer trash", and after the title sequence displayed would be loaded, decrypted and executed which then ran a second VM which would run code from other buffer trash that would reading and checksum the "damaged" disk media with the checksum results poked into the first VM's execution logic so subsequent virtual instructions would work properly, and from there into game code so the player could actually run the game as designed.

The decoy "SoftICE detected" algorithm was a much simpler bit of encrypted buffer trash and would just read the disc and store values into game code that made it appear to work but the game was actually impossible to win. So what happened there is that title got out into the scene and people would play it and then there'd be an uproar because it turned out the crack was bad with an unplayable game and then there'd be a long wait while people tried to figure out what the fuck was going on with the real copy protection.

It's kind of sad but I was doing this ignominiously for a shovelware company that would buy up old licenses for games sold in Europe: I would crack the games (often because there was no source [easily] available) and the results got loaded onto discs and sold in American WalMart for $5 or something with no copy protection at all.
posted by seanmpuckett at 6:57 PM on April 17, 2012 [7 favorites]

seanmpuckett, that is awesome. More stories please!
posted by JHarris at 9:18 PM on April 17, 2012 [1 favorite]

If one were to build the Prince of Persia source code these days, how would one go about it? One could find a 6502 assembler that made Apple II binaries (or disk images or similar) and make something for running on a vintage Apple II (or emulator). Though, given that actual Apple IIs are thin on the ground, I imagine the most sensible path would be to write something that translates it into a higher-level language (i.e., Javascript or Lua or even JVM bytecode) and then wrap the output in a VM that looks like the parts of an Apple II it expects.
posted by acb at 4:15 AM on April 18, 2012

How I'd go about it is fire up my Apple IIe and do it with the original toolchain.
posted by flabdablet at 4:47 AM on April 18, 2012

I don't really have much in the way of stories about that period of time. I was broke and honestly I'd rather have been getting my own games out there. Indie games were a tough, tough sell then because distribution was 100% physical media. I tried to get the shovelware guy to put my shareware stuff on disk just to get some sales, and even designed some boxes for them for him, but he was pretty meh about stuff that didn't have a hook, like "top seller in germany" or something.

Cracking is puzzle solving, though, which was kind of fun way to think of it. Ultimately the code of the era had to run on the processor in order to function so the more successful copy protection schemes (by more successful I mean "took longer to crack"; they all fell) were more obfuscated and tricky, with dead-ends leading to premature broken 0day releases which would (in theory) build buzz for the game in the scene leading to more sales. I don't think that's how it played out in real life though -- pirates gonna pirate.

Approaching the puzzles from the solution and working backwards generally made things pretty easy.

A little background: In that era software came on floppy disks. Data on disk is stored in sectors, which would hold a small amount of data each. A dozen or so sectors (format dependent) made up a track, which is basically a single ring of data on the disk. And each disk had 40 or 80 concentric tracks. Now your computer reads a disk by seeking (physically moving the magnetic head in the drive) to a track, and then looking for a sector by number, then reading that sector.

So how do you know the data's good on a disk? What the "format" process is, with a floppy disk, is basically the disk drive steps through each track on the disk and writes out not only empty sectors but sector markers between each sector: magnetic signals that have the sector number, sector data checksums and some boundary data. The sector markers are useful so the drive can quickly find the sector when it is read, but also so the data in the sector can be checked for validity by reading the checksum for the sector. And that's how your disk drive knows that data read for that sector is good or bad. If you've ever used a floppy disk I'm sure you've head the grind-cachunk-grind-cachunk-grind-cachunk of the drive trying to reread a sector that has a checksum mismatch. Often a bad read is a result of a slight misalignment of the read head with the track on the disk; by reseeking sometimes you can get a clean read. (fun tip: if you have a marginal disk you can sometimes get a clean read by jamming your finger into the drive slot during a re-read attempt and pressing against the disk envelope, slowing the rotation speed a little and mis-aligning the disk a little.)

OKAY so floppy drives in your computer write data in a certain way, according to rules of the disk's format and track alignment and so on. So if you copy a disk with your own computer, the resulting copy is always* formatted a certain way. If the original disk is formatted in a different way (but still readable by the computer), then the copy protection may be able to detect a difference between the original physical media and a home-made copy of that physical media.

So the challenge for copy protection makers was to create an incorrectly formatted sector or track that could be read (or verified as incorrect) by the game when it was being run, but that could not be duplicated at home.

Now the disk duplication houses that turned out copies of games by the hundreds of thousands of copies used machines that were basically magnetic xerox copiers. They actually didn't care what the data on the disk was, they would read the exact pattern of magnetic domains on the disk and recreate it precisely, track by track. So the dupers could copy anything. In fact, the dupers would work with the copy protection designers in coming up with new and interesting "bad formats" that could be read and never* written.

These bad formats would include incorrectly numbered sectors, an incorrect number of sectors per track, missing tracks, EXTRA tracks (up to 82 tracks could be reliably read on most drives), tracks formatted in low density as opposed to high density, long sectors, and fun things like good data with bad checksums.

Here's the thing: a disk-based protection scheme has to, at one time or another, attempt to read the damn disk. And that's your entry point. You know where the bad format is on the disk; what track/sector is hosed. So as a cracker, you start by stepping through the drive access code to give you an idea of where the bad formated data is supposed to go in memory. And you go from there.

So the copy protection developer's job is to try and separate and obfuscate as much as possible the "must happen" reading of the bad formatted data off the physical disk from the "must happen" checking of the badly formatted data to verify that it's really badly formatted. The more convoluted this is, the harder the crackers job is. Personally I despaired at how much cleverness went into copy protection as opposed to the actual games themselves.

Here's some of the obfuscation techniques I encountered, and how to respond to them:

- Disk format was altered but in such a way that reading the bad format didn't cause any audible hiccups, e.g. drive reseeks. So it would take a effort and analysis to even learn where to start checking for the copy protection code. (Analyze the entire disk and look for the bad sectors, then figure out when those sectors are read and where the data goes.)

- Bad formats embedded in presumed executable areas of the code itself rather than sitting pretty off on track 80 or whatever; often disguised as a bunch of game tuning constants. (Not too effective once you knew where the bad sector data winds up.)

- Multiple code paths, where if you cracked one path the game might pretend to run but would hose you anyway sooner or later. (Tedious, annoying, potentially embarassing. Additional tracing would typically turn up these little surprises before too long.)

- Encrypted copy protection code, simple XORs through much more complex stuff that encrypts and decrypts itself only for very short periods of time. (Dudes, the processor has got to execute it at some point, you can't hide from the CPU.)

- Encrypted GAME code with the decryption key derived as a result an analysis of the bad format. (Easy to crack because big blocks of decryption code are kind of obvious when executing.)

- Virtual machines were probably the most interesting challenge, because you'd get wound up in a knot of code that is all what the fuck is this tight loop doing, and what are all these basically pointless jump offsets, and... oh okay, we're doing a VM. (You can nest as much VM as you want but ultimately there is a chunk of data that came off the drive that has to be verified at some point and if you can figure out what that data is doing, rather than figuring out who is doing it, you are home free.)

- Executable results, by which I mean: ordinarily you might think there'd be an IF statement somewhere which says IF this byte is this value, THEN the disk is legit; but instead the bad value you're checking for is itself executable code that is part of an entirely different part of the program. This is like testing without testing, like that one scene in Diamond Age where two people without knowledge of each other do two different things at a certain time; if either of them don't do that thing at the right time then someone dies. In this case, the game locks up. (Where does the data go?)

- Overlaying code, which typical as a consequence of limited memory space results in code being swapped in from disk dynamically during gameplay. What you might see is when the game loads a title screen graphic into one section of memory and then comes along later and loads a transition scene graphic into the same place, but the transition image is a little smaller and there's a chunk of code at the end that is vital to the copy protection system but is only there for the couple of seconds the transition scene is visible when the copy protection is checked, then goes away again. (Careful stepping through the code.)

And then you'd get all kinds of combinations of this shit, which sometimes didn't make any difference because the weakest technique is the only one you have to crack. There was only one game I didn't manage to crack out of the dozens I did, and I don't even remember its name. I do remember being pretty damn impressed with the ingenuity of it, though, and it wasn't really that I couldn't crack it, it was just so convoluted that the shovelware guy said fuck it nevermind because I was billing hourly, and not cheaply.

Making patches to these games so they'd work unprotected were sometimes pretty simple (just patch out a few opcodes here and there) and sometimes a pain in the ass, especially when there was otherwise encrypted game code that had to go somewhere so the damn thing would run. But really once one had figured out the code paths it was basically a matter of "bypass all that shit, then do a couple block copies and bob's your uncle".

The biggest takeaway I have from that era is basically that copy protection, like locks, only keeps honest people honest. And sometimes not even then. Because once digital lock removal is as easy as using a legitimate key -- which it always becomes -- the only side effect is that legitimate users pay a disproportionate price in 1) more irritation due to inability to create legitimate backups, 2) more failure of the locking system leading to customer service problems, and 3) more expense (always rolled back to the purchaser) for the cleverness going into the locking scheme.

So, when I release software personally, it is with only the most cursory lock that makes it only slightly harder and more annoying to copy than just ponying up and BUYING a license -- for normal people with some disposable income. I don't want to spend hundreds of hours locking shit down. I want to make new awesome software. Some people are gonna copy it anyway because they're broke, and some want the challenge, and some are just assholes who like to rip stuff off.

So instead I practice excellent customer service, and really simple purchase mechanisms, and really trivial locks to discourage those who are just bored, and that seems to be what works.

* This was a fascinating arms race, too: rather than crack the software, other clever people devised copying programs that would talk to the home computer disk drive hardware at very low levels, almost down to the magnetic domains, in an attempt to recreate the bad formats as precisely as possible. They could never achieve a perfect match but in general if you had the latest version of a disk cloner, you could be pretty assured of copying a game from six months ago. The endgame of that arms war was the lasermark system which physically destroyed a portion of the disk media which therefore could never be written to successfully: the copy protection would try to write to that bad sector and then re-read it; if the read data matched what was written, you were running off a copied disk -- which was why some purchased games would insist on not having a write protect tab. Anyway, the lasermark system was pretty expensive, requiring precisely damaged media, so generally only frontline titles used it.
posted by seanmpuckett at 6:38 AM on April 18, 2012 [122 favorites]

I think this post needs a shout-out to MeFi's own jscott, who flew to LA to help perform the digital excavation.
posted by Leon at 6:52 AM on April 18, 2012 [4 favorites]

Anyway, the lasermark system was pretty expensive, requiring precisely damaged media, so generally only frontline titles used it.

Supposedly you could create your own laser damaged disks with a pin though I don't recall ever seeing instruction on how to figure out where to damage the disk.
posted by Mitheral at 7:19 AM on April 18, 2012

Seconded. Awesome stuff seanmpuckett. Thanks. Sometimes I think the disk I miss most from my C64 days is Fast Hack'em. It made 11 year old me feel so cool.

Didn't finish Karateka? I made it through both endings plenty of times. Mainly because that was a fantastic game.
posted by yerfatma at 1:47 PM on April 18, 2012

acb: It's not as simple as that. Computers at the time didn't have nearly as much memory as they had 5 1/4-inch floppy disk space. What you actually need to do is take the object code and build it into a disk image. Back in those days a running program typically had access to the entire machine. Since there was no operating system, and no way to implement virtual memory, the program itself would have to take care of loading new data and levels off the disk into memory as they're needed.

So unless the game is small enough to be entirely loaded into memory, which I doubt it is, without hacking the game is inseparable from its media, or at least an emulation of that media. Which you might be able to make, but then you're basically working to recreate by hand what you could download off a warez site.
posted by JHarris at 2:06 PM on April 18, 2012

If seanmpuckett's comment doesn't make the sidebar, then what is the sidebar for?
posted by JHarris at 2:16 PM on April 18, 2012

Well, I found it via the sidebar (or the sidebar Twitter feed anyway), so there's that.
posted by yerfatma at 2:17 PM on April 18, 2012 [1 favorite]

Yeah, I just noticed now that it had been sidebarred, heh.
posted by JHarris at 2:17 PM on April 18, 2012 [1 favorite]

Great stories, Sean.

The Apple ][ disk drive was an interesting case because it was pretty much directly connected to the CPU without an intermediate controller. Thus the programmer could read/write bits and control the stepper motor with precise timing.

Most of the disk access code (beside the PROM bootstrap code) lived on the disk itself, so you weren't stuck with the default routines. Users quickly improved them and eventually increased read speeds by a factor of 4.

But you could also roll your own sector format, and any copy program which expected the "traditional" sector format would be stymied.

Soon people would start writing "nibble copiers" that operated directly on bits, not on sectors. They sometimes worked, but often required arcane parameters. Copy protection engineers responded by inventing new techniques like "spiral tracking" -- writing a program in a continuous spiral rather than individual tracks.

More fun stuff is on
posted by RobotVoodooPower at 2:33 PM on April 18, 2012 [5 favorites]

seanmpuckett: "dead-ends leading to premature broken 0day releases which would (in theory) build buzz for the game in the scene leading to more sales. I don't think that's how it played out in real life though -- pirates gonna pirate."

This all reminds me of an article in gamasutra from 10 years ago, The author claims that:
"it took over two months for the working patch to appear, after numerous false starts on the part of the pirates (the patch for the European version took another month on top of that). The release of patches that didn't work caused a great deal of confusion among casual pirates and plenty of wasted time and disks among the commercial ones.

Two months may not seem like a long time, but between 30 and 50 percent of most games' total sales occur in that time. Approximately 50 percent of the total sales of Spyro 2, up to December 2000, were in the first two months."
Sadly, I know of no way to run an experiment to demonstrate that copy protection increases sales.
posted by pwnguin at 8:31 PM on April 18, 2012

The Apple ][ disk drive was ... pretty much directly connected to the CPU without an intermediate controller.

Not quite right. There was a controller, and as you'd expect from a Woz design, it was made from a small handful of cheap off-the-shelf general-purpose parts and it was beautiful. Those interested can check the schematics on pages 79 and 80 of a2dos.pdf from this Apple II manuals archive.

There was far less on the Disk II's analog board (the one on the disk drive itself) than was typical for contemporary 5.25" floppies. Instead of a conventional head stepping controller IC with Step and Direction inputs, there was a simple ULN2003 Darlington buffer between the SEEKPHn lines in the Disk II cable and the four coils in the head stepper motor, giving the programmer total control over that motor. There was a write-protect detection switch, a data write circuit that did nothing more than convert the digital WRDATA signal to differential drive for the write head, and a data read circuit consisting of an MC3470 head amplifier directly feeding RDDATA.

The magnetic pattern written to the disk surface was exactly what came down the WRDATA wire; the head would write a flux reversal on the disk when that signal switched from high to low or vice versa. What came back on RDDATA was a 1μs pulse for each flux reversal detected by the read head. The head amplifier was tuned to work reliably for flux reversals between 4μs and 12μs apart.

At the heart of the Disk II interface card that plugged into the Apple ][ expansion slot was the Woz Machine: a state machine built around a 256-byte PROM and a 74LS174 6-bit register, controlling a 74LS323 shift/storage register connected to the Apple II data bus. The state machine was clocked by a 2MHz signal that was also divided to generate the 1MHz CPU clock.

Four of the PROM's data outputs were fed back to four of its address inputs via four bits of the LS174, giving the state machine the ability to run sequences of up to 16 steps. Two address inputs came from a CPU-controllable mode selection latch, allowing the machine to run four different programs: Read Disk Data, Sense Write-Protect, Load Write Latch, Write Disk Data.

The remaining two bits of the LS174, along with an inverter and a NAND gate, were wired to generate a low lasting for one state machine clock cycle for every pulse arriving on RDDATA. That signal fed the seventh PROM address bit, allowing it to run a completely separate set of programs on detection of a data bit.

The last PROM address bit came from the LS323's MSB output, allowing the state machine to run separate progams depending on whether the shift register's high bit was set or not.

Most of the time the state machine ran in eight-step loops, corresponding to the floppy disk's 4μs data bit cell timing, and made the LS323 shift left by one bit every 4μs.

Sense Write-Protect would make the LS323 shift right on every cycle, copying the WRPROT input connected to its MSB shift input to register bit 7. Load Write Latch would force the LS323 to sample the data bus and reset the state machine's counter bits to 0.

Write Disk Data would bounce between looping through states 0-7 or 8-15 depending on the LS323's MSB during state 7 or 15, in such a way as to generate a transition on WRDATA (fed from state machine address bit 3) for every 1-bit in the shift register. It would load new data into the shifter from the data bus during states 0 or 8 only. It was up to the disk write code to make sure there was valid data on the bus exactly when the state machine sampled it; in practice, this meant that disk data had to be written on strict 4μs boundaries with respect to the write that had originally flipped the state machine into Load Data Latch mode.

Read Disk Data would count to 6, then shift a 1-bit into the LSB of the shift register if an RDDATA pulse occured in state 7, 8 or 9; if no pulse had arrived by state 9, it would shift in a 0-bit. RDDATA pulse arrival during states 7 or 9 would remove or add one state respectively, effectively implementing a DPLL that kept the state machine locked to the incoming data stream's bit cell timing.

If the shift register's MSB was set, Read Disk Data would continue to count states and monitor RDDATA pulse arrivals but would leave the shift register in Hold state for 14 to 16 counts (7 to 8 μs) instead of shifting it every 7 to 9. And this provided just enough hold time to guarantee that the CPU would be able to grab data as it arrived using code like this:
rdloop	lda diskdata	;4μs
	bpl rdloop	;3μs if branch taken, else 2μs
Given that data bytes could arrive as quickly as one every 28 CPU cycles if the drive was running fast and/or the drive that wrote the disk was running slow, that left only 21 cycles per byte to decode, stash, checksum and count the incoming data. It wasn't a lot, but it was enough. Just.

It was not until I sat down with a hex dump of the Woz Machine PROM and a copy of the circuit and worked out exactly what it did that I really got state machines. I have never since seen such beautiful integration between a moderately complex state machine and a CPU as that achieved by the Disk II controller. And that is why I so much enjoy the Gustafsson RW18 fast loader, which extracts every last bit of capacity and speed from that hardware and even has an elegantly designed API.
posted by flabdablet at 4:11 AM on April 19, 2012 [19 favorites]

And flabdablet and seanmpuckett's posts are prime examples of why I get irritated when unauthorized network intrusion is called "cracking" by nerds intent on "talking back" the term "hacking." Cracking was a blackhat activity, but it was an art unto itself, and had zilch to do with network security. Hacking is doing something amazing with the machine - and just as there are good wizards and bad wizards, there are good hackers and back hackers, and good hacks and evil hacking.

Most infosec pros these days side-step the issue by calling malicious hacking, especially network intrusion, an "attack" and the attackers "blackhats" or "blackhat hackers" or simply "attackers."
posted by Slap*Happy at 10:42 AM on April 19, 2012 [1 favorite]

For those keeping score at home: Nexus 6, David 8
posted by furtive at 7:28 PM on April 19, 2012

A full appreciation of RW18 requires a little more detail on how the Disk II stored data, and understanding that needs a little background in floppy disk history.

The floppy disk bit stream format used by the original Shugart floppy drives - the ones whose mechanicals Woz chose as the basis for the Disk II, but whose electronics he rejected as too complicated and expensive - was a scheme called FM encoding.

Data on FM-encoded disks was recorded at a constant 125 kbits/s, or 8μs per bit. FM stands for Frequency Modulation, and from a frequency point of view a zero bit appeared as half a cycle at 67.5kHz and a one bit as a whole cycle at 125kHz.

From the time point of view, the start of each bit's 8μs cell on the disk surface was marked with a magnetic flux reversal. If there was an additional flux reversal at 4μs in the centre of a cell, that cell encoded a 1-bit; if not, it encoded a zero. The disk controller would lock onto the 125kHz clock signal generated by the reversals marking cell boundaries, then extract data based on the presence of absence of an additional reversal half way between the clock edges.

FM was a well-established and reliable if not particularly space-efficient encoding, and that was the scheme that Woz originally designed the Disk II controller around. But rather than use the hardware that contemporary controllers devoted to separating FM data from its clock, he pushed that job back onto the CPU. All that the Woz machine was built to do was phase-lock to a 250kHz stream of raw read-head pulses - twice the FM bit cell rate, but possibly with pulses missing - and present those to the processor eight at a time.

Using N and S to represent flux directions within a 4μs stretch of disk surface, a byte full of zero bits might be laid down as NNSSNNSSNNSSNNSS, or a byte full of ones as NSNSNSNSNSNSNSNS.

If the flux pattern on the disk was NNSSNNSS, the on-disk encoding for half a byte of zeroes, the Woz machine would present the processor with a whole byte of data: 10101010. Each 1-bit in that value represents a flux reversal (change from N to S or vice versa); each 0-bit represents a missing reversal.

Another way of looking at this is to see that the odd-numbered bits in a byte read from a Woz machine represented FM clock edges while even-numbered bits represented data. When a Woz machine correctly locked onto an on-disk FM data stream, all the bytes the processor ever read from it would be of the form 1p1q1r1s where the ones are FM clock bits and pqrs are data. And because half a byte is a nybble, it became customary to refer to the raw bytes exchanged between the Woz machine and the CPU as "disk nybbles" or just "nybbles".

The Woz machine made only the barest concession to helping with the "framing" problem of identifying where the FM bit cell boundaries (let alone the disk nybble or actual byte boundaries!) occurred within a stream of FM data. Part of the beautiful genius of Woz lay in recognizing that the barest concession was all that was required.

If its shift register was full (as evidenced by a 1 in its MSB) then the Woz machine would hold it until 4μs after the detection of the next RDDATA pulse (providing, as I said last time, just enough time for the processor to grab the data and retry if it was incomplete), then clear it to all-zeroes before shifting in the 1-bit representing that detection event. It would then shift in more bits at 4&us; intervals until that first 1-bit had propagated all the way over to the MSB.

Since every FM clock edge generated a 1, then provided the Woz machine started any given shift cycle properly aligned with the data stream's bit cell boundaries it would remain so.

If it so happened that the first pulse seen by the Woz machine was not a cell-boundary clock pulse but rather a centre-of-cell data pulse representing a data 1, the machine would happily start shifting from there and end up with data like 11q1r1s1 (note that the FM clock bits now occupy the even bit positions instead of the odd ones). It would stay in this "slipped" condition until such time as it happened to have a full shift register and encounter a 0-data bit cell. At that point there would be an extra 4μs of hold time and the zero bit would be lost; the machine would not begin shifting again until 4μs after the arrival of the subsequent (clock) RDDATA pulse.

This procedure was too simple for reliable detection of the standard "code violation" (one missing FM clock edge) scheme that contemporary disk controllers relied on for identifying the very first bit cell of an on-disk stream, and this became the first of many reasons for the Disk II's incompatibility with industry standards.

Instead of standard code violations, Woz used "self-sync nybbles": the leadin pattern for every disk write was always a sequence of all-ones ($FF) disk nybbles written to the Woz Machine at 36μs intervals rather than the 32μs intervals needed for ordinary data nybbles, followed by a reserved pattern containing multiple FM code violations ($D5) and a properly-coded nybble of zero bits ($AA).

Representing the resulting bit pattern as 1111111101111111101111111101111111101111111101111111101111111101111111101101010110101010, close examination will show that if you follow the Woz machine's rule of skipping to a 1-bit and then collecting the next eight, then regardless of which of the first nine you begin on, you will always be reading the data stream with proper bit-cell and disk nybble framing before you hit the $D5 $AA pattern at the end.

The $D5 pattern can't be mistaken for any of the values that can turn up in the shift register while doing this because it contains more than one zero, and since it's chock full of FM code violations (three of its four clock bits are missing) it will also never occur inside any of the properly FM-coded data laid down after it provided the bit cell framing is correct by then, which it must be. So all that the disk read code ever needed to do to find the start of a sector was something like this:
waitd5	lda diskdata
	bpl waitd5
	cmp #$D5
	bne waitd5

checkaa	lda diskdata
	bpl checkaa
	cmp #$AA
	bne waitd5
I'll come back later and cover the evolution of FM into Disk II GCR, and the processing needed to make that work; so by the time the RW18 source code does get posted, I hope you will be in a position to appreciate just how good it is.
posted by flabdablet at 4:56 AM on April 20, 2012 [15 favorites]

The Shugart SA400 mechanism spun at 300rpm, so each revolution took 200ms. At 8μs per FM bit cell, that yielded a raw track capacity of 3125 bytes, room enough for 10 sectors of 256 data bytes each plus associated inter-sector gaps and address marks. It could step to any of 35 tracks, for a total data capacity of 89,600 bytes.

After designing his minimalist FM read/write engine, Woz realized that it allowed him to use a more space-efficient coding than FM with no change at all to the hardware; not even to the programming of the state machine PROM.

The Woz machine had never actually enforced the use of FM coding for written data; it would simply write out, on 4μs boundaries, a single flux reversal for each 1-bit supplied by the processor, or no reversal for a 0-bit. It was up to software to supply data in 1p1q1r1s form to generate FM coding.

Nor did it require disk data to be FM-coded for successful reading. As long as RDDATA pulses arrived on 4±0.5μs boundaries, and no two pulses were more than 8.5μs apart, and there was a pulse present on each 32μs boundary to use as a Shift Register Full mark, it would stay locked onto the data stream and not lose any bits.

The analog disk hardware itself required only that the time between flux reversals should be no shorter than 4μs and no longer than 8μs.

FM coding's 1p1q1r1s nybble patterns meet these requirements. But so do others. In fact there are 34 nybble patterns that satisfy the Woz Machine's reading rules:
10101010*          11101010*
10101011*          11101011*

10101101           11101101
10101110*          11101110*
10101111*          11101111*

10110101  11010101 11110101
10110110  11010110 11110110
10110111  11010111 11110111

10111010* 11011010 11111010*
10111011* 11011011 11111011*

10111101  11011101 11111101
10111110* 11011110 11111110*
10111111* 11011111 11111111*

* valid FM-coded nybbles
Taking out the $D5 and $AA values used in the self-sync headers leaves 32 compliant nybble patterns and these can be used, along with some lookup table magic, to represent 5 bits of user data each. This is a form of Group Code Recording.

Each 200ms track can hold 6250 disk nybbles at 32μs each, and if each of those represents 5 group-coded data bits instead of 4 FM-coded bits, the raw track capacity grows by 25% from 3125 bytes to 3906. That's enough room for 13 256-byte sectors instead of 10, yielding a formatted capacity of 116,480 bytes - a substantial improvement over the competition's 89,600.

After the Disk II had been in production for a while, Woz found out that the 8μs spec for the maximum allowable gap between flux reversals was overly conservative. The original Shugart format would generate longer gaps than 8μs in the code violation byte employed for address mark sync. The head amplifier's automatic gain control still coped tolerably well until reversals occurred maybe 14-15μs apart; 12μs was no problem. So he tweaked the Woz Machine PROM a little to stop it drifting out of sync when faced with more than one missing RDDATA pulse, and designed an improved GCR scheme based on disk nybbles that could now contain up to two successive zero bits. It turns out there are 66 of those, and reserving the same $D5 and $AA values for headers leaves just enough to represent 6 bits of user data per disk nybble.

The new two-zeroes rule also allowed for a much shorter self-sync sequence. Instead of eight nybbles written with 36μs spacing, self-sync needed only four at 40μs before the $D5 $AA mark sequence: 11111111001111111100111111110011111111001101010110101010 will always read as FF FF FF FF D5 AA, FE FF FF FF D5 AA, FC FF FF FF D5 AA, F9 FE FF FF D5 AA, F3 FC FF FF D5 AA, E7 F9 FE FF D5 AA, CF F3 FC FF D5 AA or 9F E7 F9 FE D5 AA.

Now there was room for 16 sectors per track, raising Disk II capacity to 143,360 bytes.

The standard Apple DOS disk formatting code still used FM coding for information in the sector address mark portions of the track, basically because it was quick and easy to decode in software and short enough not to be worth optimizing. The sector-write code needed to "pre-nibbilize" 256 bytes of user data before locating the disk sector to write the coded version out to, and the sector-read code did a corresponding "de-nibbilize" pass after reading each sector in. These operations took enough time that neither Apple DOS, nor the various "fast" patches for it available from third parties (FDOS, DiversiDOS), nor Apple's later ProDOS system, could read or write physically sequential disk sectors at full disk speed; they all needed at least one sector of interleave, and reading or writing a full track took at least two disk revolutions.

Roland Gustafsson's RW18 fast loader took a rather different approach. Instead of using 16 sectors per track, each containing 256 bytes of user data coded as 342 disk nybbles, RW18 packed six sectors per track, each containing 768 bytes of user data coded as 1024 disk nybbles. There is just enough room on a track to fit this in (6 x 1024 = 6144, on a track with room for an absolute maximum of 6250) so RW18 didn't bother devoting space to the gaps needed for separately-writable address marks and data sectors - with RW18, formatting a track and writing it were the same operation, requiring exactly one disk revolution (200ms).

RW18 packs a staggering 161,280 bytes on a single-sided, single-density, 300rpm 35-track floppy disk - that's 180% of the capacity of the very same disk formatted according to Shugart's original spec, using flux-reversal timing constraints identical to those required by that spec. It reads data as fast as the mechanicals can possibly supply them: on a track read, it will start with whichever of the six sectors arrives under the read head first, then read the rest in a single disk revolution. The API offers scatter/gather reading and writing: you could pass it a list of 18 non-sequential 256-byte memory page addresses and it would fill those in sequential disk order.

RW18 uses the Woz Machine to its full potential, and is a comparable engineering achievement. I don't know if Roland and Woz have met. I hope so.

I no longer have my annotated disassembly listing of RW18, and I'd love to see the source code.
posted by flabdablet at 11:19 AM on April 20, 2012 [17 favorites]

I did not ever own any of this classic Apple hardware, but I am still digging flabdablet's posts about the evolution of the disk interface!
posted by jepler at 11:36 AM on April 20, 2012

Same here jepler!
posted by JHarris at 2:28 PM on April 20, 2012

Wired did a nice writeup on the project, which was a lot of fun to be part of. More history coming up!
posted by jscott at 10:07 PM on April 20, 2012 [1 favorite]

I've been in touch with Roland, who says he's doing his best to find his source code for RW18 and will open source it as a historical artifact.
posted by flabdablet at 10:23 PM on April 20, 2012 [2 favorites]

As an ex-][c user (gone but not forgotten; ][c for life), I gotta say this is about the most awesome FPP in the history of MetaFilter. The whole damn thread needs sidebarred!
posted by barnacles at 10:35 PM on April 20, 2012

« Older Gunplay   |   Messin with that next level organic cod. BOOM. Newer »

This thread has been archived and is closed to new comments