Simulating a TI calculator with crazy 11-bit opcodes
August 11, 2013 8:41 PM   Subscribe

"This realistic simulator of a 4-function Texas Instruments calculator from 1974 runs the calculator's source code instruction by instruction by simulating the processor. The unusual processor has 11-bit opcodes, 44-bit BCD registers, and a 9-bit address bus. To use the simulator, slowly click keys on the calculator image and you can watch how the calculator performs operations step by step. Since the processor doesn't do multiplication or division, it does these operations by repeated addition or subtraction."
posted by SpacemanStix (51 comments total) 36 users marked this as a favorite
 
Oh, this is groovy. This is a lot like the calculator my mother had when I was growing up, it's really interesting to see what was going on under the hood.
posted by nevercalm at 8:45 PM on August 11, 2013


I had exactly the opposite reaction. Apparently the past is interesting, if you didn't have to endure it firsthand.
posted by Greg_Ace at 9:08 PM on August 11, 2013 [1 favorite]


Wow that is really cool

The 8080 had no MUL or DIV either, but it did have SHR and SHL to shift bits right or left allowing you to quickly multiply or divide by two. Using those in combination with repeadted ADD or SUB you could write your own pretty fast MUL or DIV.

These tricks are still with us today in X86 assembly, the SAR and SAL instructions are specifically for arithmetic shifting. I don't know if compilers even use them.
posted by Ad hominem at 9:22 PM on August 11, 2013 [2 favorites]


I managed to lock it in an endless loop by multiplying two large(ish) numbers. I wonder if the real one did this?
posted by Confess, Fletch at 9:23 PM on August 11, 2013 [2 favorites]


5318008
posted by whyareyouatriangle at 9:24 PM on August 11, 2013 [17 favorites]


That is not a triangle.
posted by maryr at 9:32 PM on August 11, 2013 [1 favorite]


Good thing this didn't come out ten years ago, or I'd have to write this same program on my sweet-ass TI-84.
posted by cmoj at 9:45 PM on August 11, 2013


I'm disappointed by the lack of wood grain texture or avocado green elements on this webpage.
posted by crapmatic at 9:47 PM on August 11, 2013 [7 favorites]


This is very cool.
posted by ob1quixote at 9:50 PM on August 11, 2013


I remember my dad (an electrical engineer) buying his first scientific calculator in the early 70s. It really was a big deal, as amazing as it seems now.
posted by readyfreddy at 9:58 PM on August 11, 2013 [3 favorites]


Since the processor doesn't do multiplication or division, it does these operations by repeated addition or subtraction.

I had a math textbook in grade school that had a detailed explanation of how that's what multiplication and division really are. I'm totally not a math guy, so maybe it's an illustrative-but-somehow-not-completely-true explanation, but it always made sense to me.
posted by Mister Moofoo at 10:10 PM on August 11, 2013


Mister moofoo, That's what it is from a math perspective, but if a processor has a multiplication instruction, it can use a piece of hardware to do binary multiplication, which will be a lot faster than a bunch of additions.
posted by grandsham at 10:25 PM on August 11, 2013


Aha. Like I said, not a math guy. "Long multiplication", huh?

Makes airplane over head gesture.
posted by Mister Moofoo at 10:34 PM on August 11, 2013


So who's going to write Dope Wars or Doom for this?
posted by Ghostride The Whip at 10:40 PM on August 11, 2013 [4 favorites]


Mafia Wars, where I'm from. Also, yes please.
posted by lazaruslong at 10:49 PM on August 11, 2013


I managed to lock it in an endless loop by multiplying two large(ish) numbers. I wonder if the real one did this?

The explanatory text (down the page a bit) says, "The simulator will lock up in a few cases, such as overflow. I think these bugs are in the source code, not the simulator. The source code from the patent is slightly different from the code that was shipped in real calculators, so some bugs may have been fixed."
posted by flug at 11:05 PM on August 11, 2013 [1 favorite]


I can't wait for the Sinclair Scientific version, even though it will almost certainly be beyond me. There was so much clever around back then, when it was only just possible to build workable mass-market products around the technology and really smart design paid dividends.

Much as I adore the RasPi and Arduino movements, there's just no motivation to sit down and think through a resource-starved, IQ-boosted design any more. The last decadent flicker of that was (is?) the demo scene: I suppose it's in the nature of things that this has passed, and it's otiose to decry it, but I do miss that mixture of intellectual puzzle-solving and practical, useful, exciting results.
posted by Devonian at 11:15 PM on August 11, 2013 [7 favorites]


"The last decadent flicker of that was (is?) the demo scene: I suppose it's in the nature of things that this has passed, and it's otiose to decry it, but I do miss that mixture of intellectual puzzle-solving and practical, useful, exciting results."

I do, too, and I vividly remember first seeing Second Reality, marveling at the graphics, and thinking about all that assembly programming. Just its existence made me weirdly happy and content. And my dad was a mainframe programmer in the 70s and was known for his resource-efficient code.

The thing is, though, is all that effort and time spent working with low-level languages close to the metal in order to get as much performance out of the hardware as possible was time and effort not spent doing other things. Bootstrapping to higher levels of abstraction allow the programmer to work at a level that is closer to their intuition. This would just be a trade-off, of course, except that Moore's Law has meant that capital investment in R&D results in hardware improvements that are ultimately cheaper than programmer labor. So it makes much more sense to write code that's less efficient in terms of hardware resources but more efficient in terms of labor resources. Which is where we are today.
posted by Ivan Fyodorovich at 11:41 PM on August 11, 2013 [7 favorites]


There are still people working that close to the metal. I know a couple. It's where we get some of those extra bits of performance we squeeze out of consoles at the end of a cycle. Well, this cycle. Maybe not the next one.
posted by inpHilltr8r at 11:46 PM on August 11, 2013


Ah nostalgia. My grandfather gave me a calculator exactly like that when I was a kid. He had paid some insane amount of money for it when it first came out and I just needed a cheap calculator for school. It had a leather case, too. I loved the satisfying click of the buttons and those red numbers. I'm pretty sure I know exactly where it is in my parents' house now. I wonder how much it's worth.
posted by The World Famous at 11:53 PM on August 11, 2013


If you're looking to get into some low-level programming with limited resources, get a hold of an old Atari 2600 and join the homebrew scene (disclaimer: I'm not part of the scene, I've just had a look at the crazy things people do with it).
posted by Harald74 at 12:02 AM on August 12, 2013


I had a part-time job in 1972 at a place with engineers. I was in the main office one afternoon when a salesman came in trying to sell some 4-function electronic calculators, and they were quite expensive. Pretty sure it was over $100. The next year they were giving them away for opening a checking account.

And I know the title is a quote from the blog, but 11-bits is not crazy.
"...it allows all instructions to fit into a single 11-bit word, avoiding multi-byte instructions."
posted by MtDewd at 12:04 AM on August 12, 2013


It's things like this that sometimes make me wish I had been born twenty years later, so I'd have such things to learn from as a kid. Thanks for posting.

Ad hominem: "These tricks are still with us today in X86 assembly, the SAR and SAL instructions are specifically for arithmetic shifting. I don't know if compilers even use them."

GCC was definitely rewriting explicit integer div- and mul-by-2 into shifts a few years ago. Wouldn't be surprised if it still is.
posted by vanar sena at 12:51 AM on August 12, 2013 [2 favorites]


Badass simulation. Quite fascinating.

I'm glad I learned the word "otiose" today. Thanks Devonian!

Ivan's response to Devonian to me felt like one friend slapping another friend on the shoulder and providing reassurance that what is missed was a great thing, but has faded into relative obscurity for reasons that are not at all bad. Or perhaps there are actually more people doing low-level stuff now because their skills are still in demand, it's just that there are so many higher-level-language programmers out there that the bare-metal folks aren't as visible.

I'm glad there still are people working at the lowest possible levels tweaking hardware, shit, building the hardware, making it increasingly more efficient, or we'd be on our way to "Idiocracy" IMHO.

But, I'm also glad I don't have to deal with all of that; I like writing code that does basic things that automate tasks and make life easier, and if it weren't for the foundations built with lower-level-languages, I'd have to do everything the harder and slower way (for the types of tasks I'm dealing with) or do something else entirely. Naturally it's the "closest to the metal" folks building the logic gates and whatever-the-fuck to be faster-and-faster with less energy and less cycles that are enabling the [often sloppier because you can get away with it] high-level-language programmers to keep writing sloppy code because the hardware has improved so much that programmers can get away with horrible code.

I've bumped elbows with a lot of "assembly or death" folks in my life and I appreciate the zeal for doing super complex software in pure assembler, but don't appreciate the sentiment some give off that anything more complex than assembly language is the downfall of computers and mankind, that you're not a "real programmer" if you don't understand everything going on at the processor level, etc. But I'll let them identify with the word "programmer," as it doesn't carry all that much weight to me, and I think writing code is unsexy enough that I'd just prefer to say "I write code" rather than saying "I am this identity, this doer-of-things."

I've seen a lot of quality software written in pure assembly, especially during the heyday of the 386 and lower processors, which was definitely a time where it made sense to push yourself with assembly if you were up for it, to write full-blown applications. C and C++ were upticking quite a bit during the early 90's and I remember it becoming increasingly novel for people to write software completely in assembler, so those who did things like write bulletin board software or games entirely in assembly would make sure you knew it.

I stuck with Turbo Pascal and QuickBASIC (sorry) but found ways to integrate some of the speed-efficiencies of assembly in both languages, more easily Turbo Pascal. The most common reason to delve into it for me was to perform quick multiplication operations using bit-shifting to directly address video RAM and to page video RAM as quickly as possible. I don't remember how much of the original DOOM was written in assembly but blah blah, bedtime, this calculator is cool.
posted by lordaych at 1:01 AM on August 12, 2013 [1 favorite]


GCC was definitely rewriting explicit integer div- and mul-by-2 into shifts a few years ago. Wouldn't be surprised if it still is.

Thanks, seem like they may still do that and it may still be faster depending on how the instructions are pipelined.
posted by Ad hominem at 1:29 AM on August 12, 2013


A math teacher brought a calculator into our classroom in 1975. It was the size of a shoebox and we oohed and aaahed at its ability to multiply big numbers in front of our eyes. Now I ooh and ahh at the freedom of non-recorded association we had in those days.
posted by telstar at 2:04 AM on August 12, 2013 [2 favorites]


I'm kinda surprised that calculators like this were actually executing a series of explicit machine-code instruction - I guess I kinda assumed they were... I don't know, hardwired? With a single-purpose gate array, or something?
posted by Jimbob at 2:12 AM on August 12, 2013


Ad hominem: "Thanks, seem like they may still do that and it may still be faster depending on how the instructions are pipelined."

Yep I'm pretty sure it's faster if it's just one instruction each, but in more complex cases (eg multiply by 4) the shifts might be slower even if the sum of the raw cycle count is lower if the pipeline stalls. This kind of thing is why I have little interest in assembly programming on x86 now - trying to outguess the compiler on superscalar CPUs with crazy-long pipelines and NUMA is for special people.
posted by vanar sena at 2:41 AM on August 12, 2013


There were a variety of calculators at the time, with different implementation details. This design was patented, implying that other companies used (at least slightly) different designs.

Some of the very oldest arcade games there are are not playable in MAME, because they were implemented like you suggest, using discrete logic. There may well have been calculators at the time (maybe like telstar's shoebox adder) that added numbers with plain logic gates, but they are weird to emulate, because in a sense there's nothing to emulate; emulation is mostly the process of duplicating the instructions executed by a processor. Without a processor, what you're doing is more like recreating.

It should be remembered that, before solid-state calculators, there were electro-mechanical adding machines, like cash registers. It's not super difficult to add or subtract using mechanisms, and from there multiply using a process not dissimilar to how this calculator works, by repeatedly adds and subtracts. The Wikipedia page is interesting; Blaise Pascal invented an adding machine in 1642, but they didn't become manufactured industrially until the late 1800s.
posted by JHarris at 2:41 AM on August 12, 2013 [1 favorite]


It should be remembered that, before solid-state calculators, there were electro-mechanical adding machines, like cash registers. It's not super difficult to add or subtract using mechanisms, and from there multiply using a process not dissimilar to how this calculator works, by repeatedly adds and subtracts. The Wikipedia page is interesting; Blaise Pascal invented an adding machine in 1642, but they didn't become manufactured industrially until the late 1800s.

At the bank my parents used as a kid, they had a giant mechanical adding machine on the convenience counter where you fill out your deposit slip before standing in line. It Was Not A Toy!
posted by gjc at 4:15 AM on August 12, 2013 [2 favorites]


I bought that exact type of calculator in 1974. I remember paying $89.95, after watching the price come down in steps from around $150 and deciding that it wasn't going to go much lower.
posted by beagle at 4:19 AM on August 12, 2013 [2 favorites]


I got a lot of my early number sense by playing with a TI-1250 (I had to look that up just now but I recognized the picture like an old friend). That includes knowing what a lot of common fractions look like in decimal, knowing about how big a number you'll get if you multiply two big numbers, what multiplying or dividing by 10 does, recognizing perfect squares and things like that. There is oft-given advice that, if you want to be a good writer, you need to read lots. This generation of devices made it easier to do the mathematical equivalent, spending a lot of time just soaking in the language, and for me there was something intriguing and slightly mysterious about this literal black box that made me want to figure out what the hell was going on in a way that paper and pencil didn't.

Slightly later, there was a solar calculator (a Sharp, if I recall correctly) at my grandparents' house with a √ key. I did not know what that was for, and either nobody could tell me (which seems unlikely, but I don't know) or I didn't ask. It looked like a fancy check mark and I had a vague idea that it was for checking your work in some way. I learned pretty quickly that if you pressed it repeatedly the numbers would eventually go to 1. And I eventually hit on what it actually does - one of the first AHA! I GOT IT! THIS IS BRILLIANT. I'M BRILLIANT moments I can remember. In trying to explain this discovery, I also discovered the verbal difficulty that, I've subsequently learned, often goes along with inverse operations ("it's the thing that you multiply it by itself to get the other thing, you know, the one you started with, hang on, let me phrase that in the form of a proper sentence once I figure out what the correct subject is. I will be in touch with you soon.") And I began to learn that it can be quite difficult to convince anyone that your quite interesting thing you've spent a lot of time figuring out is interesting in any way at all.

Anyway, there were a lot of things I wondered about that this simulation would have answered for me and I would have found it fascinating and no less magical for seeing how it all works, which is the best kind of magic. I'm glad this exists for anyone else who has that curiosity.
posted by Wolfdog at 5:00 AM on August 12, 2013 [1 favorite]


The Visual 6502 project page linked from this one is amazing.
posted by double block and bleed at 5:16 AM on August 12, 2013


in more complex cases (eg multiply by 4)

Multiply by 4 is not a complex case. Shifting twice has the same cost as shifting once. Assuming that the compiler actually uses the shift operator to shift once. It could decide to to add the input to itself. Modern compilers will generally know what to do to best speed up integer operations with constants. Especially on x86.

For example, this Stack Overflow answer shows how you can speed up *5 on x86.

Not every processor has such well developed compilers though.
posted by inpHilltr8r at 6:49 AM on August 12, 2013


I suppose it's in the nature of things that this has passed, and it's otiose to decry it, but I do miss that mixture of intellectual puzzle-solving and practical, useful, exciting results.

The kind of people who were writing these tight, optimized assembler routines for low-end consumer electronics in the 70s are now putting those same inclinations to good use in fields like high-end graphics, scientific computing, scalable/high-availability web services, et cetera. The spirit is still alive, believe me.
posted by my favorite orange at 7:03 AM on August 12, 2013 [1 favorite]


Except the results of that spirit are less visible to the ordinary person. Looking at a byte-by-byte recreation of the process of a graphics card is less interesting, less accessible, to a lay person than that of an old calculator. It takes more of a grounding in theory just to broadly understand what is going on.
posted by JHarris at 7:20 AM on August 12, 2013


Except the results of that spirit are less visible to the ordinary person. Looking at a byte-by-byte recreation of the process of a graphics card is less interesting, less accessible, to a lay person than that of an old calculator. It takes more of a grounding in theory just to broadly understand what is going on.

Yeah, I agree. The cost of progress, I guess?

I'm really looking forward to 0x10c.
posted by my favorite orange at 7:29 AM on August 12, 2013


Having grown up in the era where microcomputers were slow as hell and resources were tight, it was an intellectual challenge to get a system to do *anything*. Flash forward 32 years and I will tell you that it is far more satisfying to write a chunk of code that has the right, beautiful level of abstraction that works well, than to spend the time necessary to destroy that abstraction to get something to run well.

Let me give you a for instance - these days I write (among other things) code to generate PDF documents and I have some code that can programmatically add annotations to pages, so naturally I wrote a test app that takes a JPEG image, decodes it and turns each pixel into a rectangle annotation of that color and places it on a page. At full resolution, my code (which is running in C#), can create add and save the annotations in under a minute. Acrobat can't open it. Seriously. It brings Acrobat (which is written in C and C++ - I know because I helped write it years ago) to its knees. My code can open the file back up in under a minute. When I resample the image to 1/8 resolution (which my code will do in under 10 seconds), Acrobat can finally open and render the page.

The thing that you get from a beautiful, appropriate abstraction is flexibility, maintainability, readability and a bunch of other goodies. This is the benefit of having better and better language abstractions.

Years ago, I wrote an implementation of Conway's Rules of Life. It was one of the last complete pieces of code I wrote for the Apple II series of computers. I've open-sourced it, so go ahead and read it. The notes in the comments have the date as 9/85, which means that I was 19. Among other things, you will see this gem:
2970 * NOTE THAT TO CHANGE A VERTICAL
2980 * COORD, IT MUST BE INCREMENTED
2990 * OR DECREMENTED BY 3. WHILE 3
3000 * INC OR DEC STATEMENTS ARE MORE
3010 * COMPACT THAN THE PROCESS OF
3020 * DOING AN ADD OR SUBTRACT, I
3030 * CHOSE TO USE THE ADDING FOR
3040 * VERTICAL COORDINATES SINCE IT
3050 * TAKES 12 CYCLES, WHILE THE 3
3060 * INC'S OR DEC'S TAKE 18.
And that's a pretty good picture into what it's like to code at that level - I made a conscious choice to trade 6 bytes for 6 cycles in a chunk of code executed in the inner loop.

Recently, I wrote an emulator framework in C# and have the trappings for 6502, 6800, 6809, and 8080. The latter, I recently started testing, emulating Space Invaders. Let me tell you, that processor is an effing dog. The available addressing modes are the bare minimum so that to do anything non-trivial, you have to burn registers in a bad way. 6502 is just as bad (although I could never stay mad at it).
posted by plinth at 7:49 AM on August 12, 2013 [2 favorites]


double block and bleed: The Visual 6502 project page linked from this one is amazing.
Previously on Metafilter
posted by JHarris at 7:50 AM on August 12, 2013 [1 favorite]


I implemented Life in 6502 assembly meself a long time ago plinth, although I didn't bother optimizing it like that, and it was a few years after you (I was working on Commodores some time past the expiration date). Good times, good times. Although I note that I'm working on a game-oriented cellular automation now too.
posted by JHarris at 7:54 AM on August 12, 2013


Embedded controls software engineers still have to deal with this kind of thing - I still spend time stepping through assembly code when debugging, but it's PPC code now instead of a 6800 derivative.
posted by rfs at 7:55 AM on August 12, 2013


inpHilltr8r: "Multiply by 4 is not a complex case. Shifting twice has the same cost as shifting once."

Sorry, you're right, I forgot shift takes a bitcount. I was trying to point out an example that took multiple simple instruction to replicate an arithmetic operation, specifically two or more mutually-dependent instructions.

Not every processor has such well developed compilers though.

I'm a compiler internals n00b, so bear with stupid questions: Aren't these the simplest kind of processor-specific optimizations? I would have figured that a "well developed" one like icc would do more aggressive CPU-specific vectorization and branch-prediction optimization.
posted by vanar sena at 7:58 AM on August 12, 2013


vanar sena, yep, I'd be disappointed with a compiler that didn't bit-shift constant factor multiplications by power of two. On a current processor architecture (a Core Duo), multiplication has a latency of 3 clock cycles as opposed to one for addition or bit-shifting. I doubt there's much point in trying to optimise it out.

But it's also 15 years since I took a compilers class, so don't listen to me if someone better comes along.
posted by ambrosen at 8:43 AM on August 12, 2013


The simulation isn't working on my computer at work (I think something is blocked). I had one much like this. It was a Bowmar calculator that my dad got me for high school graduation in 1973, as I planned to major in Math. It was protable, not a desktop, and in addition to the standard 4 functions, it also did PERCENTS!! I believe it cost $80 -- a fair amount of money then.

I ended up in computers (part of the math department then). I had to take a course in assembler, but I swore I'd never do that by choice.
posted by pbrim at 10:36 AM on August 12, 2013


Here's a quick oneliner that shows what gcc does for multiplication by small integer constants:
$ for n in {1..16}; do echo "times $n:"; gcc -xc - -S -o - -masm=intel -fno-asynchronous-unwind-tables \
  -O2 -march=corei7-avx <<<"int f(int x) { return x * $n; }" | perl -0777 -nE '/^f:$(.*)^\s*ret$/ms and say $1'; done
times 1:

        mov     eax, edi

times 2:

        lea     eax, [rdi+rdi]

times 3:

        lea     eax, [rdi+rdi*2]

times 4:

        lea     eax, [0+rdi*4]

times 5:

        lea     eax, [rdi+rdi*4]

times 6:

        lea     eax, [rdi+rdi*2]
        add     eax, eax

times 7:

        lea     eax, [0+rdi*8]
        sub     eax, edi

times 8:

        lea     eax, [0+rdi*8]

times 9:

        lea     eax, [rdi+rdi*8]

times 10:

        lea     eax, [rdi+rdi*4]
        add     eax, eax

times 11:

        lea     eax, [rdi+rdi*4]
        lea     eax, [rdi+rax*2]

times 12:

        lea     eax, [rdi+rdi*2]
        sal     eax, 2

times 13:

        lea     eax, [rdi+rdi*2]
        lea     eax, [rdi+rax*4]

times 14:

        mov     eax, edi
        mov     edx, 14
        imul    eax, edx

times 15:

        mov     eax, edi
        sal     eax, 4
        sub     eax, edi

times 16:

        mov     eax, edi
        sal     eax, 4
This is gcc 4.8.0 on x86_64 Linux, with the target architecture set to the second generation Core i7 (Sandy Bridge). Per the SysV ABI, the input is in edi, and the output is eax. (For the purposes of this function, the 64 bit registers rdi and rax are equivalent to edi and eax, as loading/storing a 32 bit register zero extends the upper half.)

An explicit shift is only used for x12, x15, and x16, however the multiplication in the address operand of the lea instruction is pretty much guaranteed to be a shift as well, given that the values allowed there are 1, 2, 4, and 8. If you're not familiar with lea, it stands for load effective address, and it's essentially a hack or cheat that utilizes the existing addressing machinery to get a three-operand arithmetic instruction. It has the same form as a mov instruction, but instead of actually reading or writing to memory, it computes the address that would have been read from or written to (the effective address) and puts that in the destination operand instead. Wikipedia has a nice summary of the available addressing modes, and you can see that this leads to a lot of flexibility.

It's also interesting that it chose to use add reg, reg instead of sal reg, 1 to implement x2. The add encoding is only 2 bytes, whereas the sal is 3 bytes, and I wouldn't be surprised if there was special case hardware in the CPU front end decoder that sees this pattern and turns it into a shift micro-op anyway, making it just as fast and consuming less i-cache. Even if it actually does an add, Sandy Bridge has three ALUs per core each with single cycle latency and throughput.

In this short sample the only one that required actual multiplication was x14. I suppose we can infer from that that it judged lea+sub+add (the existing x7 pattern times two) to be too long compared to the multiply.

You can modify the oneliner to extend the series to higher numbers, but what I think is much more interesting is how the compiler is able to avoid division by an integer constant by turning it into multiplication, which is much faster. For example, if you look at the output of computing x / 50, you see this:
        mov     eax, edi
        mov     edx, 1374389535
        sar     edi, 31
        imul    edx
        sar     edx, 4
        sub     edx, edi
        mov     eax, edx
If you work through that, it's computing (x * 1374389535) >> 36. The instruction imul r32 stores the 64 bit result of eax * r32 in edx:eax, so edx need only be shifted right by 4 to achieve an effective 36 bit shift. 1374389535 / 236 is 0.020000000004074536263942718505859375, a value close enough to the reciprocal of 50 for the purposes of integer division. There is a final step to add 1 if x was negative, e.g. (-250 * 1374389535) >> 36 is -6 due to two's complement representation.
posted by Rhomboid at 11:55 AM on August 12, 2013 [9 favorites]


That's really cool, Rhomboid. It looks to me that a lot of these were chosen by the compiler to deal with the ABI imposition of your function call return. If the input had already been in eax, the choices would have been different in many of the cases.

This just further reinforces my belief that most of us should avoid assembly though, as few of us would have the patience to out-optimize the compiler in even these trivial cases.
posted by vanar sena at 12:39 PM on August 12, 2013


"At the bank my parents used as a kid, they had a giant mechanical adding machine on the convenience counter where you fill out your deposit slip before standing in line. It Was Not A Toy!"

It was for me. My grandfather and my mother were both bankers — when I was a child (up to the age of ten, when he died) my grandfather was the CEO and President (he held both titles) of the largest bank in the state. So my grandparents tended to have some business machinery at home, like a mechanical adding machine and good typewriters. And my mom for most of my childhood was the campus (tiny) branch manager of a bank in my small university town. I'd sometimes go visit her and play with stuff. I played with the adding machine all the time because it was just So Damn Cool.

It was also kind of fun to hold the big bag of money in the car that my mom would take back to the main bank when she closed the campus branch down at 2:30. It was a small town, we didn't worry about being robbed. It was tens of thousands of dollars, though.

Anyway, I still have a vivid tactile memory of pushing in the number keys on the adding machine, as well as the sound and the rhythm of it. The entry and operations forced a user interface that was a direct result of how the machinery worked. And that became the convention for ten-key calculators used in business that replaced the adding machines. It's always amusing to come across someone who's not familiar with a ten-key. I worked in retail for a time and a guy returned a ten-key calculator insisting that it was broken.

Playing with Selectric typewriters and swapping out different typeface balls was fun, too.
posted by Ivan Fyodorovich at 12:40 PM on August 12, 2013 [1 favorite]


If the input had already been in eax, the choices would have been different in many of the cases.

The sample code can be modified to accommodate that. Instead of a function that returns the new value, assemble a function that calls another function with the modified value, such that the destination register is the same as the source register.
$ for n in {2..16}; do echo "times $n:"; gcc -xc - -S -o - -masm=intel -fno-asynchronous-unwind-tables \
  -O2 -march=corei7-avx <<<"int g(int); int f(int x) { g(x * $n); }" | perl -0777 -nE '/^f:$(.*)^\s*jmp\s+g$/ms and say $1'; done
times 2:

        add     edi, edi

times 3:

        lea     edi, [rdi+rdi*2]

times 4:

        sal     edi, 2

times 5:

        lea     edi, [rdi+rdi*4]

times 6:

        lea     edi, [rdi+rdi*2]
        add     edi, edi

times 7:

        lea     eax, [0+rdi*8]
        sub     eax, edi
        mov     edi, eax

times 8:

        sal     edi, 3

times 9:

        lea     edi, [rdi+rdi*8]

times 10:

        lea     edi, [rdi+rdi*4]
        add     edi, edi

times 11:

        lea     eax, [rdi+rdi*4]
        lea     edi, [rdi+rax*2]

times 12:

        lea     edi, [rdi+rdi*2]
        sal     edi, 2

times 13:

        lea     eax, [rdi+rdi*2]
        lea     edi, [rdi+rax*4]

times 14:

        mov     eax, 14
        imul    edi, eax

times 15:

        mov     eax, edi
        sal     eax, 4
        sub     eax, edi
        mov     edi, eax

times 16:

        sal     edi, 4
x2 changed from lea to add (not sal, because that's one byte longer), and x4 and x8 changed from lea to sal, but that was it except for a couple of insignificant moves.
posted by Rhomboid at 1:13 PM on August 12, 2013


Many years ago I fell in love with a simulator a lot like this one by Eric Smith for the HP-45, a fully-featured scientific calculator, and HP's second handheld model. Ashley Feniello has a similar one for the HP-35, their first.

Given that these machines also didn't have a hardware multiply or divide, it's quite a puzzle to figure out how to get the exponential function or logarithms or trig running in any reasonable time. Going at it with the usual power series would be way too slow.
But these machines use a very clever algorithm which manages it in just the time of two multiplications, more or less. (It appears in a patent of An Wang; Jacques Laporte has more.)

Here's the gist of the algorithm for log. These calculators worked in (binary-coded) decimal scientific notation, and it's easy to dispatch the exponent and assume we want the log of a number close to 1, say in (0.1, 1]. The key is to pass through a representation in the form
(2)^a0 * (1.1)^a1 * (1.01)^a2 * (1.001)^a3 * ...
where in this case all the ai are nonpositive integers.

Getting a number into this form is easy: it's a matter of repeatedly multiplying by 2 until you can't do that without exceeding 1, then repeatedly multiplying by 1.1 until ditto, then repeatedly... And multiplying by 1 + 10-n just means shifting right n places then adding.

Getting the logarithm of a number in this form is easy: it's an integer linear combination of ln(2), ln(1.1), etc, which is just addition. You could put those logs in a table and be done; but by exploiting the structure of the Taylor expansions you don't even need to store that many constants, which is useful under ROM size limitations.

For the exponential function, just reverse this, except take the ai to be nonnegative so that you can again multiply by 1 + 10-n instead of having to divide.
For trig, a similar thing works except instead of using multiplication by 1 + 10-n, we use rotation by an angle whose tangent is 10-n, since up to a global scalar this rotation has a matrix whose entries are all 1 or ±10-n.

A few refinements: if you do these steps in such an order that the number you're working on is always as close to 1 as possible, then instead of working with it directly you can work with its difference from 1 in scientific notation, allowing you more digits of accuracy in intermediate workings.
And it's pleasant that all the ai are less than 10 (in absolute value), so you can save space by storing them as digits of a number... and, when you get to the 1 + 10-n term where 10-2n is smaller than the epsilon of your precision, the digits you need to record are just the digits of your remainder, so you can skip half the iterations of the outer loop.

The first release of the HP-35 had a numerical bug in ex, caused by missing a necessary truncation at the end of the above shortcut. Feniello's simulator faithfully reproduces this: try computing e(ln 2.02).
(When this bug was discovered, Dave Packard was said to have gone pencil-snappingly apoplectic at the suggestion of one of his engineers that they stay mum; the only thing for it was to offer to replace any affected calculator by a corrected one. Apparently only a quarter of customers took HP up on it, though.)
posted by finka at 4:54 PM on August 12, 2013 [2 favorites]


GCC was definitely rewriting explicit integer div- and mul-by-2 into shifts a few years ago. Wouldn't be surprised if it still is.

yeah, a shift is always going to be the fastest way to divide or multiply. for division, when the denominator is known at compile time then a combination of multiplications and shifts can produce the same answer (using a funky technique that depends on high-order bits overflowing away). if you're doing a lot of divisions at runtime that share a denominator, you can use the same technique to speed things up by using libdivide, which allows you to pre-compute the relevant constants and then get your fast division on.
posted by russm at 5:21 PM on August 12, 2013


Much as I adore the RasPi and Arduino movements, there's just no motivation to sit down and think through a resource-starved, IQ-boosted design any more.

Yes, there very much is. Thanks to NP problems (and their even harder big brothers), there will always be things that need to be computed that are juuuust out of reach of any computer we have (including quantum computers, if those ever end up existing). For these things, you need to think hard about how to get it done.

For instance, this week I turned a 6 day computation into a 3 hour one by *removing* parallel processors. Know your problem. Think. Implement well.
posted by DU at 6:59 AM on August 28, 2013


« Older 19 y/o could change the world (SLNBC)   |   More Than Just Books Newer »


This thread has been archived and is closed to new comments