Debugging
March 17, 2013 8:20 AM   Subscribe

Every programmer knows that debugging is hard. Great debuggers, though, can make the job look simple. "That attitude is illustrated in an anecdote from IBM's Yorktown Heights Research Center. A programmer had recently installed a new workstation. All was fine when he was sitting down, but he couldn't log in to the system when he was standing up. That behavior was one hundred percent repeatable: he could always log in when sitting and never when standing."

"Most of us just sit back and marvel at such a story. How could that workstation know whether the poor guy was sitting or standing? Good debuggers, though, know that there has to be a reason. Electrical theories are the easiest to hypothesize. Was there a loose wire under the carpet, or problems with static electricity? But electrical problems are rarely one-hundred-percent consistent. An alert colleague finally asked the right question: how did the programmer log in when he was sitting and when he was standing? Hold your hands out and try it yourself.

The problem was in the keyboard: the tops of two keys were switched. When the programmer was seated he was a touch typist and the problem went unnoticed, but when he stood he was led astray by hunting and pecking. With this hint and a convenient screwdriver, the expert debugger swapped the two wandering keytops and all was well."
posted by amitai (100 comments total) 102 users marked this as a favorite
 
Stories like this always remind me of The Case of the 500-Mile Email.
posted by mph at 8:34 AM on March 17, 2013 [78 favorites]


I struggle with calling hardware fixes "debugging". He should have remapped the keys and left the tops in place. Problem solved!
posted by blue_beetle at 8:34 AM on March 17, 2013 [3 favorites]


He should have remapped the keys and left the tops in place. Problem solved!

Problem reversed, rather.
posted by Huck500 at 8:48 AM on March 17, 2013 [14 favorites]


The More Magic Switch.
posted by Mitheral at 8:48 AM on March 17, 2013 [5 favorites]


I struggle with calling hardware fixes "debugging".

Isn't calling a hardware fix "debugging" more etymologically correct, though?
posted by mph at 8:48 AM on March 17, 2013 [28 favorites]


Isn't calling a hardware fix "debugging" more etymologically correct, though?

Entomologically too!
posted by Blue Jello Elf at 8:52 AM on March 17, 2013 [65 favorites]


Isn't calling a hardware fix "debugging" more etymologically correct

Just calling it "bugging" would be more entomologically correct.
posted by Mrs. Pterodactyl at 8:53 AM on March 17, 2013


Isn't the origin of the term bug in hardware anyway? Wasn't there a physical bug that was causing a problem that led to the creation of the term?
posted by Rubbstone at 9:01 AM on March 17, 2013


The magic analogy is an interesting one. In my experience bugs tend to be either simple "oh, whoops" or full on head-tilt impossibility. I was definitely into deciphering magical effects as a kid, and I enjoy debugging now (well, unless it's someone else's closed source).
posted by lucidium at 9:07 AM on March 17, 2013 [1 favorite]


I first took the "Great Debuggers" part of this story to refer not to a person but to a software debugger, something you can set breakpoints in and stuff. I've been programming for 20+ years now and I still don't consistently use a debugger. I learned gdb in intimate detail and I still did printf(). I learned jdb and a bit of various Java IDEs and I still used log4j. Python has great debuggers, and for awhile I used WingIDE to step through my code and inspect local variables and yet still, I use print statements. Now in Javascript I try to convince myself to use the excellent Chrome developer tools and yet there I am, throwing in calls to console.log() (hey, at least it's not alert()).

Most other programmers I know also bust out the println() debugging with regularity. There's a lot of lore about why fancy debuggers are bad things, from Brian Kernighan in 1978 to a great Linus Torvalds rant in 2000 to hordes of nerds today. The primary argument is that the debugger makes it "too easy"; it's better to use a clumsy tool and be forced to stop, and think, and reason through the problem. I sort of agree with that, although frankly it's a stupid argument. I think for me it's partly about reproducibility: those logging print statements stay in my code forever, and I can re-examine the traces any time, whereas the debugger setup within an IDE is ephemeral.
posted by Nelson at 9:07 AM on March 17, 2013 [17 favorites]


/contemplates writing a story called "UNABLE TO REPRODUCE: Test and Dev, a Love Story" - it'd be the West Side Story of our times!
posted by Artw at 9:07 AM on March 17, 2013 [19 favorites]


History of computer bugs.

Search down for the word 'bug' and you'll get this site's account of the first computer bug, including a photograph of the same.
posted by YAMWAK at 9:10 AM on March 17, 2013 [1 favorite]


Problem solving.

A skill so rare it needs fancy names.

It's all in the way you approach an apparently odd problem and learn to ask the right questions.

Experience helps.
posted by infini at 9:14 AM on March 17, 2013 [8 favorites]


I frequently find myself in situations where I need to type something (entering a password, for instance) while someone else is looking at my screen, and find myself suddenly completely bereft of typing skill because the keyboard is placed at a 5 degree angle from its normal position.
posted by deathpanels at 9:14 AM on March 17, 2013 [2 favorites]


Debugging when there's only one thing to fix is hard enough. Now try debugging after you realize there was one colossal failure that has now caused tons of permutations of the same error, and there's no way to stuff the original genie back in the bottle.

Video game based on the Matrix trilogy. There were 30,000 voiceover files, translated into EFIGS, Russian, Portuguese, +CJK. But the sound designer mislabeled a chunk of the source files in one fell swoop. So the localization testers started reporting that Morpheus in German sounded like Neo in Italian...

Needless to say, I speak and read only English. There was a lot of me walking back and forth to where the localization guys were, saying, "Bring me the Russian." And then this weary guy and I would walk back to my desk, I'd play him the sound, and he'd go, "That's Agent Smith saying XYZ." And I'd replace it with Agent Smith saying ABC. Over. And over. Again. Cursing the sound designer and his entire family.

For the record, Agent Smith in German sounds creepy as shit. "Herr Anderson..."
posted by Cool Papa Bell at 9:15 AM on March 17, 2013 [15 favorites]


I always suspected that anecdote was early-computing urban legend, myself.

Programming Pearls is still a terrific book, though.
posted by We had a deal, Kyle at 9:17 AM on March 17, 2013 [3 favorites]


My best debugging story involves a slow build process being traced back to a bad power supply on a hub in an unused office.
posted by DU at 9:27 AM on March 17, 2013 [2 favorites]


Debugging when there's only one thing to fix is hard enough. Now try debugging after you realize there was one colossal failure that has now caused tons of permutations of the same error, and there's no way to stuff the original genie back in the bottle.

Now try doing it while Walmart's authentication servers for their employee portal are down, and you're on a conference call with a war room in Bentonville, expected to find a fix or mitigating actions while reps from Microsoft and Sun are sitting there saying it's not their problem.
posted by fatbird at 9:30 AM on March 17, 2013 [2 favorites]


The problem wasn't the keyboard but the person. As always. Computers are literal; they do what they're told. This example was interesting for its kinetic aspect.
posted by stbalbach at 9:31 AM on March 17, 2013 [1 favorite]


I started out life as an engineering contractor (I may be glossing over a few years, say 2 decades, prior to that). There was one other contractor who worked with me, Doyle. Doyle had been a contractor long enough that he'd done design with vacuum tubes. One day he comes up to me and a friend complaining about "the oddest damned thing." His space bar had a mind of its own. He'd type and all of a sudden the space bar would start asserting itself. He showed us. Sure enough, he's typing into Emacs and there's a string of spaces. Doyle didn't realize that his pendulous gut was actually resting on his space bar.
posted by substrate at 9:33 AM on March 17, 2013 [14 favorites]


I think for me it's partly about reproducibility; those logging print statements stay in my code forever, I can re-examine the traces any time, whereas the debugger setup within an IDE is ephemeral.

I almost always take my debugging statements out. (We have Official Debugging Statements that stay in, but my ephemeral ones are removed.) The two main reasons I like to use statements rather than debuggers are:

1) Reduction of variables. I don't think running the program inside another layer is going to make anything any simpler. It just adds a whole bunch of new potential problems. The environment is completely changed now. Print statements are just more of the same kind of code we already have.

2) Repetition of behavior. I want to test a lot of ideas quickly. Running in a debugger is a matter of manipulation of program and debugger, which takes time. I want to just instrument my code and then run it over and over and over, often dozens of times in a minute, and very quickly narrow down the possibilities.
posted by DU at 9:35 AM on March 17, 2013 [4 favorites]


The problem wasn't the keyboard but the person. As always. Computers are literal, they do what told.

You've never debugged a race condition or pointer problem.
posted by DU at 9:36 AM on March 17, 2013 [11 favorites]


After over 20 years in IT, I've come to the conclusion that people who can quickly arrive at non-obvious solutions to strange problems like this are rare, but that it is a skill that can be taught to those who weren't born with it. I've actually got a book on the subject outlined - one of these days I'll actually get around to shopping the proposal and writing it.
posted by deadmessenger at 9:36 AM on March 17, 2013 [4 favorites]


The term "bug" is older than computers. The famous moth was labeled "First actual case of bug being found." (Emphasis added.)
posted by ChurchHatesTucker at 9:44 AM on March 17, 2013 [1 favorite]


LOL here's my own strange little debugging story, which oddly enough involves an error when an Apple II would only boot into the debugger.
posted by charlie don't surf at 9:45 AM on March 17, 2013 [13 favorites]


I think for me it's partly about reproducibility; those logging print statements stay in my code forever, I can re-examine the traces any time, whereas the debugger setup within an IDE is ephemeral.

I can't stand it when colleagues commit code with commented-out situational logging statements in for 'debugging' purposes, and when they give me grief for deleting them the next time I commit the code. That stuff is noise, written thoughtlessly and planlessly for the needs of one brain-fart moment, and the idea that it's better to leave this crap in against some hypothetical future need is nonsense. If you need a stupid one-liner log statement in that spot someday, take the five damn seconds to type it in again. And don't commit it then, either.
posted by George_Spiggott at 9:47 AM on March 17, 2013 [11 favorites]


In debugging, knowing is 99% of the battle.
posted by Foosnark at 9:51 AM on March 17, 2013


Actually, the secret to debugging is not just to check the things you don't know the answers to. You mostly need to check the things you already (think you) know. If everything you thought was true was actually true, you wouldn't have a bug. Therefore something you think is true, isn't. Check them.

The best aha! moments I've had while debugging were just random stabs at checking things and realizing an assumption was false.
posted by DU at 10:07 AM on March 17, 2013 [24 favorites]


I work in an office that is testing out a new software platform that is really buggy. We use it for our daily work and provide feedback to the programmers so they can fine tune it. I have a little bit of experience with programming, and have a passable knowledge of programming logic and computers in general, so I have become the filter through which my office's feedback passes before being sent to the programmers.

Holy hell, is that a thankless job.

On one side my coworkers, who in fairness were not hired to do a job in which they have to think in these terms, get angry when they report a problem and I ask them if they can recreate it, and if not, can they show me exactly what they were doing when it occurred. They don't want to do that. They don't care about the details. They don't want to understand hows and whys. They aren't interested in logic, or even efficiency. They only want to know a specific series of buttons they have to press to get their jobs done, and when it doesn't work like they think it should they want to report that it is broken. Troubleshooting and debugging must seem like magic to someone who, for example, couldn't log in to one website in the morning and now can't log into another website in the afternoon, so reaches the conclusion that "my computer is having a problem sending passwords."

On the other side, it's a rare programmer who will entertain a report from a lowly end user saying that it appears that the data is getting corrupted at this specific point in process X, and once that happens the corrupted data propagates in processes Y and Z, causing these specific results. It's as if they can't function at all without the 80% noise in their signal, to which they can respond that it was user error.
posted by Balonious Assault at 10:09 AM on March 17, 2013 [5 favorites]


Oh, hey, favorite personal debugging story, I guess:
I worked in a high school where the network dropped out every day at precisely 1:15, just as the teachers were trying to get their 4th period attendance entered on their computers. There's a lot of story I can skip to get to the part where me and the district network guys, who'd been quietly ignoring our calls about the issue for a whole semester, found ourselves standing in the room where the switch that serviced the second floor lived and realized it was plugged into a $5 power strip that was connected to a chintzy white lamp extension cord that plugged into a utility outlet on the side of a room-sized air handler that cycled every day at 1:15.
posted by mph at 10:11 AM on March 17, 2013 [9 favorites]


At my first job I was told a story of a difficult debug: their system (a logistics software package for shipping that integrated conveyors, scanners, label printers, etc) wasn't working at a customer's location (a warehouse).

It worked everywhere else. Same software, same hardware. They tried replacing every piece of hardware with a known working piece of hardware. Still nothing.

It turned out that the lights in the warehouse were blinding the barcode scanner.
posted by justkevin at 10:12 AM on March 17, 2013 [4 favorites]


After over 20 years in IT, I've come to the conclusion that people who can quickly arrive at non-obvious solutions to strange problems like this are rare, but that it is a skill that can be taught to those who weren't born with it.

Charles and Ray Eames wrote:

"One of the most valuable functions of a good industrial designer today is to ask the right questions of those concerned so that they become freshly involved and seek a solution themselves."
posted by infini at 10:14 AM on March 17, 2013 [6 favorites]


In my work, we had a worker thread that kept getting poisoned at about 2 PM on Saturdays. This thread—always a single thread, but it could be any of the workers—would then fail every subsequent request it served. After much searching and reading of logs, we isolated it down to one function call that was poisoning the thread. It was a scheduled job, a systems interop task that ran exactly once per week. It ran fine for months, until the system that it called was actually shut down, and the job was never stopped.

Ultimately, after hours of work on this, we figured out that at some point in the distant past, someone wrote this Python code:

except DBError, Exception:

They were trying to express this idiom:

except (DBError, Exception):

Which means "One of these types of errors." What actually happened, as written, was that the specific instance was overwritten into Exception, and attempts by that thread to raise an exception at any point (or, in fact, to do anything with the Exception class at all) raised an error... which itself failed, because all errors are exceptions!

It was a brutal bug.
posted by sonic meat machine at 10:18 AM on March 17, 2013 [30 favorites]


On one side my coworkers, who in fairness were not hired to do a job in which they have to think in these terms, get angry when they report a problem and I ask them if they can recreate it, and if not, can they show me exactly what they were doing when it occurred. They don't want to do that.

Oh god. I had a friend who would call me for tech support and he'd say, "I can't get the whatchamacallit into the thingummy." No, those are not euphemisms; he would actually use the words "whatchamacallit" and "thingummy" and he expected me to know what that meant. He said "well you know how I work, you're supposed to know what I mean." Jeez, I never even heard the word "thingummy" until he used it.

Tech support of end users is almost always fixing PEBKAC. It takes 95% of the effort to get the user to specify the problem in a way that is useful for debugging. It takes about 5% of the effort (and often, one Google search) to solve it, once you know what the actual problem IS. The problem is never about finding the right answers, it's all about finding the right questions.
posted by charlie don't surf at 10:18 AM on March 17, 2013 [6 favorites]


Nthing the sentiments about the importance of questioning assumptions and of asking the right questions. The problem with turbo-charged IDE debuggers is precisely that of languages like BASIC: they encourage fiddling rather than thinking, and reactive hacks rather than design. I cut my teeth on punch cards - a 90-minute turnaround because I was privileged, and a pain to correct mistakes. And yes, it did encourage a "don't put bugs in your code in the first place" mindset.

The hostility towards leaving debugging statements in place but commented out is a sure sign of a bad programmer to me, someone whose OCD blinds them to practicality. They don't hurt, and they indicate points where there was once an issue. I once saw a large assembler program, thousands of statements long; it had relatively few comments, but one of the few said "watch this bit, it's tricky". Guess where the bug was?
posted by epo at 10:25 AM on March 17, 2013 [5 favorites]


Oh and if we are talking about debugging in the tech support, person-on-the-phone sense, then the secret there is: You don't have to explain what is happening. You have to explain what the person is saying.

I had one recently where the remote "admin" (I use the term loosely in her case) kept telling me a command would work in one terminal window but not in the other. After several days of having her dump environments, check permissions, etc., I finally realized she was just wrong. It didn't work in either window. She hadn't actually checked the result; she just didn't get any errors because she'd been running it wrong.

The only way I cracked it was by asking myself why she would think it worked and realized she was too dumb to know if it really did. After that, it was easy.
posted by DU at 10:27 AM on March 17, 2013 [1 favorite]


I can't stand it when colleagues commit code with commented-out situational logging statements in for 'debugging' purposes, and when they give me grief for deleting them the next time I commit the code. That stuff is noise; …

Or worse, if they leave all of their debugging logging statements active and in the code. The smallest action spews pages of old, unused information, making any logging you've (temporarily) added a needle in a haystack.
posted by JiBB at 10:27 AM on March 17, 2013 [1 favorite]


They don't hurt and indicate points where there was once an issue. I once saw a large assembler program, thousands of statements long, it had relatively few comments, but one of the few said "watch this bit, it's tricky" guess where the bug was?

They only don't hurt if you have infinite screen real estate. Also, you just gave an excellent reason to have comments but left unaddressed why you'd want to have commented-out debug code.
posted by DU at 10:29 AM on March 17, 2013 [2 favorites]


I once worked at a company that used Visual SourceSafe for revision control. Every now and again it would lock up and we'd get a corrupted file in the repository.

After I went to the VSS server and uncoiled several loops of the Ethernet cable from around the power strip, the problem went away.
posted by RobotVoodooPower at 10:29 AM on March 17, 2013 [3 favorites]


The only way I cracked it was by asking myself why she would think it worked

That is brilliant. I wish I had thought of that method about 30 years ago, it would have saved me a lot of effort. I probably used a variant of this sometimes. I often joke that the opposite of a random failure is a "random success." Just because it worked, doesn't mean that your success was anything beyond a random accident. You could have just been lucky and it randomly worked several times in a row, when it should have failed.

There's an old hacker's saying, "an algorithm is any trick that works repeatably." The problem is, users have a very low threshold of repeatability. They don't understand why something works, let alone what factors make it repeatable. So when they change something, they don't realize what they broke.

Anyway, I'll cite one of my favorite essays, about debugging on the IBM/360 back in the 1970s. It is my first publication of my hypothesis, "The Law of Infinite Stupidity."
posted by charlie don't surf at 10:37 AM on March 17, 2013 [4 favorites]


So this is something I know a lot about from having spent thousands of hours doing it.

Yes, it's called debugging whether it's hardware or software. In fact, yesterday evening I finished debugging an issue that I thought for several days was hardware, but turned out to be software.

Debugging isn't really "the process of fixing bugs". Here's a list of open bugs in my current project - casting an eye down them I only see one or two that I think will need actual debugging, and even then I suspect that I'll look at the code and see it immediately.

All the other bugs were problems that I identified in testing but immediately knew the cause of. Heck, in most cases on this page I'd say that this was expected behavior, but behavior that turned out to actually be undesirable in the finished product.

No, debugging is what happens when you have a bug and you don't know what causes it and you can't fix it.

Sometimes you can fix a bug without knowing the cause. Sometimes this is good - for example, you're receiving bad data from an external source, and you fix your code so that it won't crash on bad data - you don't need to know why you're getting bad data today, because you might get bad data some other day for some other reason from some other source - you need to "harden" your code. But most of the time, a fix to a bug without understanding the root cause is bad - called a "bandage" - because you have no guarantee that the problem can't reappear elsewhere, or in a different form.
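
To make that concrete, here's a minimal sketch of that kind of hardening (the record format is invented):

  import logging
  logging.basicConfig()

  def parse_record(line):
      """Return (name, count), or None for rows we can't trust."""
      try:
          name, count = line.split(",", 1)
          return name.strip(), int(count)
      except ValueError:
          # Tolerate and report rather than crash the whole run on one
          # bad row -- tomorrow's bad data may come from somewhere else.
          logging.warning("skipping malformed record: %r", line)
          return None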

The number one weapon in debugging is the personal conviction that you can and will solve the issue - when you are stuck, you will come up with a new strategy. It's very easy to despair, and believe that some magic is occurring that you will never track down. Yes, I've been there many many times.

And the number one cause of debugging issues is mistaken assumptions. For example, in the hardware debugging issue above, I assumed that because I'm not very good at making hardware and good at software, the error had to be in the hardware. Only when I rebuilt and retested the whole thing using completely different parts did I start to wonder about the software end (for the record, the "hardware service" was "blacklisted" in a config file owned by the system - so now I can detect that this is an issue and report it to the user...)

> I can't stand it when colleagues commit code with commented-out situational logging statements in for 'debugging' purposes, and when they give me grief for deleting them the next time I commit the code.

I'm totally on your side - and so has every serious shop I've worked in been. You ABSOLUTELY SHOULD NOT do conditional logging by commenting things out! Any possible programming language you can work in has a library that allows you to have logging levels that you can switch either at compilation time or at run time. If you're doing anything above a toy, you should do this as one of your very first steps. (From the codebase above, here's my first try at a logging module - basically just a wrapper around Python's logging - and here's today's, almost twice as long. And, oops, I see I checked in something commented out that I should have deleted, but I know that that's a mistake! :-D)

Here's a rule for you - "It's not the compiler". In other words, if you're tempted to believe it's something that the compiler is doing wrong, then you are wrong. And I say that having worked with someone who did find several compiler bugs stemming from people's code - but he was the go-to super-level C++ debugger in a huge coding organization, so he was the one seeing the most obscure bugs that no one else could solve...

Blaming the compiler (or interpreter or language) is again like assuming magic (unless you're working in PHP, where it's quite likely you have fallen into one of the several hundred known traps in the language). Assume it's your problem and start to work down from there to find it, to create a very small program that demonstrates the issue. If you have a language bug, that will demonstrate it - but nearly all the time, you'll realize that it's due to your lack of understanding of the language.

Worst debugging story: this didn't happen to me but to a boss of mine, in the 70s. He was working on some huge Wall Street firm's back office program, and they wanted to switch machines to a supposedly-identical new machine from the same company. They wrote a huge test suite of increasing difficulty and ran it on both machines. As I recall it, they compared the results by hand(!) - today it seems insane but either I misremember or there was no "diff" program available for the special-purpose machine - and they found a small discrepancy in the last page.

The manufacturer swore by their machines - so they took the whole team and basically stayed at work for an entire weekend and, by using a binary search, slowly boiled the whole thing down into a (dense) page of code that gave different results on the two machines. The manufacturer had to accept that, went back into their workshop - and two weeks later admitted that they had found a bug in the microcode in their new machine... ("Men program in assembly, real men program in microcode.")

Funniest debugging story did happen to me. I was working on the operating system of an early pocket computer in the 80s (yes! though it never came out, we got it working fine; the investors simply decided not to go on with it). In the early stages, it was just me and a tremendously pot-headed hardware engineer.

One night, I'd finally finished a 0.1 rev of the software, and he the hardware. So we turned the machine on - and to my shock we did see the first page with the soft buttons and everything. (Pat on the back there, young lupus! 'cause we were doing it by hand - I'd written the font rendering myself...)

Knowing my luck wouldn't hold, I pressed a button... the screen whirred for a second, then cleared...

And it was showing a very coherent picture, a clock. A running clock that was scrolling biblical verses across the bottom and counting up the number of abortions since Roe vs. Wade.

We sat there for perhaps a minute. This seemed impossible. We had personally just burned the EPROM with the code in it ourselves. I had personally written nearly all the C code, I knew every routine in it; the compiler could not have hidden anything this fancy.

Then we started to dig through data books. In a half an hour, I was staring at the very last page of the dev. manual for our display prototyping board - which had several undocumented vectors (jump table addresses) including one called RTLCLOCK. I'd glanced at it, read it as Real Time (something) Clock and ignored it, because we weren't using any of their vectors at all - but now I realized that RTLCLOCK was Right-To-Life Clock, and the company making the display prototyper was "Heritage Computing"...
posted by lupus_yonderboy at 10:37 AM on March 17, 2013 [54 favorites]


@DU: I said "they indicate points where there was once an issue" and provided anecdata, thought I did address it. Debug statements indicate places which might still need attention and also places where bugs might have been introduced. This is especially important in team environments where multiple people meddle with the code.

The real problem with debug statements is that live ones alter program timing and usually change the behaviour of race conditions.
posted by epo at 10:39 AM on March 17, 2013


What actually happened, as written, was that the specific instance was overwritten into Exception, and attempts by that thread to raise an exception at any point (or, in fact, to do anything with the Exception class at all) raised an error... which itself failed, because all errors are exceptions!

I've had similar issues in Python with my own code, but not quite as difficult.

Pro-tip: if you're using Python, for the love of all that is holy, use pylint or a similar static analysis tool. Python lets you do some crazy things (like naming variables the same as important built-in classes).
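
For instance, pylint's redefined-builtin check (W0622) flags exactly that class of mistake:

  # shadow.py -- pylint reports W0622 (redefined-builtin) on the next line
  list = [1, 2, 3]

  # ...and much later the shadowing bites:
  print list("abc")   # TypeError: 'list' object is not callable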
posted by spiderskull at 10:45 AM on March 17, 2013 [1 favorite]


My former colleague, the network admin at a small outfit, got a call from an administrative assistant who swore that her mouse stopped working around 1:15 every afternoon. "I know, it doesn't make any sense, but I swear that's what happens."

Rather than dismiss her as computer illiterate or foolish, he went up there at 1:10pm the next day and waited. Turns out the sun came over the edge of the building right then and (at that time of year) beamed right onto her mouse pad, making it hot and smushy. Moving it out of the sun solved the problem.
posted by msalt at 10:48 AM on March 17, 2013 [3 favorites]


Here's a rule for you - "It's not the compiler".

Oh, but then there are the times when following this rule causes you to spend forever going over your own code again and again, because after all, the people who write real production compilers are really smart, right? Much smarter than me! And then at the point where you are losing your shit and questioning your grasp of reality, you try something simple to test the compiler functionality and bam, there it is.

So here is my counter-rule: Trust no one. Probably, it's not the compiler, or the standard library, or the database, or the web server. But it's better to determine that than to assume it, because sometimes it is.
posted by enn at 10:51 AM on March 17, 2013 [3 favorites]


As someone who can't write a line of Java, but can successfully debug Java problems the Java programmers can't fix just by asking them the right questions, I totally get this.
posted by davejay at 10:54 AM on March 17, 2013 [3 favorites]


Here's a rule for you - "It's not the compiler". In other words, if you're tempted to believe it's something that the compiler is doing wrong, then you are wrong.

This is almost always true. However, in those rare cases where it is a compiler bug, it's utterly maddening. Back in 2005, GCC had a linker bug for the embedded platform I was using. I spent three weeks manually reconstructing the Linux kernel stack (literally by hand -- writing down PCs and register values, stepping the CPU), and finally traced it back to a bad memory map. It was... frustrating.

Recently, I've been using Intel's C++ compiler. It's incredibly good at automatically vectorizing math code, but it also has some outstanding bugs. Thankfully their support has been fairly responsive, and they release fixes regularly.
posted by spiderskull at 11:00 AM on March 17, 2013 [1 favorite]


The most useful technique I have learned to use when faced with an important bug that I don't understand is to tell myself that I am about to learn something *fascinating*, because the bug defies all my expectations and assumptions. The bug tells me that my worldview is incomplete, and it is going to change, I just don't know how yet. I then go from being annoyed to being open to enlightenment, which helps my attitude a lot.

The most interesting bug I've ever dealt with had to do with a string processing function. I had to port some code from C++ to Java, but discovered that one of the crucial parts, a hash function that mapped a string to a 64-bit integer, already existed in Java land. (Someone had ported it a couple years previously.) I cheerily called it from my code and all the unit tests that I could come up with passed.

A couple months later, we got bug reports from some Japanese clients. The code wasn't working for them. After some investigation I found the problem was in the ported hash function, in the shift used when combining the bytes of the string into the 64-bit integer. The C++ version shifted unsigned longs with the >> operator, so the leftmost bit was always filled in with a 0. The Java port used >> on signed longs, so the leftmost bit was filled in with whatever the leftmost bit already was; the zero-filling equivalent in Java is the >>> operator.

My unit tests passed fine because I was using English text, whose bytes all have a 0 in the high-order bit, so the results of the Java code matched those of the C++ code. But the encodings for Japanese have 1s in the high-order bits of most bytes, which exposed the difference between the two implementations.
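
A small Java sketch of the difference (the hash function itself is omitted; only the shift behavior is shown):

  public class ShiftDemo {
      public static void main(String[] args) {
          byte b = (byte) 0x95;  // high bit set, common in Japanese encodings
          long v = b;            // sign-extends to 0xFFFFFFFFFFFFFF95
          // Signed shift drags copies of the sign bit in from the left:
          System.out.println(Long.toHexString(v >> 4));             // fffffffffffffff9
          // Matching C++ "unsigned >>" takes a mask plus the zero-fill shift:
          System.out.println(Long.toHexString((v & 0xFFL) >>> 4));  // 9
      }
  }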
posted by A dead Quaker at 11:19 AM on March 17, 2013 [22 favorites]


The best "X happens at time Y" story I've ever heard was second-hand and about computers at a university which, at a certain time each day, would crash after the monitors' displays distorted in a weird way. After investigation it was found that on the other side of the wall was a physics lab where they'd have a daily test of a huge electromagnet designed to contain particle physics experiments.
posted by Artw at 11:21 AM on March 17, 2013 [1 favorite]


... one of the few said "watch this bit, it's tricky", guess where the bug was?

I heard a similar story about a bug caused by an undocumented instruction that broke when the processor was upgraded. The comment was, "Here's the clever part."
posted by Bruce H. at 11:26 AM on March 17, 2013 [2 favorites]


As someone who can't write a line of Java

;

That's a line of Java. Now you can!
posted by axiom at 11:28 AM on March 17, 2013 [11 favorites]


The real problem with debug statements is that live ones alter program timing and usually change the behaviour of race conditions.

True, but this is like saying that the problem with convertibles is that there's nothing over your head when a bomb lands on your car. Whether or not it reproduces in debug vs. release builds is just the beginning of the nightmare.
posted by fleacircus at 11:35 AM on March 17, 2013 [1 favorite]


My best physical debugging:

Instead of opening his email in split-screen view, his email application was opening it in full-view, but only sometimes. It didn't really make sense for it to be a software bug if it was happening only once in a while.

Turns out his mouse was broken; it was sending double clicks once in a while, which had gone entirely unnoticed otherwise. I only realized it was the problem because I knew mice could do that, after having spent some time stuck with a broken one.
posted by BungaDunga at 12:09 PM on March 17, 2013


I can't stand it when colleagues commit code with commented-out situational logging statements in for 'debugging' purposes

Oh yeah, don't check in stupid debug statements. The kind of thing I do is check in useful logging, in the context of a fancy logging system like log4j and its children that's runtime configurable. For instance I find it very useful to have deep in the database connection code a statement that logs every single SQL statement that's sent to the database. 99% of the time you don't want to see that, it's noise. But you leave the code in to log those statements and just disable it in the runtime config. Then when you need to see the SQL it's just a matter of turning on those log messages.
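
Here's the shape of that pattern sketched with Python's stdlib logging instead of log4j (logger names are illustrative):

  import logging

  log = logging.getLogger("myapp.db")

  def execute(cursor, sql, params=()):
      # Checked in permanently, but emitted only when the "myapp.db" logger
      # is configured at DEBUG level; disabled, it's near-free because the
      # %-style arguments aren't formatted unless the record is emitted.
      log.debug("SQL: %s params=%r", sql, params)
      return cursor.execute(sql, params)

  # Flip it on in the runtime config when you need to see every statement:
  logging.basicConfig()
  logging.getLogger("myapp.db").setLevel(logging.DEBUG)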

Good logging is part of a larger theme that fascinates me; designing code to be understandable. I'm also a fan of refactoring code to make it easier to unit test, even if it makes the code a little longer or slower. Usually the code also ends up being more robust, so in the end the tradeoff is worth it.
posted by Nelson at 12:18 PM on March 17, 2013 [2 favorites]


When debugging, one is tempted to say, Oh, it must be this, change it, and see if it fixes the bug. When it doesn't, repeat. And, actually, this works a lot of the time, because many bugs are pretty simple and obvious.

But if I find myself going more than three rounds, I try to remember to pull out the big guns: The Scientific Method, pretty much like we learned it in high school: make a hypothesis, ask what would disprove or lend credence to that hypothesis, test the hypothesis. And publish — that is, write it down (preferably somewhere in a public space where others can read it).

Tom Preston-Werner gave a talk on this a long time ago; we were working together at the time, and this has stood me in good stead.
posted by willF at 12:21 PM on March 17, 2013 [3 favorites]


sonic meat machine: Ultimately, after hours of work on this, we figured out that at some point in the distant past, someone wrote this Python code:

...


Arliasfhsglasid I have run headlong into that same problem (luckily in my case it was just a simple script, with no threading involved) when accidentally omitting the parentheses in the except statement. Maybe it's my fault for not studiously reading through the language reference, but it makes me wish that Python forced the use of explicit tuples in every destructuring assignment because that shit is hella unintuitive.
posted by invitapriore at 12:25 PM on March 17, 2013


There are some interesting anecdotes here. I work in a different world where many of the bugs are on the processor itself and a few of the most annoying are in the compilerish tools. But I tell debuggers to be very distrustful of everything.
posted by jclarkin at 12:25 PM on March 17, 2013


Arliasfhsglasid I have run headlong into that same problem (luckily in my case it was just a simple script, with no threading involved) when accidentally omitting the parentheses in the except statement. Maybe it's my fault for not studiously reading through the language reference, but it makes me wish that Python forced the use of explicit tuples in every destructuring assignment because that shit is hella unintuitive.

The solution, in our case, was to mandate the use of a more verbose syntax:

except ExceptionType as exc:

except (ExceptionType, ExceptionType) as exc:

This prevents the problem, and allows for a standard nomenclature for logging or re-raising the exception within the except block.
posted by sonic meat machine at 12:42 PM on March 17, 2013


msalt: Turns out the sun came over the edge of the building right then and (at that time of year) beamed right onto her mouse pad, making it hot and smushy.

Are you sure that was the cause? In all the similar cases I've seen, the real cause was that the sunlight was overwhelming the mouse's IR sensors.

(Moral of the story: debugging is hard, but root cause analysis may be even worse.)
posted by xil at 12:55 PM on March 17, 2013 [2 favorites]


Here's a rule for you - "It's not the compiler". In other words, if you're tempted to believe it's something that the compiler is doing wrong, then you are wrong.

When I worked on embedded systems, the C compilers we used were all very expensive, low-volume boutique compilers targeted at a specific, sometimes obscure micro-architecture. They had bugs disturbingly frequently. Obvious, glaring, inarguable bugs (once you thought to look for them) like invalid opcodes in the output. "Compiler issue" should never be your first guess, but if you assume the compiler is perfect, that just becomes a blind spot, a weakness for you.

I have become the filter through which my office's feedback passes before being sent to the programmers. Holy hell, is that a thankless job.

I was never the guy in the middle in a situation like this, but I started writing code professionally in an office where the previous programmers and the tech-support/testing guys (it was a small place) had long had an acrimonious relationship.

I had some success redressing the nastiness in the air in the simplest possible way: I supplied the testing people with baked goods (cookies, brownies, doughnuts) whenever they found a bug they could reproduce consistently. Then I'd fix the bug.

Fixed a lot of bugs in a short time this way. And the testing guys and I got along great.
posted by Western Infidels at 1:10 PM on March 17, 2013 [3 favorites]


The bit in the article about discovering the secret behind magic tricks reminds me of a Feynman anecdote (first one here). Come to think of it, Feynman would have been great at debugging, just like everything else. Sigh.

In the story I linked it says 'Randi literally fell backwards over his chair'. Well he better have done, or I'm going to explode with frustration. literally.
posted by Ned G at 1:35 PM on March 17, 2013


Heh! When I said, "It's not the compiler," I immediately gave two examples of it "being the compiler" - in one case, it's even worse, the microcode! - so I definitely know it can happen.

But for every twenty times that someone has claimed to me "It's a compiler bug" or "It's a language bug", perhaps one of them was real.

Point is that while you debug you must tell yourself it's not the compiler and try to extract the smallest possible program that demonstrates the issue. If that proves it's a compiler problem, then you have something to actually report.

Perhaps a better rule is, "A compiler (or language) bug is the last place you should look." I certainly don't want to give the impression that it can't happen.....
posted by lupus_yonderboy at 1:43 PM on March 17, 2013 [1 favorite]


After you've been a senior developer in a team for a while, you acquire the magical ability to fix bugs just by getting out of your chair.

You see, when a junior dev has a problem, they'll call you over, and you run through the obvious things first. Are you really looking at what you think you're looking at (or is it the wrong server etc.)? Are you really editing what you think you're editing? When did it last work? Have you tried isolating the problem? And so on.

Eventually, when they hit a problem, they'll go through most of these things themselves, but sometimes miss something. So they call you over to help... then find themselves preemptively going through the list you'll run through when you arrive.

Rather than being annoying, it's incredibly satisfying to only make it halfway across the room before an imaginary version of yourself provides the solution. You know it won't be long before they don't need the real you at all.
posted by malevolent at 1:48 PM on March 17, 2013 [24 favorites]


Tangential but involves debugging, of a sort: The Story of Mel, a Real Programmer.
posted by bz at 2:04 PM on March 17, 2013 [3 favorites]


An interesting subclass of bugs is the "Heisenbugs", bugs that disappear when you attempt to debug them.

Strangely enough, I actually had one in Ruby last week (my stack overflow question, but the fun is in this gist).

Essentially, when using a particular method with a particular variable type, the internal state is not set up correctly. However, if you were to use the variable prior to using that particular method, everything would work.

This didn't work

weird_method(my_var) #my_var should be fully instantiated before calling the method, but isn't for some reason

But this did

my_var #--- doesn't do anything other than to instantiate the variable
weird_method(my_var)


As any Ruby programmer would tell you, there should be no difference in behavior between the two versions (except there was in this particular case).

Furthermore any type of trace or debug inspection of my_var would instantiate it, and the problematic method would work as expected again.
posted by forforf at 2:07 PM on March 17, 2013 [2 favorites]


My co-worker once found this amazing bug where the program would segfault if and only if seven objects of a certain type were allocated. Not six, not eight, not odd numbers in general, not prime numbers in general, not further multiples of seven, etc - just exactly seven.
posted by en forme de poire at 2:26 PM on March 17, 2013 [3 favorites]


I kinda love bugs. I fixed software bugs professionally for a few years, as 'maintenance' work, or special releases in legacy code, or just working on very buggy products where I was at the wrong place in the chain. Long comment because I'm an old person in a rocking chair.

Debuggers are awesome. At one job, I worked on an application that, all told, had 100 threads. Fortunately, "only" 20-30 of them might be in play when doing a single given job, and most of those were joined in a pipeline feeding the output of one to another. My module was 3 threads, but it was positioned in between two major components in such a way that bugs often came to me, because the combined owners of the 2 ur-modules (in their groups in different parts of the building) would do whatever they could to push them towards the other major ur-module, and 'my' little module was the link between them. So I spent a lot of time dealing with bugs. Often they'd just give them to me fresh so I could push them to the right ur-module and then sometimes have the fun of tracking the bug down inside their code, because they were a bunch of assholes... er, quite busy.

So I'd get a bug out of nowhere, something that was almost surely not in 'my' code, and let me tell you, when you are dealing with millions of lines of code and 100 threads, in an older application whose original authors no longer work for the company (or are inscrutable Bulgarians), code whose very comments are rendered false by the march of time—you don't just put in some printf's. You try to get a breakpoint and look at some stack traces hoping to just figure out what in the hell is even going on; you step through deep layers of other people's code that you've never seen in your life before today, seeing them dealing with their own private subsystem hell you didn't know existed. Sometimes the debugger is not helpful once you're trying to catch the actual problem, and you try to build object files with progressively more logging statements enabled and swap them into the machine. But even then the debugger can help you get in the right neighborhood of the bug at least. You'd rather use the debugger; you don't always need one, but to look down one's nose at them, well, it must be nice to have that luxury.

Anyway, the end product being a printer controller, the 'release environment' is inside a copy machine the size of a meat freezer in QA in the next building over that's already busy testing a different version of the software than the one in your bug report (another joy: building the right version and discovering old build processes; bonus joy: old source control processes), so you do everything you can to nail down reproduction in as controlled environment as possible, just so you don't tie up that one machine or go home every day cursing the whole idea of software.

Sometimes you have to go to the actual printer, though. Favorite bug incidents:

1) A bug where the ~300th page of a 500 page stress test document would print in B/W instead of color. This had to do with a job override being ignored then reset when an upstream module had finished, forgetting that downstream the job was still being used (though it was a little weirder than that even). Not knowing any of that yet, and not fucking believing it, I went to see it get reproduced. So the QA guy and I stood beside the machine waiting for page 300 to come out. It was a ~80ppm machine, so the pages were popping out noticeably faster than one per second, chug-chug-chug-chug. The QA tech was chewing gum as we both watched the pages being spat out the side of the machine onto a box on the floor. (Finishing unit not attached). As 300 approaches, QA tech takes the gum out of his mouth and, as the B/W page pops out, he darts his hand out and pulls it back with the page stuck to his finger by the gum.

2) "Job makes printer freeze up sometimes". It was a graphics-intensive document, so it took a long time to process, longer than the client had patience for, and being resource-sucking it took awhile to cancel when the client went to cancel it, and of course those are where everyone thought the problems were: processing and cancel. (My task was implementing better job cancelation at the time.) We had no idea what the client meant about "sometimes", but what do clients know? We tried to make things faster but on investigation it looked like it was going as fast as it could and it wasn't that bad, so it seemed like a dead end. As it turned out, the real problem was the client was (sometimes) using a heavier weight of paper that couldn't go through the duplexer of this particular printer. The hardware could detect this so instead of doing its usual trick of printing forward-order but flipping the pages over so the last-printed page winds up on top of the output stack, face-up, it had to do it the hard way: process the entire document, storing pages to disk, then feeding them to the printer in reverse order, which spanks the controller pretty hard. Hence the printer would seem to "freeze up sometimes". The proper behavior was for the printer to say "fuck your reverse bullshit" and print the document forward order anyway when the heavy paper+reverse job was detected, which was a one-line change.

Bugs are the best puzzles.
posted by fleacircus at 3:08 PM on March 17, 2013 [4 favorites]


I have been struck for a good long while how similar medical interviews are to these technical forms of problem-solving. Particularly, as one thread inhabitant put it, "It takes 95% of the effort to get the user to specify the problem in a way that is useful for debugging." Much of the effort of seeing a patient with a new complaint is getting them to describe the complaint in a useful way.

Alternately, one can say that physicians interact with patients and guide the interview such that patients are led into adopting a vocabulary the physician can work with. "Dizzy" must become something like vertiginous, presyncopal, giddy; "numbness" must become hypesthesia, paresthesia, paresis, allodynia, hyperesthesia.

Just as amateur techies can muddle things by making assumptions about problems that are misinformed, mistaken, or untrue, patients can buy themselves a lot of trouble and tests by using medical terms of art. Unless the physician is careful.
posted by adoarns at 3:23 PM on March 17, 2013 [4 favorites]


msalt: Turns out the sun came over the edge of the building right then and (at that time of year) beamed right onto her mouse pad, making it hot and smushy.

xil: Are you sure that was the cause? In all the similar cases I've seen, the real cause was that the sunlight was overwhelming the mouse's IR sensors.

Pretty sure. It was an old rolling-ball type mouse.
posted by msalt at 3:47 PM on March 17, 2013 [5 favorites]


I worry a bit about the "in the good old days" loyalty to print statements and their steroidal cousins, log frameworks.

I recently browsed to this index of optimizations in Java's HotSpot JVM. I have a computer science degree and have been professionally programming Java for about ten years, and I'd never heard of most of those things. For many of them, I can only make a rough guess about what they mean. Several of them are outright gibberish. And that's just runtime optimizations.

So what worries me is debugging code like this: (intentionally stupid for realism)
  public final int calcAge(Date[] prevBirthdays) {
    int ageSoFar = 0;
    for (Date curBday : prevBirthdays) {
      log.debug("Iterating over birthday: {}", curBday);
      ageSoFar++;
    }
    return ageSoFar;
  }
With varying levels of smart optimization, and depending on the logging framework and configuration, we can get some interesting effects here. If the optimizer can prove that "log.debug(...)" is a noop, it could easily replace that method body with:
  public final int calcAge(Date[] prevBirthdays) {
    return prevBirthdays.length;
  }
...which in turn offers lots of cascading opportunities for inlining or other optimizations. If you're trying to debug a race condition, it can make all the difference in the world between your dev environment and your production environment, not to mention being something that can be invisibly affected by library or JVM upgrades.

And that's just the debugging implications... if the logging can't be optimized away (as, for example, if the logging statement used log4j-style string concatenation rather than slf4j-style argument templating), you'll end up with a needlessly slow program even when debug-level logging is disabled. Your program to calculate the average U.S. age of retirement becomes at least 67 times slower, probably far worse.
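
For concreteness, the two styles side by side, assuming an slf4j Logger (names are illustrative):

  import org.slf4j.Logger;
  import org.slf4j.LoggerFactory;

  class LogCost {
      private static final Logger log = LoggerFactory.getLogger(LogCost.class);

      static void demo(Object person, int age) {
          // Concatenation: the argument String is built even when DEBUG is
          // disabled, so the call can never be optimized down to a no-op.
          log.debug("retirement age for " + person + " is " + age);
          // Templating: formatting is deferred until the level check passes,
          // so a disabled logger costs (nearly) nothing.
          log.debug("retirement age for {} is {}", person, age);
      }
  }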

Of course, running it through a debugger may have the same problem, since it'd be forced to insert the JVM "breakpoint" instructions and make the method unoptimizable. But since developers tend to run the program at least once without the debugger before committing it (or at least I hope they do), they'd have a greater chance of noticing than if it were based solely on "debug=true" versus "debug=false" in the different environments' configurations.

This undermines Linus's point (linked above) about wanting to make it a challenge so that people are more careful. This is something that seems safe to 95% of the good programmers. Ironically, a developer who didn't even know how to do logging or printing would be more likely to be able to reproduce the problem. Care and thoughtfulness are almost inversely correlated to one's ability to identify problems like this.

It also undermines the First Rule of Dynamic Optimization, which is "you do not design code for dynamic optimization" (since clever things are often hard to optimize). That's usually true, but when you have dependencies on things that can't be dynamically optimized – like log file I/O – you end up inconsistently burdening the machine with our feeble human limits.

This is not a concern I would share with most developers, because it's interesting, and interesting problems make programmers do stupid things. I'm pretty sure that writing code to avoid the above problems would make for the first case of premature premature optimization. Maybe we need a notation P for prematurely-determined complexity... this being an example of P(O(n)²).

I'd welcome some commentary from smarter and more experienced folks than myself.
posted by Riki tiki at 4:28 PM on March 17, 2013 [2 favorites]


I love stuff like this. I once went for lunch with a guy on our sales team. The parking meters were digital and the only option was 1 hour for 25 cents. We dropped in a quarter and the display, which had shown ":00", continued to show ":00". The salesguy said "It's broken, it didn't give us our time.", to which I replied "Or, it gave us our time but the hours digit is broken." He stared hard at me for a few seconds and then said "Damn, you really are an engineer, aren't you?". We then went to lunch without waiting the 48 seconds required to disambiguate.
posted by benito.strauss at 4:39 PM on March 17, 2013 [10 favorites]


Speak of the devil, I was reading this thread when a window popped up, saying "Java Update Available!" LOL.
posted by charlie don't surf at 4:41 PM on March 17, 2013


I love my debugger too. Unfortunately "the cloud" makes them slightly less useful, and relying on one may give you a false sense of security. Most of the time when a bug is reported, I'll ask someone to debug through it locally; invariably they can't reproduce it and declare that the bug doesn't exist.

That is when I have to start digging through the logs. A while ago I wrote a drop-in replacement to move logging from .log files to AMQP. Any of the dozens of running instances of the code sends its log lines to a fanout exchange, and consumers can do whatever they like with them. We have one that writes them to .log files, but also several that do more specific tasks. I almost always find that the reported bug actually does exist after looking at the logs.
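A toy sketch of the idea (not the actual code - the names here are invented, but the exchange declaration and publish calls are the standard RabbitMQ Java client ones):

  import java.util.logging.Handler;
  import java.util.logging.LogRecord;
  import com.rabbitmq.client.Channel;
  import com.rabbitmq.client.Connection;
  import com.rabbitmq.client.ConnectionFactory;

  public class AmqpLogHandler extends Handler {
    private final Connection conn;
    private final Channel channel;
    private final String exchange;

    public AmqpLogHandler(String host, String exchange) throws Exception {
      ConnectionFactory factory = new ConnectionFactory();
      factory.setHost(host);
      this.conn = factory.newConnection();
      this.channel = conn.createChannel();
      this.exchange = exchange;
      // Fanout: every bound consumer (the .log writer, alerting,
      // ad hoc debugging) gets its own copy of each line.
      channel.exchangeDeclare(exchange, "fanout");
    }

    @Override public void publish(LogRecord record) {
      try {
        String line = record.getLevel() + " " + record.getMessage();
        channel.basicPublish(exchange, "", null, line.getBytes("UTF-8"));
      } catch (Exception e) {
        reportError(null, e, 0); // logging must never take the app down
      }
    }

    @Override public void flush() { }

    @Override public void close() {
      try { channel.close(); conn.close(); } catch (Exception ignored) { }
    }
  }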

The hard part is getting other developers to add sufficient logging. Unfortunately, I can't just say "no logging in that code, we're SOL"; I usually have to add logging and deploy a hotfix, then yell at whatever parties were responsible for the code and the code review.
posted by Ad hominem at 5:06 PM on March 17, 2013 [1 favorite]


It is also useful from a CYA standpoint to log the beginning and end of methods or other important events. I don't know how many times I've gotten a ticket for "slowness" and been able to say "I'm looking at the logs and your X was processed in 200 milliseconds." It is important to give the time in milliseconds, and it is also important to give them an out. Let them say "maybe it is the network", then you can agree with them. If you say "it isn't slow" you will be on the phone for hours while they try to prove you wrong.
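A minimal sketch of that begin/end pattern (names invented, using plain java.util.logging):

  import java.util.logging.Logger;

  public class OrderHandler {
    private static final Logger log = Logger.getLogger(OrderHandler.class.getName());

    public void process(String requestId) {
      long start = System.nanoTime();
      log.info("process(" + requestId + ") begin");
      try {
        // ... the actual work ...
      } finally {
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        // Log the end even when an exception escapes, with the time
        // in milliseconds so the number itself makes your case.
        log.info("process(" + requestId + ") end after " + elapsedMs + " ms");
      }
    }
  }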
posted by Ad hominem at 5:42 PM on March 17, 2013


- "It's not the compiler". In other words, if you're tempted to believe it's something that the compiler is doing wrong, then you are wrong.
Yeah. You'd think so. With mature compilers, there are definitely fewer bugs, but I still hit them. One of my favorite errors that I found recently was this:
try { (*fncPtr)(someArg); } catch (SomeException &e) { }
The problem was that the function behind fncPtr did in fact throw that exception, and it was caught in debug builds but not in release builds.

Nice.

I've been coding since 1980 and while I rarely use debug output (I do - it has its place), I'm the one who will debug just by reading. I'll look over junior engineers' shoulders and just read and start cutting the code apart using divide and conquer to figure out what's going on. This stems from working on a system where the build time for a debug run was close to 5 minutes. From a debugging point of view, the best you can get is 12 runs per hour, so you want to maximize the information you get per run. Reading will do that.

I believe that in the afterlife, software engineers will have their own particular Valhalla wherein they regale each other with tales of programming prowess and victories won on limited hardware.

Many years ago, while working on Acrobat 1.0, we had a bug that didn't reproduce easily. We had a QA person working full time just on trying to get this bug to reproduce consistently. We had a working theory that the bug was caused by an uninitialized local variable (which was taking its value from whatever had been left on the stack by previous call chains). Since none of the compilers of this vintage would flag even a warning for using a variable before it was initialized, it was daunting to track it down.

So what I did was look at the compiler's library for code profiling. "Profiling?" you ask, "why on earth would you look at profiling?" Simple - the profiler injects a function call at the start and end of every function, so I wrote a custom profiler that, when called, grabbed the return address, looked back in the code to find the instructions that set up the current stack frame, and initialized all the local-variable memory to values that would cause an immediate halt if used as pointers and would be wildly out of range if used as indexes. This code located the bug in question and three others that were waiting to happen.
posted by plinth at 5:56 PM on March 17, 2013


This stems from working on a system where the build time for a debug run was close to 5 minutes.

Yep. Three years at a company with 15-minute build times is a great education. Writing code that will be built and deployed on 10 different UNIXy OSes (remember DEC Alpha?) purifies the mind too.
posted by benito.strauss at 6:00 PM on March 17, 2013 [1 favorite]


Rubber ducking
posted by jcruelty at 7:46 PM on March 17, 2013 [2 favorites]


I'm a strong believer in knowing how to use debuggers and print/log statements. A lot of code I've written is event-based, which can be nasty for debuggers. "Stop when you get a mouse event...No! Not the mouse events just from moving the mouse toward the button I care about!" (A simplified example but hopefully you get the point.) So you log stuff and look back to see what happened.

OTOH, debuggers can be great, especially if you don't even know what you would want to log.

I've also found Jython incredibly useful as a debugger for Java programs.

Since optimization was mentioned, I'll get out my soapbox (or one of them, at any rate). The main thing you need to know about optimization is:
"More computing sins are committed in the name of efficiency (without necessarily achieving it) than for any other single reason — including blind stupidity." — W.A. Wulf
Operationally, for me this means on every team I'm on I try to convince the other developers to follow these rules:

1. Don't optimize.
2. [Experts only] Don't optimize yet.

Unless the code is really small: if you think you know why it is (or will be) too slow, and you do not have profiler output or some other evidence why that's the case, I don't believe you. Nothing against you personally, it's just that you are a person, and people are really bad at guessing why programs are slow. ("you" == impersonal you. Not aimed at anyone in this thread. (Unless you happen to work with me. Then I do mean you.))

Anecdote: I helped out with optimizing a GUI widget toolkit a long time ago. Someone had run a profiler on it and found it was spending 40% of its time in a single function: comparing strings (strcmp). It used properties for everything, and they had string names. So it did lots of compares just to find out, for example, what color the background of a button should be. Once we changed it to intern the strings, it was much snappier.
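The same trick in Java terms (a toy demo, not the toolkit's code; String.intern() is the standard mechanism):

  public class InternDemo {
    public static void main(String[] args) {
      // Two property names parsed from different sources: equal
      // contents, distinct objects.
      String a = new String("background-color");
      String b = new String("background-color");

      // equals() has to walk both strings character by character
      // (the strcmp analogue); == is false because they're
      // different objects.
      System.out.println(a.equals(b)); // true, but O(length)
      System.out.println(a == b);      // false

      // After interning, equal strings are the *same* object, so a
      // lookup can compare one pointer instead of every character.
      System.out.println(a.intern() == b.intern()); // true, O(1)
    }
  }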
posted by at home in my head at 9:21 PM on March 17, 2013


My favourite Magic Debug goes back to the days of BBSs and IBM PC keyboards. I was talking to the Sysop (voice phone thingy) while she was typing away, and she suddenly exclaimed that all her text had gone upper-case. Numbers were fine, just alpha. Shift Lock, the shift keys, etc. were tested and poked, but nothing fixed it, except rebooting, which she didn't want to do because the BBS was online.

I told her to hit the left and right shift keys simultaneously - it took a couple of tries, but the capslock was ultimately unlocked. She was astounded, needless to say.

I made the suggestion because a similar thing had happened to me and I had fixed it by just bashing at the keyboard.

Afterwards I did some research and found that the ROM routines for the old IBM keyboard had a timing problem, such that if the keys were pressed too fast, the shift lock would activate without toggling the shift switch status. The addresses for the left and right shift keys were different (even though they performed exactly the same function), and by hitting them simultaneously, the effect would be reversed.
posted by arzakh at 9:24 PM on March 17, 2013 [1 favorite]


I was waiting for rubber ducking to show up. I work from home these days and I quite like it, but the one thing I miss is the ability to run a problem by a coworker, because this actually works.

At a job ages ago I was stuck on a design problem for two days on a critical piece of a product. Our "CTO" was the stereotypical buzzword-spouting, Fast Company-reading, positive-energy-exuding annoyance, a nice enough guy but sort of the wrong person to go to with technical problems. He sits down by me and asks if he can help. "No, I don't think so, but thanks," I say. "No really, try to explain it to me like I have no idea what I'm talking about." Choking off the obvious retort, I start to formulate an explanation. Before I even get a word out, I see it. He gives me a goofy grin and wanders off.

The next day the CEO drops by my desk. "So I hear X solved your design issue." It was then that I nuked the planet.
posted by vanar sena at 9:26 PM on March 17, 2013 [12 favorites]


One more optimization comment. If you think you understand C, ponder this code:
send(to, from, count)
register short *to, *from;
register count;
{
        register n = (count + 7) / 8;
        switch(count % 8) {
        case 0: do {    *to = *from++;
        case 7:         *to = *from++;
        case 6:         *to = *from++;
        case 5:         *to = *from++;
        case 4:         *to = *from++;
        case 3:         *to = *from++;
        case 2:         *to = *from++;
        case 1:         *to = *from++;
                } while(--n > 0);
        }
}
If, like me, you followed a thought path like "Huh? Does this even compile? But what does that part mean? What does it do? Huh?", read the explanation of Duff's Device.
posted by at home in my head at 9:30 PM on March 17, 2013 [2 favorites]


Back in my university days, I had a program that would run just fine if the first line was a print statement (why yes, I did use a lot of print statements to ensure my program was working and then comment them out later, why do you ask?) but would segfault if the first line was not a print statement. It wasn't printing a variable, just a hardcoded string, and eventually, it printed nothing at all. But I had to leave it in the final code that I submitted, because without it, the whole program crapped out.

I never figured that one out, and neither the TA for the class nor the professor could figure it out, either.

I've always wondered...
posted by jacquilynne at 7:10 AM on March 18, 2013


However, in those rare cases where it is a compiler bug, it's utterly maddening

If you think it's not you, you need to *prove* it's not you. Get rid of everything not related and try again.

If you think it's a network problem, get out of your code, grab a packet analyzer and look. If the network is fine, then you move up. If it's not, no amount of work *in* your code can fix it.

If you think it's a compiler bug, isolate it and post it. Warning -- you'll probably just be told it's you, and have that proved. But then you know the answer.

Troubleshooting as a class involves isolation -- if the problem could be your computer, your network, or your server, isolate that down.

But, the entire process can be boiled down to "I will prove what happens when X is like it is now."

Finally, you don't have the answer until you fix it. "It just went away" means it may well come back. And you can't be sure you've fixed it until you can break it again in exactly the same way. When you can say "I call this with X, and get Y, and call it with Y, and still get Y, when I should get Z", and then say "And it does it on every computer running that version of GCC" and "But not on computers running these versions of GCC" --then, and only then, can you be sure there's a compiler bug.

The question remains, though -- which one is the buggy version? ;-)
posted by eriko at 7:12 AM on March 18, 2013


I confess, I've made a career out of floating in the bathtub.
posted by infini at 7:30 AM on March 18, 2013


at home in my head: "If, like me, you followed a thought path like "Huh? Does this even compile? But what does that part mean? What does it do? Huh?", read the explanation of Duff's Device."

There's a better explanation of Wikipedia's example of Duff's Device over at Stack Overflow.

I knew that C's fallthrough mechanism was liberal, but that's just a stunningly clever bit of programming.

(The part where he jumps into the middle of a loop is also a stroke of evil genius that took me forever to figure out, because the "do{" statement never actually gets processed)

Thanks for that bit of interesting reading to start my week!
posted by schmod at 7:48 AM on March 18, 2013


Just about ready to graduate with my BS in Computer Science, I had a job interview. The interviewer thought he was quite clever by weeding out us n00bs by asking how we usually debugged our code. Of course, all of the programming I had done in school was in C++ on UNIX and Java on Sun Solaris, using pico and emacs solely as our code editors. The moment I said I used cout statements, I was cut from his list of potential hires. "I'm looking for people who use debugging software with code breaks."

I'm actually glad he didn't hire me because it would have sucked to work for someone with such arbitrary definitions of a "good" programmer versus "bad" programmer. I think using descriptive variable and function names and adding comments are five million times more important than how I figure out where I forgot a semicolon.
posted by jillithd at 8:02 AM on March 18, 2013 [1 favorite]


As someone who has been learning to program off and on for a while, I've noticed that much of the code contained in the books I've been using is just flat-out wrong, and I get a certain perverse pleasure out of trying to figure out how to fix it.

I've often wondered whether the author intentionally put the mistakes in there, because otherwise it means that the guy I'm learning from wasn't able to generate working code for a Hello World program for the book he's writing...
posted by Huck500 at 10:35 AM on March 18, 2013 [1 favorite]


Ah man, Duff's Device. That's yet another case with C where it's actually less confusing to think about what's going on in terms of the object code your code would produce, or at least it is for me, because I read the case statements as basically "stick a label here that you can jump to," and then the mechanics of the fall-through are pretty apparent. The other case I'm thinking of right now where thinking about the object code makes things clearer is synchronizing shared variables in a threaded context: it's so easy as a novice programmer to think that an operation like shared++; is atomic, but when you realize that it's going to translate into three instructions, and that you very well might get a context switch before the one thread gets to store the updated value, it all becomes so much clearer than if the C code is as granular as your understanding gets.
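The classic demonstration of that, here in Java since it's easy to make self-contained (the counts and names are arbitrary): two threads each increment a shared field a million times, and because the increment is a separate load, add, and store, updates get lost in between:

  public class LostUpdate {
    static int shared = 0; // no volatile, no synchronization

    public static void main(String[] args) throws InterruptedException {
      Runnable bump = new Runnable() {
        public void run() {
          for (int i = 0; i < 1_000_000; i++) {
            shared++; // load, add, store: three steps, not one
          }
        }
      };
      Thread t1 = new Thread(bump);
      Thread t2 = new Thread(bump);
      t1.start(); t2.start();
      t1.join(); t2.join();
      // Almost always prints something well short of 2000000,
      // because increments interleave and overwrite each other.
      System.out.println(shared);
    }
  }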
posted by invitapriore at 10:38 AM on March 18, 2013


schmod: "I knew that C's fallthrough mechanism was liberal, but that's just a stunningly clever bit of programming. "

Yeah, but what makes it truly good is that it's easily understandable if you disregard any questions about whether or not it will compile. Read it as written and it makes perfect sense. There are a lot of optimizations that result in really impenetrable code.
posted by wierdo at 12:44 PM on March 18, 2013


Heh. Raise your hand if you needed to use Duff's device in your own code in order to meet critical timing.

In my case it was for a low-end laser printer that had two serial ports and a parallel port. While it was printing, if the com ports were getting hit heavily, the CPU was spending enough time handling the interrupts that the FIFO used to drive the laser starved, leaving blank patches on the page. The performance was right on the hairy edge, such that making the FIFO-fill code use Duff's device made the problem vanish. This is the printer in question. I recognize the start page on that - I lovingly hand-made it in PostScript for DEC. Although you can't see it in the picture there, there are thumbnails of "typical" print documents. In my first draft, they were a page with the opening text from Winnie Ille Pooh, a chart, and a picture of my (now ex-) wife. At the 11th hour, DEC required that they be removed and replaced with some text from a DEC service manual and a different, non-gendered picture. Mind you, the text on those mini pages was rendered in something like 4 point and the picture of my wife was smaller than a postage stamp. No, they said, someone might construe it as sexist. So I removed it. They didn't tell me that I couldn't put it back in full page as an Easter egg. Take the printer off line, hold down the test button and, while keeping it down, press Menu-Menu-Menu-Enter-Enter-Enter.

Interestingly enough, as of 2003 or so the MS C compiler targeting x86 will remove Duff's device when it finds it, since it is not optimal with a decent cache.
posted by plinth at 1:08 PM on March 18, 2013 [5 favorites]


Interestingly enough, as of 2003 or so the MS C compiler targeting x86 will remove Duff's device when it finds it, since it is not optimal with a decent cache.

I believe you, but why is that the case? From where I'm sitting that routine has excellent locality (to never moves and from walks memory sequentially), so even with a tiny cache you're still only going to get a cache miss once per cache line, i.e. roughly every {cache line size / sizeof(short)} copy iterations, right? Or am I missing something?
posted by invitapriore at 2:08 PM on March 18, 2013


deadmessenger: "After over 20 years in IT, I've come to the conclusion that people who can quickly arrive at non-obvious solutions to strange problems like this are rare, but that it is a skill that can be taught to those who weren't born with it. I've actually got a book on the subject outlined - one of these days I'll actually get around to shopping the proposal and writing it."

Drawing on personal experience, I'd say it helps if you have ADHD.
posted by krinklyfig at 11:46 PM on March 18, 2013 [1 favorite]


jillithd: "I'm actually glad he didn't hire me because it would have sucked to work for someone with such arbitrary definitions of a "good" programmer versus "bad" programmer. I think using descriptive variable and function names and adding comments are five million times more important than how I figure out where I forgot a semicolon."

Have you actually found a job? My experience has been that there's always at least one CS uber-geek who shoots down any candidate that doesn't meet his purity standards.

A while ago, I interviewed for a JavaScript developer position, and the guy wouldn't stop grilling me about tail recursion (even though tail-call optimization isn't implemented in most JS interpreters).
posted by schmod at 6:51 AM on March 19, 2013


wierdo: "Yeah, but what makes it truly good is that it's easily understandable if you disregard any questions about whether or not it will compile. Read it as written and it makes perfect sense."

The jump to the middle of the loop is what throws me. As I read the code, I never see the "do{" statement get executed, because it's within a case statement that evaluates to false.

I'm not a C guy, but I still don't understand how or why this is valid.
posted by schmod at 6:55 AM on March 19, 2013


Have you actually found a job?

Oh yeah. That was *cough*10*cough* years ago. I have a job writing C code on Linux using VIM as our main code editor. Still no debugging software! LOL!
posted by jillithd at 7:00 AM on March 19, 2013


Heh. Yeah, there are whole subsets of programming I brush up on only for JavaScript interviews, because they never come up outside of them. Good JS questions would probably be about scope, functions (particularly as regards namespacing, modules and callbacks) and prototypes, with the acknowledgement that the syntax for the latter is wacky and often gets buried under wrappers. If I get questions on that sort of thing I feel like I am talking to someone who knows the language and not just some CS jerk.
posted by Artw at 7:03 AM on March 19, 2013


One of the easiest ways to get a software programming bug solved is to begin typing up a clear explanation of the situation to a mailing list or online forum. Wait 30 minutes to submit your question and re-read the thing about three times. Magically the dumb part becomes clear.

I can't begin to say how many times this has happened to me. Just articulating the problem does wonders, and I would say about half the Stack Overflow questions posted could be eliminated with this discipline.
posted by dgran at 11:17 AM on March 19, 2013 [2 favorites]


case 0 is jumped to whenever count is a multiple of 8, but it doesn't really matter. We can imitate how this code gets compiled pretty well in C using goto statements (and assuming GCC's "labels as values" feature):
send(to, from, count)
register short *to, *from;
register count;
{
     /*
      * GCC lets you take the address of code locations
      * using the `&&` operator, which the goto statement
      * can then accept as an argument
      */
     static void *jump_table[8] = {
          &&loc_0, &&loc_1, &&loc_2,
          &&loc_3, &&loc_4, &&loc_5,
          &&loc_6, &&loc_7
     };
     register n = (count + 7) / 8;
     /* entry k of the table plays the role of `case k:` above */
     goto *jump_table[count % 8];
loc_0:
     *to = *from++;
loc_7:
     *to = *from++;
loc_6:
     *to = *from++;
loc_5:
     *to = *from++;
loc_4:
     *to = *from++;
loc_3:
     *to = *from++;
loc_2:
     *to = *from++;
loc_1:
     *to = *from++;
     if (--n > 0) {
          goto loc_0;
     }
}
That was pretty off the cuff, and I don't think it will compile owing to the pre-standard function declaration being mixed with later features, but conceptually I think it replicates the functionality of Duff's Device exactly, and I'm fairly certain the object code would be structured almost exactly that way. Is that clearer?
posted by invitapriore at 12:15 PM on March 19, 2013 [1 favorite]


One caveat on the above: the labels have to be laid out so that entry k of the jump table lands on the same copy that case k reaches in the original, and I can only vouch for the object code looking like that with low or no compiler optimization.
posted by invitapriore at 12:31 PM on March 19, 2013


infini: "I confess, I've made a career out of floating in the bathtub."

Idea: A rubber duck containing a voice chip. When the duck is squeezed, it provides such timeless advice as "Think Laterally", "Consider Variations", "Search Outside the Box" and "What Would Steve Jobs Do?" Ideally it should sound not unlike Mel Blanc.
posted by vanar sena at 2:21 AM on March 24, 2013 [1 favorite]

