It's sorta wrong.
July 9, 2017 8:47 PM   Subscribe

Extra, Extra - Read All About It: Nearly All Binary Searches and Mergesorts are Broken. "I remember vividly Jon Bentley's first Algorithms lecture at CMU, where he asked all of us incoming Ph.D. students to write a binary search, and then dissected one of our implementations in front of the class. Of course it was broken, as were most of our implementations. This made a real impression on me, as did the treatment of this material in his wonderful [book] Programming Pearls. Fast forward to 2006. I was shocked to learn that the binary search program that Bentley proved correct and subsequently tested in Chapter 5 of Programming Pearls contains a bug. Once I tell you what it is, you will understand why it escaped detection for two decades..."

"In Programming Pearls, Bentley says "While the first binary search was published in 1946, the first binary search that works correctly for all values of n did not appear until 1962." The truth is, very few correct versions have ever been published, at least in mainstream programming languages."
posted by storybored (29 comments total) 31 users marked this as a favorite
 
Had a similar bug with my implementation of GJK. Really damn hard to track down.
posted by The Power Nap at 8:58 PM on July 9, 2017


This bug only happens when your table size is over 1/2 the size of your computer's entire memory range. It's not really a logic bug like the ones John Bentley was writing about. It's been practically fixed by everyone moving to 64-bit operating systems. It doesn't surprise me that Google ran into it in the early '00s, however, because they're always pushing the boundaries of big data.
posted by Harvey Kilobit at 9:10 PM on July 9, 2017 [8 favorites]


I'm pretty sure another way to fix it would be to divide the high and low by 2 before summing them (that is, distribute the multiplication). This will definitely keep you under MAX_INT on whatever architecture you're using.

I had an issue like this show up on an Arduino project. I don't remember if the max value was 255 or 65k, but I had to distribute the division in so I could make the sum correctly without overflow.
posted by Xoder at 9:18 PM on July 9, 2017 [3 favorites]


Science needs to be reminded occasionally that it does NOT have anything "perfectly right" (and to claim so turns it into Religion). Science is all about finding What's Wrong and making it Less Wrong.

I think of it as Science being where Zeno's Paradox applies... you keep getting closer and closer but never get exactly there. Still, anyone who claims that having something proven wrong means that a previous belief that was already proven wrong is actually right is an idiot or a liar or both.
posted by oneswellfoop at 9:19 PM on July 9, 2017 [3 favorites]


IMO, this is fairly easy to detect with static analysis. The problem is that most proofs treat ints as unbounded integers when they are definitely not. So so when you claim that mid = (low + high) / 2 leads to a median value, you're not properly modelling the int datatype. Which would be ugly and inelegant, therefore academics are loathe do to so for their own algorithms, no matter how much public utility would be gained.

If you want to have some fun, even a stupid dumb regex can find bugs: try grabbing the coreutils git repo and running git grep "/ 2" sort.c.
posted by pwnguin at 9:33 PM on July 9, 2017 [7 favorites]


oneswellfoop, I'm not sure I agree that your sentiment is applicable to this situation. The bug described here lies at the intersection of "defined behavior" in C, the details of how CPUs execute certain instructions, and a mostly-good-but-not-perfect mapping between what a programmer intends to describe with their program and the actual program that is specified. There are ways to formally specify a program so you can be mathematically certain that it operates in a defined way, but it's just not often done, so we build crutches to make such mapping less uncertain. The Java exception is a great example of just such a crutch: when the details of the CPU prevent the program from operating as the programmer intended, notify the programmer and stop computation. There's no "science" being done, as such, just practical mechanics.
posted by TheNewWazoo at 9:36 PM on July 9, 2017 [23 favorites]


I'm pretty sure another way to fix it would be to divide the high and low by 2 before summing them (that is, distribute the multiplication).

Nope. Gives the wrong answer if both low and high are odd.
posted by It's Never Lurgi at 9:37 PM on July 9, 2017 [7 favorites]


Gives the wrong answer if both low and high are odd

But the index of mid being off by one or two would hardly matter, in an array large enough to be vulnerable to this issue.
posted by thelonius at 10:39 PM on July 9, 2017


But the index of mid being off by one or two would hardly matter, in an array large enough to be vulnerable to this issue.

Wouldn't you potentially end up with an infinite loop if your indices were 1 and 3 though?
posted by NMcCoy at 11:31 PM on July 9, 2017 [3 favorites]


I thought this was a classic gotcha question in programming interviews for the last decade? Presumably in other contexts I guess. But I know I've heard of it despite being only marginally connected to stuff where these concerns matter much.
posted by mark k at 11:35 PM on July 9, 2017


I thought this was a classic gotcha question in programming interviews for the last decade?

Article is 11 years old ... so, yeah, this "bug" has been making younger programmers feel smug towards graybeards for a while.

The reason it's tough to spot, even for experienced C programmers, is that you read "(low+high)/2" and mentally substitute the concept "the median of low and high", without working through what actually happens in the computation. It's a mathematical thought-terminating cliché.

The simple solution is to find the distance between low and high, and the median is located at half that distance from either endpoint.

median = (high-low)/2   + low
posted by Harvey Kilobit at 11:56 PM on July 9, 2017 [7 favorites]


Science needs to be reminded occasionally that it does NOT have anything "perfectly right" (and to claim so turns it into Religion). Science is all about finding What's Wrong and making it Less Wrong.


This is programming, not science. Science is based on observation/ experiment, this is something someone created.
posted by fshgrl at 12:15 AM on July 10, 2017 [8 favorites]


Code with integer division has bugs, news at 1011.
posted by lawrencium at 12:45 AM on July 10, 2017 [18 favorites]


I went to a presentation by Joshua Bloch at JavaOne once. His book "Java Puzzlers" had just come out, and the presentation was examples from it. The examples were all, what is wrong with this source code? Why is it wrong? And, every one of them, I wanted to say, there's not a got-damn thing wrong with that code. But there always was! It did indeed increase my humility, or my fear of software.
posted by thelonius at 3:15 AM on July 10, 2017 [5 favorites]


Close enough for government work...

(insert link of the jet when crossing the international dateline, flipped upside down)
posted by sammyo at 5:38 AM on July 10, 2017


Oh, yes, watching the audience reaction to Bloch (filled room 2-300 experienced programmers) was just hilarious. The first couple you could feel the arrogance, "that's just obvious" just audibly deflate when he showed the glitch.
posted by sammyo at 5:43 AM on July 10, 2017 [2 favorites]


Presumably this would not happen in javascript as numeric values are broken all to hell anyway.
posted by Artw at 6:41 AM on July 10, 2017 [6 favorites]


Oh hey Josh Block. I sort of know him and his brother, we all worked at the same company at different times and then they both worked with my wife at a different company so our paths have crossed multiple times.
posted by octothorpe at 7:21 AM on July 10, 2017 [1 favorite]


Presumably this would not happen in javascript as numeric values are broken all to hell anyway.

Javascript has no int type, which seems insane to me....but it converts floating point numbers to an internal 32-bit integer on certain occasions, like if you bit-shift a number, or so I have read.
posted by thelonius at 7:50 AM on July 10, 2017


Wouldn't you potentially end up with an infinite loop if your indices were 1 and 3 though?

I don't think so, but of course I could be wrong. 1/2 = 0, 3/2 = 1, so it would use 1 as the mid...then, how would the code in the blog post get low=1 and high = 3 again from that?
posted by thelonius at 7:59 AM on July 10, 2017


IMO, this is fairly easy to detect with static analysis.
Overflow of integers known at compile-time is easy to detect, but presumably your program's only multi-billion-entry sorted arrays are going to be built up with a size which varies from run to run.

Does your analyzer scream at the user about any code where two integers are added and it can't prove that the sum won't overflow for any possible combination of inputs? Nobody would ever bother to read through all the output.

I did balk as soon as I saw "wait, he's assigning a size() to int??", and if (like me, usually) you're compiling C++ on a 64-bit system then size_t downconversion is indeed easy for compilers and analyzers to detect, merely fixing that is practically sufficient (2^63 entries "ought to be enough for anybody"!). But if you're writing C++ on a 32-bit system or if you're writing Java (Where array sizes and indices *are* 32 bit signed ints? That's going to be stuck that way forever??) how do you catch this automatically?
posted by roystgnr at 8:04 AM on July 10, 2017 [1 favorite]


Xoder: I'm pretty sure another way to fix it would be to divide the high and low by 2 before summing them
It's Never Lurgi: Gives the wrong answer if both low and high are odd.
thelonius: But the index of mid being off by one or two would hardly matter, in an array large enough to be vulnerable to this issue.
NMcCoy: Wouldn't you potentially end up with an infinite loop if your indices were 1 and 3 though?

Not immediately at this step, but you potentially enter an infinite loop once high and low are both odd and equal.

The next step after low=1, high=3 (and mid=1) isn't a problem, because one of three things happens:
If key=a[1], it returns 1 and exits
If key>a[1], it sets low to 2
If key<a[1], it sets high to 0 (and will subsequently exit the while loop and return -2, indicating the key was not found)

Xoder's suggestion fails if you get to a state where low and high are odd and equal. E.g., if you get to low=5, high=5, then Xoder's formula gives mid=4. If we imagine a[5]=key (and the routine should return 5), this formula instead compares key and a[4], finds key>a[4], and sets low to 5 (which it already was), entering an infinite loop.
posted by DevilsAdvocate at 8:12 AM on July 10, 2017 [4 favorites]


"This bug only happens when your table size is over 1/2 the size of your computer's entire memory range. It's not really a logic bug like the ones John Bentley was writing about. It's been practically fixed by everyone moving to 64-bit operating systems."

Isn't a java int still 32 bits on a 64-bit system?
posted by floppyroofing at 8:40 AM on July 10, 2017


Does your analyzer scream at the user about any code where two integers are added and it can't prove that the sum won't overflow for any possible combination of inputs? Nobody would ever bother to read through all the output.

We already ignore most static analysis output, what's a few more lines to ignore in the greater context? It seems possible to mechanically rewrite the code without integer overflow, so as long as we're cool with replacing programmers with robots, it could work. Unless there's some bullshit usecase where integer overflow is desired?

But a more interesting possibility is whether javac / clang / golang already fix these silently. Or if the optimizers will rewrite your fixed expr back into broken bytecode. Sadly I don't have the time to cook up examples at the moment and see what comes out.
posted by pwnguin at 10:33 AM on July 10, 2017


Javascript has no int type, which seems insane to me....but it converts floating point numbers to an internal 32-bit integer on certain occasions, like if you bit-shift a number, or so I have read.

I believe it converts it to a 32-bit int, dropping MSBs past the 32nd place, then shifts, then puts it back in a double float, yes. Because of course it does.
posted by atoxyl at 11:49 AM on July 10, 2017 [4 favorites]


The worst is when compilers decide integer overflows are undefined behavior, and thus compile out checks for integer overflow because "who knows what would happen".
posted by effugas at 12:49 PM on July 10, 2017 [3 favorites]


fshgrl: This is programming, not science. Science is based on observation/ experiment, this is something someone created.

"Our experiments show that the program gives the correct results (n = 50, p < 0.05)."
posted by clawsoon at 6:49 PM on July 10, 2017 [1 favorite]


"This is programming, not science. Science is based on observation/ experiment, this is something someone created."

It's not that clear. Realistically, the software that runs on your phone on your laptop is much too complicated to analyze completely, so often observation and experiment really is the right approach.
posted by floppyroofing at 7:12 AM on July 12, 2017


Oh, like RegEx?
posted by Artw at 6:21 AM on July 14, 2017


« Older "There are plenty of other fishy names in the...   |   The best teacher is an entertainer Newer »


This thread has been archived and is closed to new comments