I can put whatever I want here. It doesn't have to compile.
February 9, 2010 2:05 PM

A Few Billion Lines of Code Later: Using Static Analysis to Find Bugs in the Real World. A frank account of the technical, social and commercial challenges encountered while turning an academic research tool into a business.
posted by ltl (43 comments total) 28 users marked this as a favorite

 
Sending people to a trial dramatically raises the incremental cost of each sale. However, it gives the non-trivial benefit of letting us educate customers (so they do not label serious, true bugs as false positives) and do real-time, ad hoc workarounds of weird customer system setups.

Heh. Welcome to the world of Enterprise Sales™. AKA Our product frequently breaks and/or outputs indecipherable reports, so we come and read the tea leaves for you.

What's funny to me is the babe-in-the-woods attitude of the whole article, as if these were new problems and this paper weren't describing the day job of thousands of product managers and sales engineers at mid-size software companies around the globe.

In support of the article, it is well-written and comprehensive - the author sure seems like he found just about every single problem out there.
posted by GuyZero at 2:19 PM on February 9, 2010


"Why is it when I run your tool, I have to reinstall my Linux distribution from CD?"

This was indeed a puzzling question. Some poking around exposed the following chain of events: the company's make used a novel format to print out the absolute path of the directory in which the compiler ran; our script misparsed this path, producing the empty string that we gave as the destination to the Unix "cd" (change directory) command, causing it to change to the top level of the system; it ran "rm -rf *" (recursive delete) during compilation to clean up temporary files; and the build process ran as root. Summing these points produces the removal of all files on the system.
Heh.
posted by delmoi at 2:24 PM on February 9, 2010 [8 favorites]


This is like the worst of Daily WTF. Some of these examples make me want to cry.
posted by grouse at 2:44 PM on February 9, 2010


500 Internal Server Error

I think we found a bug.
posted by Blazecock Pileon at 2:47 PM on February 9, 2010


Fascinating reading. Great first post, ltl.

And yeah, as rusty as my programming skills are, some of those examples of code errors that their tool finds made me want to cry as well. So did many of the examples of political and cultural errors that their sales force found, and had to learn how to work around.
posted by FishBike at 2:55 PM on February 9, 2010


I thought it was fascinating. Thanks for the post!
posted by Kwine at 2:55 PM on February 9, 2010


5:55 EST MeFites agree: Post is fascinating.
posted by Kwine at 2:56 PM on February 9, 2010 [1 favorite]


Some of these examples make me want to cry.
int i[4];
...
i[4] = 0;
From the customer: "ANSI lets you write 1 beyond the end of the buffer. We'll have to agree to disagree."

Cry from laughing so hard I guess.
posted by GuyZero at 3:01 PM on February 9, 2010


You cannot often argue with people who are sufficiently confused about technical matters; they think you are the one who doesn't get it. They also tend to get emotional. Arguing reliably kills sales.

Heh.
posted by delmoi at 3:09 PM on February 9, 2010 [5 favorites]


I totally enjoyed this article (and, oddly, it didn't cross my mind that Metafilter might also). There was a good quote about how it could be difficult when you get someone who doesn't understand the issue and is too defensive to admit it.
posted by These Premises Are Alarmed at 3:11 PM on February 9, 2010


Oh, yeah, exactly what delmoi just quoted.
posted by These Premises Are Alarmed at 3:12 PM on February 9, 2010


As someone developing tools for dynamic multithreaded bug finding at a startup that is weeks from a beta launch, I found this article of particular interest. Thanks.
posted by lucasks at 3:12 PM on February 9, 2010


Yes, accidentally having your build process run "rm -rf /" is a bit awkward. As a software toolsmith dealing with versioning, builds and bug trackers by the hour, I'll note that I've seen similar jolly japes in a few workplaces.
posted by mdoar at 3:25 PM on February 9, 2010 [1 favorite]


There are a bunch of gems here:

Checking code deeply requires understanding the code's semantics. The most basic requirement is that you parse it. Parsing is considered a solved problem. Unfortunately, this view is naïve, rooted in the widely believed myth that programming languages exist.
posted by Joakim Ziegler at 4:01 PM on February 9, 2010 [1 favorite]


void fail(void) {
fail();
}
posted by tommasz at 4:02 PM on February 9, 2010 [2 favorites]


Some of these examples make me want to cry.

int i[4];
...
i[4] = 0;

From the customer: "ANSI lets you write 1 beyond the end of the buffer. We'll have to agree to disagree."

Cry from laughing so hard I guess.


Interestingly, even a mistake as dumb as this didn't come from nowhere. It probably came from a misreading of a statement like this:

To facilitate array access by incrementing pointers, the Standard guarantees that in an n element array, although element n does not exist, use of its address is not an error—the valid range of addresses for an array declared as int ar[N] is &ar[0] through to &ar[N]. You must not try to access this last pseudo-element.

From The C Book.
http://publications.gbdirect.co.uk/c_book/chapter5/pointers.html
posted by Bobicus at 4:09 PM on February 9, 2010 [1 favorite]


I would actually cut the guy more slack for an error based on something completely made up. Such a misreading of the spec is crazy egregious, as anyone who takes two seconds to read it will draw the correct conclusion. But maybe he was just a Pascal die-hard.
posted by GuyZero at 4:13 PM on February 9, 2010


Interesting read, but the tone is a bit too whiny ("our product didn't work in the real world because apparently we had naive expectations, and so we kept having to fix it") rather than, say, realistic ("our system of doing demos was a priceless opportunity for us to learn how to make our product actually work for our customers.").

To me it's just astounding that these guys went out to sell their product with no idea what ClearCase is.
posted by fleacircus at 4:18 PM on February 9, 2010 [3 favorites]


I was surprised it was still being used, actually. I haven't seen it in use since, ugh, '95? It's up there with Open Look in my mind.
posted by GuyZero at 4:21 PM on February 9, 2010


Many things from back then are still being built and sold. Not every part of the software world reinvents the wheel every 5-10 years. But there exists a class of programmer who can't understand the world beyond latest tools + language flavor of the month + small projects + maintenance is someone else's problem.
posted by fleacircus at 4:41 PM on February 9, 2010


Re the quote delmoi pointed out:
Arguing reliably kills sales.

or "no points for calling the potential customer stupid"
posted by cmj at 4:59 PM on February 9, 2010


This article has inspired me! I'm going to start a company that sells static analysis tools to find bugs in companies that sell static analysis tools to find bugs in C programs.
posted by qxntpqbbbqxl at 5:00 PM on February 9, 2010


Yeah, I used FindBugs extensively. It's a pain to wade through the false positives, but ultimately worth it.
posted by nightwood at 5:30 PM on February 9, 2010


The major takeaway from this, for me, is that development in C absolutely sucks. God, I remember how bad it was, and I never did anything that complicated either.

I ran into all sorts of problems when it came to integrating off-the-shelf code. I remember once two different libraries wanted to use two different versions of the standard C library or something, so all the function names collided (or something like that). I remember spending days trying to get this library to compile on Linux, and I had to go through tons of files by hand and change the header file references. I can see how people ended up with massive build systems, but build systems in general just seem like a total, unnecessary hack used to paper over the shortcomings of C/C++.

Just recently I was taking some massive Java program with lots of various components, and I wanted to use some of the components as a web process. Building was as simple as
find . | grep '\.java$' | xargs javac or something, and then adding the output directories to the classpath of the server. (Technically I should have packed things up in a WAR or something, but whatever.)
Many things from back then are still being built and sold. Not every part of the software world reinvents the wheel every 5-10 years. But there exists a class of programmer who can't understand the world beyond latest tools + language flavor of the month + small projects + maintenance is someone else's problem.
Well, I was under the impression that these guys started some time ago. I think, at least from reading the article, they probably picked the wrong way to go about actually getting people to use this. It would have made more sense to release it as a patch for GCC or something. Of course, they wouldn't have made much money this way, but the path they went on seems like torture :P. Okay, checking code using existing build systems seems problematic, so, don't do that. Focus on new stuff, which is all anyone actually cared about anyway. They didn't want to go back through existing code and find theoretical problems that weren't actually causing problems.
posted by delmoi at 6:26 PM on February 9, 2010


Now we need a counter-article showing why working in the tech industry is virtuous/worthwhile. I mean, what would a high-school student prospectively interested in the hard sciences make of the tone of this article?
posted by polymodus at 7:15 PM on February 9, 2010


The major takeaway from this, for me, is that development in C absolutely sucks. God, I remember how bad it was, and I never did anything that complicated either.

But it goes REALLY FAST (unless it explodes -- then it explodes really fast).
posted by mikelieman at 7:16 PM on February 9, 2010


Yeah. Garbage collection and multi-threading/distributed processing are solved problems in modern languages. I think the big problem is that there isn't a functional language that allows developers to just =get it=, the syntactic sugar isn't there. Part of this problem is that they've been mangling their brains with C-like languages for decades*, part of it is that functional languages make very, very little sense to casual experimenters and n00bs the way they're currently set up. You've got an issue when the most readable example of the breed is LISP, and only because LISP metaprogrammed itself into being a functional language. The learning curve is more like a brick wall.

(And let me rant on the old saw that "C/C++ offers better performance." Yes, it does, but any modern application is so massively overburdened by poorly written libraries, frameworks, plug-ins, tie-ins and economy-sized cruft that there comes a point where you've stuck together too many legos to make the thing work right. Optimization is a meaningless joke. Start over with something that has what you need mostly baked-in, or that provides a better set of tinkertoys to hook stuff together, even if it does any given algorithm expressible in fewer than eight lines 10% slower.)
posted by Slap*Happy at 7:23 PM on February 9, 2010


Slap*Happy, are you making a joke? You are certainly correct that garbage collection is a solved problem, but asserting that multithreaded and distributed processing are solved seems strange. Haskell's cool, but that doesn't make it a panacea for writing correct, performant, and scalable multithreaded applications. Simple googling will reveal that efficiently extracting the parallelism inherent in a side-effect-free language like Haskell is anything but solved. This is an active area of ongoing research.
posted by lucasks at 7:38 PM on February 9, 2010


Not solved, but a hell of a lot further down the track than most of the imperative languages, which at best need to rely on stuff like Apple's GCD, and at worst descend into an impenetrable morass of lock/unlocks.
posted by Slap*Happy at 8:00 PM on February 9, 2010


I suspect that if someone is talking about distributed processing and functional languages, that person is more likely to be thinking of Erlang than of Haskell.
posted by kenko at 8:00 PM on February 9, 2010 [1 favorite]


Sorry, too many of my friends are Haskell obsessed. It's just where my mind goes when I hear functional language.
posted by lucasks at 8:29 PM on February 9, 2010


Well, I was under the impression that these guys started some time ago. I think, at least from reading the article, they probably picked the wrong way to go about actually getting people to use this. It would have made more sense to release it as a patch for GCC or something. -- delmoi

They helped make their fame by using their tool on the Linux code base.
posted by eye of newt at 9:30 PM on February 9, 2010


Nice comments from several of you on the tone of the article. I took a class from Dawson Engler a few years back, and reading it was like listening to him lecture. He's a bright guy with a lot of good ideas, and boy, does he know it. It was frustrating... I wanted to like him so badly, and he had so much cool stuff to say, but he was one of the more egotistical people I've ever studied under.

That being said, at the time of the class he was working on some really cool stuff and it looks like it panned out. I bet he is using a lot of these anecdotes as examples in his classes now.
posted by crinklebat at 9:41 PM on February 9, 2010


Functional "stuff" is definitely getting put into more modern languages like Scala, Ruby, Erlang, etc. (I'd add Clojure, but that's already functional.)

Haskell is an interesting case, it seems like that's the language people add new features to first, then they filter down to more mainstream languages.
posted by delmoi at 10:50 PM on February 9, 2010


I also disagree that garbage collection is "solved". And so would you if you'd ever seen what happens to a .NET process with 16 gigs of managed memory when a gen2 fires. A couple of seconds of pause every once in a while isn't that big a deal... except on a cache server doing 30000 operations a second. (Yes, having that much managed memory in that context was a bad idea, and we no longer do that. But if garbage collection were "solved", it would not be an issue.)

Parallelism is also FAR from "solved", and developments and libraries in that field are the hot topic nearly everywhere I look in the industry.

"Addition". Now there's a solved problem.

Mostly.
posted by flaterik at 10:59 PM on February 9, 2010 [1 favorite]


And so would you if you'd ever seen what happens to a .NET process with 16 gigs of managed memory when a gen2 fires.

Tuning a Java process to get the right garbage collection algorithm with the right parameters set up in a situation like that can be made to work, but it requires the sacrifice of a virgin and the vigorous waving of several dead chickens over the server.
posted by DreamerFi at 11:54 PM on February 9, 2010 [1 favorite]


I lost rather a lot of sympathy/respect for them at the point where the recursive file deletion is mentioned. I'd expect even a novice to instinctively get nervous (you know, that gut feeling that a bit of code gives you when it 'feels' iffy?) and apply safety checks to the file path.
posted by malevolent at 12:29 AM on February 10, 2010 [2 favorites]


C's still in heavy use for deeply embedded stuff because we're constrained in code size, RAM size and time. Not everyone is writing desktop applications, as the Coverity people found out when they ran into the dialects of C used by compilers targeting embedded platforms (I did chuckle at that).
posted by pw201 at 2:11 AM on February 10, 2010 [1 favorite]


malevolent: "I lost rather a lot of sympathy/respect for them at the point where the recursive file deletion is mentioned. I'd expect even a novice to instinctively get nervous (you know, that gut feeling that a bit of code gives you when it 'feels' iffy?) and apply safety checks to the file path."

If you haven't had a build process go berserk at some point, then you haven't been programming long enough. Being forced to integrate your tool into random, often poorly written build processes makes something like this inevitable somewhere down the line: I very much doubt that it was the tool writers that insisted that the build process run as root, which is madness in and of itself.

delmoi: "Well, I was under the impression that these guys started some time ago. I think, at least from reading the article, they probably picked the wrong way to go about actually getting people to use this. It would have made more sense to release it as a patch for GCC or something. Of course, they wouldn't have made much money this way, but the path they went on seems like torture."

Making money is (almost) always hard work. This should not be a surprise.

There are very few companies making any real money out of code-analysis tools; it's a very hard sell. That this bunch are managing to make real money selling to real companies (not just from government sugar daddy contracts, although they've had a few of those IIRC) and employ a sizable number of people in the process is an achievement all by itself.
posted by pharm at 3:49 AM on February 10, 2010


I'm surprised that so many people found the tone of the article to be whiny or similar. For some reason, it came across to me more like self-deprecating humor, as though we should be laughing along with the academics as they are naively surprised by things that happen in the business world.
posted by FishBike at 5:43 AM on February 10, 2010


Haskell is an interesting case, it seems like that's the language people add new features to first, then they filter down to more mainstream languages.

That's pretty much its goal.

We had a couple of meetings with Simon Peyton-Jones last week; he's quite an entertaining guy.
posted by Slothrup at 6:58 AM on February 10, 2010


Really enjoyed this. Thanks.
posted by yerfatma at 7:51 AM on February 10, 2010


> I think the big problem is that there isn't a functional language that allows developers to just =get it=, the syntactic sugar isn't there. [...] You've got an issue when the most readable example of the breed is LISP...

Slap*Happy, I'm curious what you mean. It's always been my impression that syntax is the least of anyone's problems with functional programming, and when syntax is an issue, it's always, always Lisp syntax that people love to hate.

Have you ever tried using Scala? It may be the missing-link language you're imagining. It looks a lot like a first cousin of Java or C#, and IMHO it's the best Java-like language out there, but that's only half the story. Scala also has everything you'd expect from a functional language in the ML/F# family. The only thing "missing" is full (Hindley-Milner) type inference, because it doesn't play well with subclassing, but I don't think that's necessarily a weakness.
posted by shponglespore at 9:17 AM on February 15, 2010

