Sequoia Voting Systems
October 20, 2009 9:46 PM   Subscribe

Sequoia Voting Systems seem to have inadvertently released SQL code from their voting machines. And people are saying that the code is incriminating. People who have analyzed the code claim that it contained code designed to control or influence the outcome of an election.

The code and the technical details are here for those inclined to look at it.

Torrent for the code in question.

The story is being discussed on the usual suspects for political stories related to programming.
posted by idiopath (105 comments total) 14 users marked this as a favorite
 
in violation of a bunch of clauses in the FEC voting system rulebook banning interpreted code, machine modified code and mandating hash checks of voting system code.

Someone at the FEC must have gotten into one to many flame wars with python users.

Is SQL really against the rules? That's interesting, seems like it would weed out a lot of hack programmers who can't write their own database engines or figure out BerkelyDB. It would of course make things more difficult to hack for hackers who can't figure out how to express themselves in machine code that they can inject (through a buffer overflow or whatever)
posted by delmoi at 9:53 PM on October 20, 2009


Impossible, can't happen, tin-foil delusions by bitter, paranoid left-wing radical bloggers in momma's basement and besides the Democrats won so that proves it was all fine all along and also this is old news anyway.

(God I just hope someone finally burns for this... other than, you know, the voters.)
posted by rokusan at 9:53 PM on October 20, 2009 [9 favorites]


I read the links, and frankly I'm not smart enough to figure out what the hell they're talking about. But man, those DailyKos folks sure are het up about this thing that has to do with voting machines, aren't they? Who'd a thunk it?
posted by dersins at 9:58 PM on October 20, 2009 [1 favorite]


I haven't seen all the code example, and I'm not going to torrent it. But the examples they give on the website don't mean anything. They look like column comments, table definitions, and a couple index create statements. Nothing at all damning, and something almost every database contains.

I expected some T-SQL. Maybe someone who has downloaded the "code" can find some better examples.
posted by sbutler at 10:00 PM on October 20, 2009 [2 favorites]


delmoi: "Is SQL really against the rules?"

I think the main issue is that you can use self modifying code to make a program that seems to do one thing into one that does another thing entirely. And when you use SQL stored procedures and interpreted code, it makes that sort of obfuscation easier to do.

Right now, people are claiming to see red flags showing potential backdoors for vote theft. Now that this is out in the open and available as a torrent, we should soon see some SQL experts' opinions.
posted by idiopath at 10:02 PM on October 20, 2009


So the problem here is, they used CREATE TABLE sql statements? Which are not allowed, because they are "interpreted?" I'm just not sure that I'm understanding this...
posted by capnsue at 10:02 PM on October 20, 2009


Someone at the FEC must have gotten into one to many flame wars with python users.

Is SQL really against the rules?


These are supposedly data files, and supposedly you are not allowed to have interpreted code within the data files. But I'm not an election law expert or a db expert.
posted by empath at 10:02 PM on October 20, 2009


I'm gonna go out on a limb here and say this is much ado about nothing. They claim that the data was vandalized but in the discussion, someone posted that they were able to restore the file to an MS SQL server with no problem. I think these guys are just publicity hounds.
posted by empath at 10:06 PM on October 20, 2009


Nothing at all damning, and something almost every database contains.

Okay, there might be nothing. But considering this story was posted by a nonprofit dedicated to/named Open Source Voting, isn't it probable they know more about what is typically allowed within this sort of situation than you do?

Clearly they have an agenda and stake in this, but it seems really unlikely this giant hubub would be caused over something that is 100% mundane and to be expected.
posted by Solon and Thanks at 10:07 PM on October 20, 2009


Yeah, this guy just doesn't know what he's talking about.

And 'paying through the nose' is not paying $105, in my book.
posted by empath at 10:08 PM on October 20, 2009 [2 favorites]


But man, those DailyKos folks sure are het up about this thing that has to do with voting machines, aren't they?

Isn't that one of the sites where the topic of voter fraud is forbidden?

I might have my lefty web rules table mixed up.
posted by rokusan at 10:08 PM on October 20, 2009


(*Uh, for a given value of "you," = "layperson with average knowledge of databases." For all I know you could be a leading expert on this, sbutler.)
posted by Solon and Thanks at 10:09 PM on October 20, 2009


giant hubub

One post on DailyKos that didn't even make the rec list doesn't count as a giant hubbub.

There's nothing here.
posted by empath at 10:10 PM on October 20, 2009


Oh. Yeah, I got the wrong impression of how many sites were talking about this.

Plus: "First goal is to prove that Sequoia did in fact vandalize the data files by stripping the MS-SQL headers - if so that will affect other public records inquiries against Sequoia." - so, you're right, fail.
posted by Solon and Thanks at 10:15 PM on October 20, 2009


One post on DailyKos that didn't even make the rec list doesn't count as a giant hubbub.

AND WHAT IS SLASHDOT THEN?

bwahaha
posted by iamabot at 10:18 PM on October 20, 2009


Okay, there might be nothing. But considering this story was posted by a nonprofit dedicated to/named Open Source Voting, isn't it probable they know more about what is typically allowed within this sort of situation than you do?

I'm sure they know what's allowed. But it doesn't appear they know T-SQL. The examples I see on the site simply aren't what anyone would consider code, with the exception of some very routine DDL statements. On another page they make a big deal out what what looks like, to a programmer, high level logging statements. Personally, I want my voting system to keep a record of all changes made.
posted by sbutler at 10:18 PM on October 20, 2009 [1 favorite]


UPDATE outrage
SET criticality = 'kerfluffle'
WHERE vendor = 'sequoia' AND
type = 'leak' AND
humandate = '2009-10-21' AND
criticality = 'hoppitamoppita'


Can someone optimize this for me? I'm out of AskMes.
posted by adipocere at 10:23 PM on October 20, 2009 [27 favorites]


AND WHAT IS SLASHDOT THEN?

Didn't notice that, but I think most of the commenters agree that it's just a bunch of Create statements.
posted by empath at 10:31 PM on October 20, 2009


oh shit creationism in the voting machine, NOW i understand.
posted by empath at 10:31 PM on October 20, 2009 [1 favorite]


Ok, having scanned the strings output for a few minutes, the most suspicious block of text I've seen so far is this:
4621051
>'njm
}.dl\
<>

posted by gsteff at 10:34 PM on October 20, 2009


WTF... it looked fine in preview. I meant this:

>'njm
}.dl\
<?oO
jbr*2
9Fj{
]Agc@
;Gg(
KD2%
V:;K0aX0
42394
37982
42394
37982
RON PAUL
A007523-006
41879
22012
41879
22012
41879
22012
41879
22012
posted by gsteff at 10:36 PM on October 20, 2009 [1 favorite]


empath, it was the sarcasm, it makes me do things sometimes.
posted by iamabot at 10:37 PM on October 20, 2009


Fucking hell people, voting systems aren’t that hard.

Pencil, paper, independent electoral commission, scrutineers. It’s not that hard!!

Here, copy ours.
posted by wilful at 10:42 PM on October 20, 2009 [14 favorites]


One intersting subthread from the slashdot thread. In particular itwerx claimed to have found code to remap ID's. This is pretty much how you do a classic vote count swap between candidates.

On balance though, this still seems to be more about an interesting moment in technological politics than real election fraud at this point, despite the Kos hyperbole.
posted by idiopath at 10:42 PM on October 20, 2009


Okay, there might be nothing.

words to live by.
posted by philip-random at 11:00 PM on October 20, 2009


wilful: Pencil, paper, independent electoral commission, scrutineers.

Yes.
posted by robcorr at 11:10 PM on October 20, 2009


One intersting subthread from the slashdot thread. In particular itwerx claimed to have found code to remap ID's. This is pretty much how you do a classic vote count swap between candidates.

I'm going to guess that they did the "rotate ballot" function (the updates are to a "ballot header") naively, by swapping values in a fixed structure rather than just using random() (or a fixed offset) to randomly/iteratively sort the headers.

If anything, it's evidence they are doing some sort of auditing.
posted by orthogonality at 11:24 PM on October 20, 2009


Pencil, paper, independent electoral commission, scrutineers.

Scrutineers are fine when they bring you coffee and donuts, but otherwise kind of a pain.
posted by Chuckles at 11:25 PM on October 20, 2009


This appears to be an issue of breaking election law, in that they're using SQL code (which is most emphatically interpreted code, in direct violation of some statute or other.) But from what I've seen of the actual code itself (very limited), there hasn't been any demonstrated effort to build in hooks for fraud or any such nonsense.

The current code examples appear entirely benign.... hiding a candidate, for instance, is extremely common, if they're in a district you're not qualified to vote in. The same machines get used across a lot of districts, and they're going to be uploading the same raw election files, and then parsing out what each specific voter should see in his or her specific district.

So:

Yes, it looks like Sequoia probably broke the law;
No, there isn't yet evidence of stolen elections, or attempts at same.

My guess is that you probably won't see any such thing; fraudsters would modify the data after the fact, changing vote totals to suit their requirements, rather than building the fraud into the original database design itself. That makes it untraceable. If it's obvious in the original design, that's a huge finger pointed back at Sequoia. No matter how nefarious their actual plans might be, they'll have enough competence to make the changes in a less obvious way.
posted by Malor at 11:32 PM on October 20, 2009 [3 favorites]


Yes, as Malor says, it's very easy to modify data; the only "hook" required is update rights on the results table:

update results set candidate_id = ( select id from candidates where name = 'Guy I Want to Win' )
where candidate_id = ( select id from candidates where name = 'Guy I Want to Lose' )
and random() < .1;

The above, for some hypothetical database, changes 1/10 of votes for 'Guy I Want to Lose' into votes for 'Guy I Want to Win'. Assuming there is no logging, or logging is turned off, the change is difficult (though not impossible) to detect.
posted by orthogonality at 11:48 PM on October 20, 2009 [3 favorites]


There's a lot of people peering into a very muddy puddle & seeing what they want or expect to see. I found this statement enlightening in that regard:
re: jsissom: you were able to load the .BAK? How? What version?
There are 88 tables in the database. That's it. They deleted all of the stored procedures and triggers out of the database. They gave you exactly what you asked for - the data from the election - without any of their software. The $105 was probably the time it took to remove all the non-table objects from the database and backing it up.

So this is a data file, no source code included. Any strings you find are database pages that might have been used at one time by stored procs or triggers but are now marked as empty.

Sorry, I know you want a conspiracy or a story, but there just isn't one here.
posted by scalefree at 11:52 PM on October 20, 2009 [2 favorites]


Do they actually have any protections against SQL injection? I mean what heappens when someone votes for "bob smith /0x00'; delete from votes where candidate = 'actual winner' limit 2"

That could be a pretty big reason to avoid SQL.
posted by delmoi at 12:25 AM on October 21, 2009 [1 favorite]


It's pretty funny seeing the geniuses at Slashdot interpret a MS SQL backup file as some kind of special seekret code.
posted by Artw at 12:30 AM on October 21, 2009 [7 favorites]


wilful: Pencil, paper, independent electoral commission, scrutineers.

plus the lizard people.
posted by mokuba at 12:35 AM on October 21, 2009 [1 favorite]


delmoi--

I was about to correct you, but that's a legitimate question. Have we seen write-in support in voting machines?
posted by effugas at 12:51 AM on October 21, 2009 [1 favorite]


They appear... to have just vandalized the data as valid databases by stripping the MS-SQL header data off, assuming that would stop us cold. They were wrong. The Linux 'strings' command was able to peel it apart.

You can also peel it apart by opening Microsoft's GUI, clicking "restore database", and choosing where to put the files. But I'm sure the Linux 'strings' command is a lot more fun. Don't let me stop you now...
posted by problemspace at 12:53 AM on October 21, 2009 [2 favorites]


delmoi: sanitizing inputs is the front-end's responsibility; that said, I suspect write-ins go into a separate table, and probably aren't transcribed with escaped single quotes.

The fact that they seem to be using sprocs for much of the interface suggests that the actual vote may be inserted via a sproc as well, which (except in the case of 'execute immediate' and its cousins), serves to prevent most SQL injection. (You'd still want to escape the string, but odds are if you didn't you'd get recoverable failure, not breach.)

When I did a quick-and-dirty database for the Obama campaign (not a vote-recording db, of course), I just made sure all variables interpolated into SQL strings were of integral type. Not a number? Throw exception. This doesn't prevent illegal lookups (and I wasn't looking to do that, it was quick and dirty and intended for internal use), but it prevents SQL injection. Using sprocs makes injection even harder. (And had I had more than a week to do what I did, I'd have used sprocs or prepared statements.)
posted by orthogonality at 12:59 AM on October 21, 2009 [3 favorites]


Brad Wood's comments are worth reading:
I installed the latest version of SQL Server 2008 express and restored the backup to a database. At first it errored, but then I realized all it wanted was for me to specify a different file name for the two data files contained in the backup.

What's interesting is that the Sequoia Voting Project site originally accused them of "vandalizing" the database backup but later reneged on that admitting it was a perfectly valid backup file. That was a pretty strong charge and I think the Sequoia Project site needs to watch what they claim.

[...]

One last note-- I took a quick look at the [WRITEIN] table to see the kinds of names people put in by hand. 22 people voted for "MICKEY MOUSE", 15 for "NONE OF THE ABOVE", and 7 for "POOH". (Whinnie the?) Other notable writeins included "BONO", "OBAMA", "SANTA CLAUS", "NO PREFERENCE" (anyone heard of him?), "DAFFY DUCK", "DONALD DUCK", and "F**K REPUBLICANS". (I added the asterisks) Way to go, Riverside County-- you sure showed them. :)
posted by problemspace at 1:23 AM on October 21, 2009 [2 favorites]


What are they expecting to find? That the code available to everyone with a dataset from one of their machines would contain SET winner = "republican"?
posted by a robot made out of meat at 1:44 AM on October 21, 2009 [1 favorite]


(And had I had more than a week to do what I did, I'd have used sprocs or prepared statements.)

If prepared statements slows you down that much comparing to string interpolation, it's time to locate a better database API for your language.

22 people voted for "MICKEY MOUSE"

That's exactly the same number as voted for Donald Duck ("Kalle Anka") in the Swedish EU parliament election earlier this year. Co-incidence? I think not, and I'm going to run the official report through some random Unix commands to prove it.
posted by effbot at 2:01 AM on October 21, 2009 [2 favorites]


Pencil, paper, independent electoral commission, scrutineers.

I like it when the scrutineers say "arrrrrr".
posted by rokusan at 2:11 AM on October 21, 2009 [1 favorite]


This post is inexplicably missing the "scaremongering" and "delusions of publicity-seeking assholes" tags.
posted by cillit bang at 3:11 AM on October 21, 2009


Jim March: A big question is, from where you're looking in SQL 2005, can you view the source code from there and tell anything about how it's active? Is it in a pre-compiled state, or ready to go?

What does that even mean!?
posted by wrok at 3:44 AM on October 21, 2009 [1 favorite]


Is SQL really against the rules? That's interesting, seems like it would weed out a lot of hack programmers who can't write their own database engines...

Nice thought, but if I remember right, Diebold's systems run on Microsoft Access.
posted by rokusan at 4:15 AM on October 21, 2009 [1 favorite]


Agreed that the code is useless if they stripped out the stored procedures.

Self-modifying, dynamically loaded, or interpreted code is prohibited.

Just to get the data into the database you have to do an insert. I honestly don't understand how such an asinine provision could have been put into the spec to begin with.
posted by Civil_Disobedient at 5:28 AM on October 21, 2009


And THAT'S why Ron Paul does so bad in the polls. Keep lying to yourselves libertarians!
posted by mccarty.tim at 5:45 AM on October 21, 2009


Miss Grills solved the voting fraud problem we were having at Walter J. Patton by having us all put our heads down on our desks with our eyes closed and raising our hands for the candiddate we liked. Worked a treat.

Who wants to be Miss Grills?
posted by dirtdirt at 5:48 AM on October 21, 2009


Just to get the data into the database you have to do an insert. I honestly don't understand how such an asinine provision could have been put into the spec to begin with.

A quick look at the handbook says there is no such rule.

Once you get past the hysterical timecube nonsense, the allegation is the Sequoia could be using stored procedures to hide executable code from the software auditors and to bypass their checks for code changes between auditing and election day.

Of course, this pre-supposes that the auditors are too stupid to include stored procedures in their audit, but if you assume that, you can assume that they could get away with a million different ways to hide malicious code.

So at best, this whole story amounts to wannabe nerds (oo, you know the strings command!) getting worked up about the breaking of an arbitrary rule that they just made up and that (if it existed) would protect no one.
posted by cillit bang at 6:13 AM on October 21, 2009 [2 favorites]


When I did a quick-and-dirty database for the Obama campaign (not a vote-recording db, of course), I just made sure all variables interpolated into SQL strings were of integral type. Not a number? Throw exception. This doesn't prevent illegal lookups (and I wasn't looking to do that, it was quick and dirty and intended for internal use), but it prevents SQL injection.

OK, so I'm a total SQL noob, and I don't understand why SQL injection is such a scary bogeyman. A couple of years ago I scratched a system admin itch with a little VBS script that used dynamically generated SQL to get its job done, and one of the first things I put in it was this:

' A little function to generate properly quoted string
' literals for embedding in JET SQL statements - see rules at
' http://support.microsoft.com/kb/q147687/

Function q(s)
    s = Replace(s, "'", "''")
    s = Replace(s, "|", "' & Chr(124) & '")
    q = "'" & s & "'"
    if Left(q, 5) = "'' & " Then q = Mid(q, 6)
    if Right(q, 5) = " & ''" Then q = Left(q, Len(q) - 5)
End Function

A typical piece of code that builds and executes a SQL statement then looks like this:

conn.Execute _
    "UPDATE Students " &_
    "SET " &_
        "First_Name = " & q(firstName) & ", " &_
        "Last_Name = " & q(lastName) & " " &_
    "WHERE " &_
        "Student_Code = " & studentCode

noting that studentCode is guaranteed numeric, while firstName and lastName are text that's ultimately derived from user input.

Isn't SQL injection completely impossible as long as I remember to wrap my q() function around any string I interpolate into an SQL statement? Isn't q() equally trivial to code up in any other SQL-generating environment? Is the use of something like q() not absolutely standard practice everywhere? Why is this hard?
posted by flabdablet at 6:20 AM on October 21, 2009 [3 favorites]


What did we do when we were pretty sure this was happening?
What will we do now that there is evidence of malfeasance?
posted by Balisong at 6:33 AM on October 21, 2009


The thing about voting machines is that they must be extremely, transparently open. Open, and have the source code posted by the machine. Written even so an idiot can understand them, and even compile it themselves.

The thing is, the Glenn Beck brigade will decry anything they don't understand, and they have real understanding of nearly nothing. They see nefarious doings at the U.S. Censusof all things. If you think they aren't going to raise Stink Plus One when Beck finally has an excuse to declare voting machines as Evil Socialist Gizmos designed to steal democracy from red-blooded Americans, well, then you must be living in a reality a hell of a lot better than the one I'm living in. May I please move there next Tuesday?

No? In that case, we had better get to work on the most easily-understood voting machine software the world has ever knows. The thing is, this is actually possible. A voting machine program is far from rocket science. You don't need SQL for this, Java, or AJAX, or Flash. Make that sucker as primitive as possible in fact; that is the only way your average American will understand it. All it has to do is writing a voting record. A simple text file will do.

Ideally, the source code should be made available to the public, not just for inspection but to use as they wish. Using the same voting mechanism to elect a high school president as you do the President of the United States would do wonders for eliminating public mistrust of the process.

If, five hundred years in the future, after the singularity where we're all floating around as communal hive mind energy beings three heads who do most of our work on the Astral Plane, we have to return to our corporal form for long enough to stand in line to press a button on a 70's style green-screen CRT to select our OmniUberPrez, I think that would be exactly as complicated as the system should be allowed to get. (And would be awesome to boot.)
posted by JHarris at 6:45 AM on October 21, 2009 [8 favorites]


Are people joking about SQL injection? It is easily avoided.
posted by Edgewise at 6:48 AM on October 21, 2009


flabdablet,

Among other things, Unicode provides other ways of representing a ', and I don't think your code is Irish-compliant.
posted by effugas at 6:52 AM on October 21, 2009 [1 favorite]


Edgewise--

If it's so easily avoided, why is it so blisteringly common?
posted by effugas at 6:53 AM on October 21, 2009


*shrugs*

We all know how to make a secure voting machine...you print out simultaneously human and machine readable ballots. Do what you want in terms of fast aggregation, but the official final count comes from scanning the paper.

I mean, every kid figures this out, literally, I've heard high school kids come up with it. Aside from Tempest, it beats everything.
posted by effugas at 6:54 AM on October 21, 2009


Surely this is... (...) this is great news... for McCain! (...) This just proves he isn't a citizen! (...) blah blah blah.

If stored procedures are what they're worried about, don't let them know about insert triggers that could randomly not actually insert the data in question.

I agree that voting needs to be open sourced, but goddamn, let's not have the tinfoil hatters lead the charge on this, it makes us reasonable people look bad.
posted by mark242 at 7:00 AM on October 21, 2009


Isn't SQL injection completely impossible as long as I remember to wrap my q() function around any string I interpolate into an SQL statement?

Sure, in the same way that leaking memory is completely impossible in C as long as you remember to free everything you allocate. People make mistakes, especially when it's a tedious and repetitious task.
posted by Combustible Edison Lighthouse at 7:06 AM on October 21, 2009 [1 favorite]


Why is this hard?
Because it is a complicated situation with lots of edge cases. As effugas pointed out, are you sure that your code is Unicode compliant? How about handling embedded nuls? What about the other many possible failure modes? Some things are best left to the experts, rather than rolling your own.

In any case, using place holders and bound parameters is known to be safe for any data types and in many ways cleaner than string interpolation:
$db->prepare( <<"" )->execute( $first_name, $last_name, $student_code );
 UPDATE Students
 SET
   First_name = ?,
   Last_name = ?
 WHERE
   Student_Code = ?

(Your language may vary)
posted by autopilot at 7:23 AM on October 21, 2009


Unicode provides other ways of representing a '

There are assorted Unicode and other conventions for specifying the single quote character that JET SQL uses for a literal string delimiter without actually using that character, but as far as I know, no SQL interpreter will actually decode such a specification on-the-fly and accept it as a valid string delimiter. For example,

UPDATE Students SET FirstName = 'DanU+0027 WHERE studentCode = 10758;

is syntactically incorrect, and not the same thing as

UPDATE Students SET FirstName = 'Dan' WHERE studentCode = 10758;

However, I am keenly aware that I am arguing with the guy who figured out how to break the Internets, so if I'm due for a sound schooling, kindly school.

I don't think your code is Irish-compliant

firstName: Robert'); DROP TABLE Students;--
q(firstName): 'Robert''); DROP TABLE Students;--'
lastName: O'Flaherty
q(lastName): 'O''Flaherty'

In both cases, q(whatever) is a safe JET SQL string literal. And if you're using an SQL interpreter with string literal rules that are different from Microsoft's rules for JET SQL, finding out what those rules are and coding q() appropriately is (a) not hard (b) something that only needs doing in one place.

in the same way that leaking memory is completely impossible in C as long as you remember to free everything you allocate. People make mistakes, especially when it's a tedious and repetitious task

I fail to see how using something like q() as a standard convention for quoting strings interpolated into SQL statements can be fairly compared with the multiple subtle ways in which it's possible to get malloc/free wrong. I also fail to see how adopting any kind of standard convention for quoting strings interpolated into SQL statements could be construed as a "tedious and repetitious task" provided it's reasonably concise. Therefore, I fail to see how developers who write production code vulnerable to SQL injections deserve anything other than major arse kicking.
posted by flabdablet at 7:46 AM on October 21, 2009 [1 favorite]


(God, I can hardly stand to look at my post above with all the typos in it. Please keep in mind I was in a hurry AND was operating on three hours of sleep. Still am, in fact. You probably shouldn't read anything else I write either.)
posted by JHarris at 7:51 AM on October 21, 2009


If you think they aren't going to raise Stink Plus One when Beck finally has an excuse to declare voting machines as Evil Socialist Gizmos designed to steal democracy from red-blooded Americans....

we had better get to work on the most easily-understood voting machine software the world has ever knows.


You are so missing the opportunity here.

Glen Beck and his ilk getting paranoid and outraged and opposed to electronic voting would be the best possible thing to happen.

Ban them completely. Paper and pencils work fine. Problem solved.

It won't happen, though. The pro-electronic voting lobbies are all strong, strong (R) factions.
posted by rokusan at 7:53 AM on October 21, 2009 [3 favorites]


How about handling embedded nuls?

If you're using an SQL interpreter that allows embedded nuls within single-quote-delimited string literals, q() should work as-is. If not, there will be some documented way to escape them, and q() can be modified to suit. The Microsoft KB article I based my own q() on makes no mention of nuls being special - does Jet even allow them to exist inside text fields? I don't know.

What about the other many possible failure modes?

My point is that interpolating strings into an SQL interpreter's command line, while requiring care and attention to that interpreter's escaping conventions, is something that can easily be handled with one simple escape-generating function that just gets called from everywhere. It's not rocket science. Hell, it's barely computer science. As far as I can tell, SQL injection happens when developers either pay no attention at all to the escaping issue, in which case they deserve their arses kicking, or get an escaping function wrong, which is easily fixed. I still have no idea why something that strikes me as so easily dealt with is such a bogey.
posted by flabdablet at 8:02 AM on October 21, 2009 [1 favorite]


flabdablet, it is not absolutely standard practice everywhere because of the culture, primarily.

Flop open a Perl book and you'll see that the introduction to SQL mostly involves building up a SQL statement out of strings. Just concatenating strings. The naive approach is to have a list of blacklisted characters and either purge or (slightly better) escape them. Of course, you're playing whack-a-mole and looking for Little Bobby Tables.

As far as I know, the proper practice these days is to parameterize the inputs, either by special functions in the programming language which interface with the SQL database, or by doing stored procedures (about which I know nothing). You have to dig around, hard, in the Roth Win32 Perl book to find anything about parameterization. I suspect the same holds for PHP and perhaps, to a lesser extent, python.

For whatever reason, the culture just has not picked it up a lot. I was trying to explain the whole SQL injection thing to an old-school programmer a while back, so I cooked up a toy example in python. When I showed him and he grokked it, he went white.

I'm not sure how such a cultural problem could be solved, but it's endemic. It's the first thing I look for when I am reviewing some crap code cooked up by a student as a part-time project and I usually find it. I have written, in the past, code that did this because I did not know any better, and, being self-taught, I had nothing to go on but books and onlnie tutorials. I also did the naive approach of blacklisting characters.

Ultimately, you have to do some form of parameterization. That message isn't out there.
posted by adipocere at 8:03 AM on October 21, 2009 [6 favorites]


Pencil, paper, independent electoral commission, scrutineers. It’s not that hard!!

From this wiki on the Australian electoral system:

The Australian electoral system has evolved over nearly 150 years of continuous democratic government, and has a number of distinctive features including compulsory voting, preferential voting (known elsewhere as instant-runoff voting) and the use of proportional voting to elect the upper house, the Australian Senate.

Stupid Australian jerks with your sweet voting system.
posted by electroboy at 8:03 AM on October 21, 2009 [1 favorite]


I fail to see how using something like q() as a standard convention for quoting strings interpolated into SQL statements can be fairly compared with the multiple subtle ways in which it's possible to get malloc/free wrong.

It depends on how complex the system is. What if some values come from another layer already encoded? Then you have to remember not to double-encode those. What if some values are meant to be inserted directly into the sql? You have to remember not to encode those at all. What if studentCode, which you know is guaranteed numeric, gets changed at some point and the guarantee no longer holds?
posted by Combustible Edison Lighthouse at 8:05 AM on October 21, 2009


using place holders and bound parameters is known to be safe for any data types

Better and better. Having now been suitably schooled, I will find out how to do it this way in whatever language I end up using the next time I need to whip up something like this.
posted by flabdablet at 8:19 AM on October 21, 2009


There are assorted Unicode and other conventions for specifying the single quote character that JET SQL uses for a literal string delimiter without actually using that character, but as far as I know, no SQL interpreter will actually decode such a specification on-the-fly and accept it as a valid string delimiter.

Yeah, so there's (one source of) the weirdness. The average n-tier stack has four or five different systems that can transform Unicode, and any of them can do a couple of sorts of canonicalization:

1) Alternate-' to ': Occasionally, you get other languages that have characters that are "close enough" to a ' to get turned back into a '.
2) Inefficient-' to ': UTF-8 is variable length. What happens when you get a non-minimal encoded ' in the stream?

Now, go ahead and throw in URLEncoding: %20 and the like. And realize you can have nested layers of URLEncoding. And you can do all of the above, all inside the multiple levels of URLEncoding.

It gets messy. Parameterization is really a much better way to go.
posted by effugas at 8:24 AM on October 21, 2009


Of course, you're playing whack-a-mole and looking for Little Bobby Tables.

Nice.
posted by rokusan at 8:37 AM on October 21, 2009


Ban them completely. Paper and pencils work fine. Problem solved.

No, paper and pencils don't work fine, because pencil marks are easily erased or smeared whether by intentional fraud or accident, and easily transferred from one ballot to another. What works is paper and indelible, permanent marker.

Even then, there are serious good things about electronic voting machines. With electronic voting machines, how many ballots are in Spanish? All of them. How many are in Tagalog? All of them. How many are in Arabic or Chinese or Japanese or Korean? Every ballot is in every language the ballot has been translated into. How many ballots are in larger type for people with vision problems? All of them. Also, electronic voting machines can be (though I haven't seen one yet that was) set up to alert the voter to overvotes and undervotes. "You've just tried to vote for two people for President; you can't do that. Press A to back and choose just one candidate, or press B if your intent was to cast a spoiled ballot (warning: spoiled ballots will be discarded)." or "You didn't vote for anyone for US Representative -- was that your intent? Press A if you intended to vote for nobody, or B if you wish to go back and cast a ballot in that race."

The solved problem that nobody uses is that then you use the electronic voting machine to print out a human-readable ballot that the voter puts into a sealed box, and that gets scanned optically later. Though I grudgingly admit that the ones with a paper record visible to the voted under glass are probably okay too.
posted by ROU_Xenophobe at 9:20 AM on October 21, 2009


Of course, you're playing whack-a-mole and looking for Little Bobby Tables.

I'm guessing that Randall Monroe will be able to retire on reprint royalties on that strip in programming text books for the rest of his life.
posted by empath at 9:24 AM on October 21, 2009 [3 favorites]


You have to dig around, hard, in the Roth Win32 Perl book to find anything about parameterization

Wouldn't that be more because the relevant documentation is in the DBI module?
posted by mikelieman at 9:27 AM on October 21, 2009


If you think they aren't going to raise Stink Plus One when Beck finally has an excuse to declare voting machines as Evil Socialist Gizmos designed to steal democracy from red-blooded Americans, well, then you must be living in a reality a hell of a lot better than the one I'm living in.

Good! Anything that raises the ire of enough Americans that we actually demand verified vote totals is an unqualified good. Implementation would be as simple as outlined above: vote on a machine, verify your paper ballot, use machine counts for first-run totals and human-verified, human-readable scanned paper ballot counts as final, authoritative tallies. Simple, straightforward, and not vulnerable to SQL injection, sneaky stored procedures, or any other hacks (as long as no court stops the count before it's complete).
posted by notashroom at 9:32 AM on October 21, 2009 [2 favorites]


Looking this over, this is fairly straightforward stuff. Why are we outsourcing this to private entities? Surely there must be a dev team in Redmond or Google that could pull this together in 3 mos. Make it open source and fund a non-profit committee for administration. Get a third-party to audit the code like I'm sure Sequoia is required to do and you're golden.
posted by geoff. at 9:43 AM on October 21, 2009


geoff--

See, that's the great thing about the automated ballot printers. You can have competition for the ballot printers, and competition for the ballot readers. Nothing needs to be open source, though it can be if you want. The vote can be emitted, and read, by whoever and whatever systems people want.

A human readable canonical printed format is the obvious solution. That nobody is seriously building this is amusing.
posted by effugas at 9:51 AM on October 21, 2009 [1 favorite]


I mean, every kid figures this out, literally, I've heard high school kids come up with it. Aside from Tempest, it beats everything.

Everything except paper ballots marked by hand. You don't even need code to do it, which negates the necessity of writing code for non-programmers to interpret (which I wouldn't think would be necessary anyway). As long as it's open qualified people can audit it easily on their own, but why bother with added complication and expense when we don't need to?
posted by krinklyfig at 9:55 AM on October 21, 2009


Nothing needs to be open source

What compelling reason is there not to mandate the source code be open?
posted by krinklyfig at 9:56 AM on October 21, 2009 [1 favorite]


The naive approach is to have a list of blacklisted characters and either purge or (slightly better) escape them

It would seem to me that the correct naive approach would be to find out what convention your SQL interpreter uses for delimiting string literals, and then make sure that any string literal you interpolate into a SQL statement is properly encoded according to that convention. If somebody can work out how and when to encode a URL, I don't see why working out how and when to encode an embedded SQL literal should be trickier.

What if some values come from another layer already encoded?

Then ur doin it wrong! Encoding is something that ought to happen only at the interfaces between layers, and the kind of encoding should be determined by the nature of that interface alone, not by the nature of interfaces deeper in or further out. The q() function I've been using as an example generates the encoding required between the layer where data items exist as VBS strings and the layer where they need to be included as string literals in SQL commands. The only way I'd need to use the same encoding in some other layer is if my own script itself had a command-line-like interface that allowed or required multiple data items to be serialized into a single parameter. In that case, the layer external to mine would need its own equivalent of q(), coded according to my serialization convention, in order to talk to me. It wouldn't need to know about the SQL convention unless I happened to be using the same one.

Then you have to remember not to double-encode those.

On the contrary. If I'm handed a pre-quoted string and told to stuff it into the database, then in it's gonna go, quotes and all - and I will be double-encoding it to make sure that what ends up inside the database is exactly what I was handed. People should be able to name their children Robert'); DROP TABLE Students;-- if they want to.

What if some values are meant to be inserted directly into the sql?

Then once again ur doin it wrong, and you need to re-think your layers.

What if studentCode, which you know is guaranteed numeric, gets changed at some point and the guarantee no longer holds?

Then you'd need to make sure that all the SQL statements that get studentCode interpolated into them get q() wrappers properly added. This is painful, but no more so than needing to re-jig the type information you'd be using with a bound-parameters approach.

In some languages, you could use a polymorphic q() that applied the right kind of conversion to any parameter type. Then you could just use it for all parameter interpolation, regardless.

Come to think of it, VBS is capable of this, so I probably deserve a minor arse-kicking for not doing it myself (I'm relying on implicit number-to-string conversion caused by using the & concatenation operator while building the SQL statement, which is slack).

Corrected version (name changed to suggest "parameter" instead of "quote"):

' A little function to generate properly quoted literals
' for embedding in JET SQL statements - see rules at
' http://support.microsoft.com/kb/q147687/
' Argument can be numeric or string.
' Result is always string.

Function p(x)
    if isNumeric(x) then
        p = x & ""
    else
        x = Replace(x, "'", "''")
        x = Replace(x, "|", "' & Chr(124) & '")
        p = "'" & x & "'"
        if Left(p, 5) = "'' & " Then p = Mid(p, 6)
        if Right(p, 5) = " & ''" Then p = Left(p, Len(p) - 5)
    end if
End Function

...

conn.Execute _
    "UPDATE Students " &_
    "SET " &_
        "First_Name = " & p(firstName) & ", " &_
        "Last_Name = " & p(lastName) & " " &_
    "WHERE " &_
        "Student_Code = " & p(studentCode)


On preview:

The average n-tier stack has four or five different systems that can transform Unicode, and any of them can do a couple of sorts of canonicalization

That true, but at the level where p() fits, it's irrelevant. Given an interface to a SQL interpreter that accepts a SQL command as a single string, and a bunch of strings to wedge into such an a SQL command as embedded literals, the only thing that matters is ensuring that those literals comply with the SQL interpreter's rules for serializing literals. These will be well-defined, and something like p() can encapsulate them conveniently.

1) Alternate-' to ': Occasionally, you get other languages that have characters that are "close enough" to a ' to get turned back into a '.

If the SQL interpreter lists these as valid serialization (quote) marks, then p() will need to be coded to deal with them. If it doesn't, it won't.

2) Inefficient-' to ': UTF-8 is variable length. What happens when you get a non-minimal encoded ' in the stream?

Non-minimal UTF-8 encodings are illegal, and what they will do when fed into an SQL interpreter really depends on the interpreter. Specifically, it depends whether the interpreter does a UTF-8 decode on its entire command string before tokenizing it (which strikes me as an idiotic mistake; none of the SQL reserved words or operators require Unicode) or does it on the individual tokens parsed from that string, in which case whatever illegal UTF-8 does to p()-interpolated commands it will also do to bound ones, or doesn't do it at all, in which case the database can just store text values as uninterpreted 8 bit bytes and not care at all that they're actually Unicode characters. In the idiot case, p() can simply treat all the illegal encodings of ' as '. In the other cases, it doesn't have to worry about them.

Now, go ahead and throw in URLEncoding: %20 and the like. And realize you can have nested layers of URLEncoding. And you can do all of the above, all inside the multiple levels of URLEncoding.

All of those considerations apply to parameterized commands as well. Any badly encoded garbage you can wedge into a string and bind to a ? in a parameterized command, I can serialize with a correctly coded p() and interpolate safely into a non-parameterized command.

I agree with you that parameterized commands are the Right Thing, but I'm still not convinced they're the Only Thing.
posted by flabdablet at 10:05 AM on October 21, 2009


As I understand it, the reason the US uses assorted machinery instead of paper ballots is the sheer volume of elections in the US. We don't have anything like that in Australia. We don't elect our police commissioners, or our judges and magistrates, and in any case there aren't 300 million of us.

It would be interesting to see comparative per-capita spending figures on electoral (as opposed to campaign) costs, all the same.
posted by flabdablet at 10:14 AM on October 21, 2009


the reason the US uses assorted machinery instead of paper ballots

This is a less useful way to think about the difference, since over half of voters in the US use paper ballots (that are then scanned).

It would be more useful to think of the difference as "The reason the US uses automated counting instead of hand counting is largely in the sheer number of separate elections and ballot propositions taking place on the same day.
posted by ROU_Xenophobe at 10:27 AM on October 21, 2009


I agree with you that parameterized commands are the Right Thing, but I'm still not convinced they're the Only Thing.

Sure, if you enjoy re-inventing the wheel. Making your own SQL escape functions can be a good exercise while you're learning, but in production you use the tried and true builtins. Half if not more of what's on TheDailyWTF is programmers fucking up while trying to reimplement builtin functionality.
posted by kmz at 10:32 AM on October 21, 2009 [1 favorite]


krinkly--

Paper ballots marked by hand have some logistical issues around multiple languages and handicapped access, and won't be as machine readable. As for open source, you know what's better than an open source voting system? A standard that's so open, it doesn't matter if the code is totally closed, it's still secure.

flab--

Your system is fragile, bottom line. If p() is ever out of sync with the actual database, all hell breaks loose. You can complain all you want that oversimilar characters and overwide characters are unruly or illegal, but do you know what happens if they're entered? No, no you don't.

And what happens in an n-tier system, when a web front end is more than a single tier away from the database, but needs something from it? Either it parameterizes to get into your p() [showing the value of parameterization] or your p() never gets to implement its filter because it doesn't know which bytes are dangerous content and which bytes are actually commands from the remote tier.

Canonicalization and filtering is actually a really tricky thing. When do you do it? Do you do it as close as possible to the database, meaning there are all sorts of layers that are receiving unfiltered data? Or do you do it as soon as data enters the system, meaning you've lost context on how said data will eventually be parsed? Or do you filter multiple times, losing efficiency and debuggability?

I tell you, this stuff is hard enough without string concat.
posted by effugas at 10:32 AM on October 21, 2009 [2 favorites]


Scientific American did an article on tightening security on electronic voting machines (pdf link). One of their recommendations was an audio confirmation of each vote to be retained as the permanent record. I don't know that that's the best solution, but the article has some interesting discussion about the security limitations of the various voting systems.
posted by electroboy at 10:36 AM on October 21, 2009


In a couple of weeks, I'll be voting in a Minneapolis city election. I will report to a nice old lady, show that I am who I am, sign my name in a book, and get a paper voucher thing. I'll hand the voucher thing to the nice old lady at the next table, and get a human-readable paper ballot in exchange. I'll go to a screened foldup little table, fill in some ovals with an ink pen. When I'm done, I'll feed the paper ballot into an optical scanner with a lockbox underneath.

When the day's over, the nice old ladies will double check the scanner readouts and report them to City Hall. If there's any question, the paper ballots still exist and can be recounted.

Unless you're intending to vote for lizard people, the system works pretty goshdarn well.
posted by gimonca at 10:42 AM on October 21, 2009 [2 favorites]


A human readable canonical printed format is the obvious solution. That nobody is seriously building this is amusing.

Some years back IOA did some work on a secure voting system. Whatever happened to that?
posted by scalefree at 11:09 AM on October 21, 2009


Unless you're intending to vote for lizard people, the system works pretty goshdarn well.

Perhaps some changes are in order.
posted by Lizard People at 11:40 AM on October 21, 2009 [11 favorites]


If p() is ever out of sync with the actual database, all hell breaks loose.

True. Easy to fix, but necessary to maintain and therefore sub-optimal.

You can complain all you want that oversimilar characters and overwide characters are unruly or illegal, but do you know what happens if they're entered? No, no you don't.

My point is that you need to find out, whether you're doing so in order to fix up a p() or not, because they may well break parameterized queries in odd ways as well. In fact, depending on the database and the connection glue, parameterized queries might actually be implemented with something very like p() under the hood.

And what happens in an n-tier system, when a web front end is more than a single tier away from the database, but needs something from it?

Then the system ought to be designed so that each tier has a well-defined interface to the next tier to get what it needs. Web front ends issuing SQL commands, or even fragments of SQL, that make their way all the way to the database back end is a pretty major wtf, in my opinion. The layering ought to be better than that.

Either it parameterizes to get into your p() [showing the value of parameterization] or your p() never gets to implement its filter because it doesn't know which bytes are dangerous content and which bytes are actually commands from the remote tier.

In fact it will probably end up doing a hell of a lot of gratuitous reserialization to ship stuff around as XML or JSON or whatever the kids have decided is cool this week, and hopefully all these things will be capable of not smooshing atomic values together into an indecipherable mess before they hit the database layer. I agree that feeding things other than atomic data items through anything like p() defeats its purpose.

Canonicalization and filtering is actually a really tricky thing. When do you do it?

You do it whenever not doing it might make something break.

Do you do it as close as possible to the database, meaning there are all sorts of layers that are receiving unfiltered data?

If you're trying to filter stuff because it could conceivably be used to implement an SQL injection attack, then filtering is the wrong thing to be thinking about. Canonicalization is a separate issue from filtering, and both are separate from parameter passing and/or serialization.

Or do you do it as soon as data enters the system, meaning you've lost context on how said data will eventually be parsed?

Basic sanity checks (password lengths, date validity and other stuff that a user might need to correct interactively) should happen inside the web browser; if necessary, the back end should supply database-sourced rules or scripts to the front end to let that happen. Any outward-facing layer such as a web server that receives data from something potentially beyond the system's control should be ensuring that all its own encoding rules are adhered to (invalid UTF-8 should be checked for by the web server, for example, and rejected - not filtered or fixed).

Business logic should be Unicode-aware internally, and should have trustworthy methods for shipping Unicode around. The business logic should not need to worry about testing for bad UTF-8, because the web server should be designed never to send it any, and those layers should have secure connections so that no malefactor can masquerade as a web server.

Or do you filter multiple times, losing efficiency and debuggability?

Each layer should validate what it cares about. The closer you get to the database, the more abstract the filtering should get. Once you're at the point of issuing SQL commands, you shouldn't be needing to get rid of stuff like bad character encodings, you should be able to work with any data that the business logic can throw at you, and SQL syntactical rules should have no bearing at all on the kinds of things that the business logic can ask you to process. The system as a whole really ought to be able to cope with truly arbitrary values in things like name fields without chucking a spazz.

The most important thing is that the system designer keeps the distinction between encoding validity and content validity in mind when specifying the interfaces between layers, and designs layers to reject ill-formed input rather than trying to fix it up. Doing ad-hoc fixups after the system is already in beta will result in a bad, broken system.

I tell you, this stuff is hard enough without string concat.

It seems to me that a large part of what makes stuff hard is that nobody has the time or inclination to understand how any of it actually works any more. Information hiding is all well and good, and code re-use is fine and dandy, but the Law of Leaky Abstractions is real, and the more information you hide and the more unexamined code you re-use, the leakier your abstractions are going to get.

It seems valuable to me to know how to do p(), and to be able to identify clearly what problems it solves and what problems it causes, before learning about parameterized queries (which I agree address these same problems much more elegantly and robustly). Doing things the Right Way is good; knowing why the Right Way is right is invaluable.

Trying to re-code any moderately complex sh script as a Windows cmd file is a pretty good way to exercise a lot of these thought-ways, too. The difference between an ugly but fundamentally sound design (sh) and a collection of randomly evolved hacks and tweaks (cmd) is stark.
posted by flabdablet at 11:54 AM on October 21, 2009 [3 favorites]


Or do you do it as soon as data enters the system, meaning you've lost context on how said data will eventually be parsed? Or do you filter multiple times, losing efficiency and debuggability

You escape or unescape* at the interfaces between the layers, as per their respective well defined patterns for doing so, or preferably via vendor-supplied escaping functions. You have to be quite meticulous about doing it, but otherwise it is not a hard problem.

(* and if you don't know what type of data - plain text, html, sql, etc - each string in your system represents and are just cluelessly "filtering" at random places, you're completely fucked whatever software design pattern you use)
posted by cillit bang at 12:16 PM on October 21, 2009 [1 favorite]


My point is that you need to find out, whether you're doing so in order to fix up a p() or not, because they may well break parameterized queries in odd ways as well. In fact, depending on the database and the connection glue, parameterized queries might actually be implemented with something very like p() under the hood.

Parameterized queries are the specific way you differentiate data from code to a database. If it's just string concatenation under the hood, it's a pretty bad database.

More importantly, it's a bug, in much the same way a buffer overflow on %%%%%%20 might be. Yes, it's something to be worried about, but it's not exactly your development responsibility in the way p() would be.

And what happens in an n-tier system, when a web front end is more than a single tier away from the database, but needs something from it?

Then the system ought to be designed so that each tier has a well-defined interface to the next tier to get what it needs. Web front ends issuing SQL commands, or even fragments of SQL, that make their way all the way to the database back end is a pretty major wtf, in my opinion. The layering ought to be better than that.

I hereby bust you, by the principal of infinite descent. If a web front end shouldn't be assembling raw SQL, and instead should be parameterizing, then I submit that the very same logic applies to the database itself. As you say, the layering ought to be better than that.


In fact it will probably end up doing a hell of a lot of gratuitous reserialization to ship stuff around as XML or JSON or whatever the kids have decided is cool this week, and hopefully all these things will be capable of not smooshing atomic values together into an indecipherable mess before they hit the database layer. I agree that feeding things other than atomic data items through anything like p() defeats its purpose.


Weirdly, JSON appears much, much cleaner / more secure than XML. XML got all sorts of complexity built onto it (DTD/schemas) while JSON parsers are pretty much all simple state machines.

If you're trying to filter stuff because it could conceivably be used to implement an SQL injection attack, then filtering is the wrong thing to be thinking about. Canonicalization is a separate issue from filtering, and both are separate from parameter passing and/or serialization.

Canonicalization is scrubbing, and scrubbing is the best form of filtering. Whatever came in, convert it to a form in which you know whatever is there, and then filter that. This is basically the only way to really filter content. As I've joked on more than a few jobs, "Never bring a regular expression to a turing complete fight."

Basic sanity checks (password lengths, date validity and other stuff that a user might need to correct interactively) should happen inside the web browser; if necessary, the back end should supply database-sourced rules or scripts to the front end to let that happen.

And now you have lots and lots of logic that assumes the client will filter for it. What happens when the client is malicious?

Each layer should validate what it cares about. The closer you get to the database, the more abstract the filtering should get.

Sort of. The problem is contracts, and the difficulty maintaining them. I've been in some great meetings, where one team has said "Oh, our parent layer filters, we don't need to validate", while the next team says, "Oh, we just pass those bytes down to the next layer, we don't need to validate". Both teams have minimized their work effort, but this was done by neither team actually doing the work.

This is the sort of problem the architects like to handwave away, and yet there it is, over and over again.

It seems valuable to me to know how to do p(), and to be able to identify clearly what problems it solves and what problems it causes, before learning about parameterized queries (which I agree address these same problems much more elegantly and robustly). Doing things the Right Way is good; knowing why the Right Way is right is invaluable.

It's valuable because there are always going to be things you need to build a p() for. However, when there's an entire category of problems you can eliminate by specifically using the database's out of band comm layer, rather than trying to gin one up yourself, you use it, so you have time to implement p() for the stuff that you don't have a prepackaged solution for.

Remember, going out of band was a change we had to do to the phone network.
posted by effugas at 12:59 PM on October 21, 2009 [1 favorite]


Basic sanity checks (password lengths, date validity and other stuff that a user might need to correct interactively) should happen inside the web browser; if necessary, the back end should supply database-sourced rules or scripts to the front end to let that happen.

That's all well & good unless my client is something like Paros Proxy. Then you're screwed.
posted by scalefree at 1:17 PM on October 21, 2009


I've read a lot of papers about proposed secure voting systems. Most have unrealistic constraints: for example, one system relied on a type of write-only memory chip, and was provably secure as long as the bad guy didn't have access to a supply of those chips. What local government has the capability to custom-manufacture memory chips?

But some of the proposed systems are really clever. My favorite is Farnel voting. The link describes an electronic version, but I'll use a paper analogy:
1) Each ballot lists the candidates in a different order, and comes with a numbered piece of carbon paper that records which oval you filled in (but not which candidate you vote for).

2) Then voters trade receipts (maybe by using a leave-one-take-one box in the voting station).

3) The government posts a list of all the filled-in ovals, stripped of candidate names; you can check that the ballot corresponding to the receipt you got was recorded correctly.

4) Finally, the government releases the correspondence between receipt numbers and candidate orders, so anyone can independently count all the votes.

The Farnel system is brilliant because it prevents two major types of attacks on voting systems: you can't buy votes (because voters don't have their own receipts), and you can't change votes (because each receipt is checked). The government could lie about the candidate orders, but it turns out there's an easy way to prevent this (have fun figuring it out!). It's still possible to stuff the ballot box, but that's probably impossible to prevent using technological measures.
posted by miyabo at 1:20 PM on October 21, 2009


Clarification: the "numbered piece of carbon paper" is called a "receipt"
posted by miyabo at 1:23 PM on October 21, 2009


Isn't SQL injection completely impossible as long as I remember to wrap my q() function around any string I interpolate into an SQL statement? Isn't q() equally trivial to code up in any other SQL-generating environment? Is the use of something like q() not absolutely standard practice everywhere? Why is this hard?

One thing I've found is that people will screw up anything and everything you give them the opportunity to. For example, shuffling a list has an extremely easy and straightforward implementation that is provably uniform.

But if you ask 10 programmers to implement a shuffling algorithm that operates on a list of N elements, what you find is a breakdown like this:
Swap each element with some other random element: 6 people
Do N * K (K usually about 20-50) random swaps: 3 people
Something even worse: 1 person
Correct implementations: Usually zero

So, is it (relatively) easy, well-known, and straightforward? Sure. Does that mean people won't fuck it up all the time? Hardly.
posted by 0xFCAF at 1:43 PM on October 21, 2009 [4 favorites]


Late to the party, but all this talk of sql injection is moot. How can you inject sql on a machine with buttons (no way to inject a semi-colon without user text input folks)?

The point is that if you are relying on a hash sum against the executable to *verify the machine has not been tampered with* and are using store procedures, then someone with access to the system can modify the stored procedures, and the executable still passes any vaildation against the hash, even though the system has been compromised.

Thats serious shit right there people.
posted by crunchywelch at 4:01 PM on October 21, 2009


I agree with you that parameterized commands are the Right Thing, but I'm still not convinced they're the Only Thing.
Parameterized commands are not the Only Solution, just the only reliable solution. Any sufficiently robust SQL-scrubbing mechanism ends up re-implementing parameterized commands.

Realistically, SQL injection is probably the least problematic issue faced by physical voting machines: users of the machine have to go through a specific fixed input device -- usually a keyboard or scroll-wheel -- that the manufacturers of the device control. On the web, where SQL Injection is particularly nasty, it's trivial for a remote user to hand-craft an incoming data packet that's specifically engineered to blow past subtly vulnerable defenses. That's a little trickier to do with a scroll wheel.

Crunchywelch's point is the better one: there is, as best as we can tell from analyzing the code, shit for protection against a variety of privileged attacks. Imagine a house with a bunch of locked doors and three wide-open windows: that is analogous to checking the "program file" for tampering but not checking the stored procedures in the database.
posted by verb at 5:15 PM on October 21, 2009


Also: when I say 'we' I mean various talking heads here in this thread and on slashdot, not some specific team of security experts I work with. Also, 'the code' means what is publicly leaked, not the full solution that might exist somewhere out there.

Analysis of a leaked db snapshot, for example, is not a huge, robust security audit. But since it's a protected closed-source application that the public can't look at or examine, the leaked code snippets are in fact the best shot we collectively have at determining whether democracy is still in effect.

There's people talking about Acorn ending democracy by letting a 'Donald Duck' registration slide through -- I think those concerned about voting systems can be excused for taking the systems that actually collect and count the votes themselves very, very seriously.
posted by verb at 5:25 PM on October 21, 2009


And also, as an additional clarification, I looked at the torrent. After reading the wiki, it looks like a pretty breathless "THEY TRIED TO STOP US, BUT WE R HACKERS!" feel to it.

There's enough damning stuff to be found in the voting machine industry without having to go all Hack The Planet.
posted by verb at 6:20 PM on October 21, 2009


Late to the party, but all this talk of sql injection is moot. How can you inject sql on a machine with buttons (no way to inject a semi-colon without user text input folks)?

You're assuming that the voting UI is the only available attack vector you need to worry about. For something as valuable as an election that's just not true. You have voting booth volunteers & supervisors, people who move the machines to the voting locations, voters themselves opening the machine & accessing the non-public hardware directly, tabulation server admins & others with access to them; I could probably add some more but that's a good start. At some point all those write-in votes need to be typed in - do you trust the typist not to slip in a quick ';exec master..xp_cmdshell?
posted by scalefree at 6:22 PM on October 21, 2009


Heh. A lot of it is people loudly announcing they are morons. "What's a .bak file? Oh noes! EVIL CODE!"

Oh look, it's Boing Boing: Voting machine source-code leak shows election-rigging subroutines?

Er, no Cory, it doesn't. And that question mark doesnt make you any less a person who should shut the fuck up and stop pretending they know anything about computers.
posted by Artw at 6:26 PM on October 21, 2009 [1 favorite]


Thats serious shit right there people.

Only if you believe that the machines are authenticated the way say there, which - if true - would be a far more meaty scandal than this data dump nonsense.
posted by cillit bang at 12:47 AM on October 22, 2009


Only if you believe that the machines are authenticated the way say there, which - if true - would be a far more meaty scandal than this data dump nonsense.
Well, that's also the problem with closed-source trade secret code running our elections. I mean, maybe it's powered by the souls of delicious orphans! Who knows?

Does patent law trump the need for democratic elections?
posted by verb at 3:30 PM on October 22, 2009


Does patent law trump the need for democratic elections?

Yes.
posted by scalefree at 8:27 PM on October 22, 2009


Does patent law trump the need for democratic elections?
I don't really think the conversation can proceed productively if the disagreement runs that deep.
posted by verb at 8:35 AM on October 23, 2009


You are simply wrong flabdablet. All non-static SQL statements must be prepared, always. It costs zero programmer time, improves code maintainability, improves database performance, and protects against injection. Ain't no reason for using non-prepared dynamic statements.

Ideally, you should learn how your language and database will handle the prepare statement. A functional language will obviously run each prepare statement exactly once and only once, but otherwise you've the choice between pre-compiling all queries, wrapping all queries in caching hashes, or using your database's query cache.

I think python's iterators provide one handy compromise between maintainability and speed, but you'll need the absurd BEGIN { my $q; sub fun { $q = $db->prepare(...) unless defined $q; ... } } in perl.

Admit it, you'll laugh when management consultants get bit by dormant SQL injection attempts. :)
posted by jeffburdges at 6:28 AM on October 24, 2009


Does patent law trump the need for democratic elections?
Yes.


Oh, it most definitely does.

But it shouldn't.
posted by rokusan at 4:08 AM on October 25, 2009


You are simply wrong flabdablet.

Oh, frequently! If I knew what I were doing, why would I bother? And I did lead off this whole derail by claiming to be an SQL noob.

All non-static SQL statements must be prepared, always.

Are prepared statements any more secure than non-prepared but parameterized ones? If so, why?

It costs zero programmer time, improves code maintainability

Your perl example above would suggest otherwise, but I'll defer to experience.

improves database performance

Won't any decent DB do some kind of query caching under the hood anyway, and won't it get cache hits for repeat executions of non-prepared (but parameterized) queries?

and protects against injection.

Is there any general rule to predict how databases will handle a TEXT parameter containing invalid UTF-8?

Ideally, you should learn how your language and database will handle the prepare statement.

Thanks - I'll do that.
posted by flabdablet at 4:30 PM on October 25, 2009


« Older Invasive Species   |   The Times They are a-changin' Newer »


This thread has been archived and is closed to new comments