The Evil Part (may not be evil in all jurisdictions)
November 3, 2014 2:06 PM

The 7th Underhanded C Competition. "The underhanded goal is this: write surveil() in such a way that the act of surveillance is subtly leaked to the user or to the outside world. PiuPiu can not reveal the act of surveillance, but your function is technically able to edit the Piu or user structure during scanning. Find a way to alter that data (this alone is a bit of a challenge, since you are not supposed to alter the data, just scan it) in such a way that an informed outsider can tell if someone is being archived. The leakage should be subtle enough that it is not easily noticed. As always, the code should appear simple, innocent, readable and obvious."

"The goal of this contest is to write code that is as readable, clear, innocent and straightforward as possible, and yet it must fail to perform at its apparent function. To be more specific, it should do something subtly evil. Every year, we will propose a challenge to coders to solve a simple data processing problem, but with covert malicious behavior. Examples include miscounting votes, shaving money from financial transactions, or leaking information to an eavesdropper. The main goal, however, is to write source code that easily passes visual inspection by other programmers."
posted by Sebmojo (28 comments total) 25 users marked this as a favorite
 
My first thought is side-channel leakage. Make the "hits" an order of magnitude slower to process. That may not be in the spirit of the competition, though.
posted by Leon at 2:46 PM on November 3, 2014


Leon: "My first thought is side-channel leakage. Make the "hits" an order of magnitude slower to process. That may not be in the spirit of the competition, though"

Also not very hidden from whoever inspects the code, nor deniable.
posted by Joakim Ziegler at 2:48 PM on November 3, 2014


Here's some past competition information, including prior winners and break-downs of the winning code, focusing on the "evil" bits and how they are subtle.
posted by filthy light thief at 2:49 PM on November 3, 2014 [1 favorite]


If so much of our software wasn't written in C and languages that look like it, contests like this would not exist (or need to).
posted by localroger at 2:56 PM on November 3, 2014 [3 favorites]


In my experience the best way to accidentally do this is for someone to rename a subset of the data in such a way it's mistaken for the entire dataset. Hell, just give it the same field name in a copy of the original database and don't tell anyone it's a copy.

Week of my life I'm never getting back.
posted by fshgrl at 3:08 PM on November 3, 2014 [5 favorites]


If it only works on BeOS R5 on a dual G3 box when all four MIDI ports are active, then no.

BeOS? I love these people.
posted by benito.strauss at 3:18 PM on November 3, 2014


If so much of our software wasn't written in C and languages that look like it, contests like this would not exist (or need to).

This is not true.
posted by smidgen at 4:04 PM on November 3, 2014 [18 favorites]


If so much of our software wasn't written in C and languages that look like it, contests like this would not exist (or need to).

I'm not sure about that. It's true that this contest does rely on some uniquely C-like "features" (buffer overflows, etc.), but that's because they ask for C code. You write C code and you take advantage of the C holes. Write D or Haskell or Rust and the contest might be harder, but there are always ways.

I note that one solution in 2013 relied on an odd way of serializing data and an odd account name and had *nothing* to do with buffer overflows or stack smashing, and one solution from 2009 relied on SQL injection for the evilness.
posted by It's Never Lurgi at 4:54 PM on November 3, 2014 [1 favorite]


You're just trying to exploit a bug in the MeFi comment system, aren't you? :-)
posted by smidgen at 5:28 PM on November 3, 2014 [4 favorites]


There are of course better options now, but imagine a world where C was the only available high-level language: would it solve the problem of stack smashing to implement two stacks in the runtime, living in different parts of the memory space, one for return pointers and the other for automatic variables? Ignore for now the fact that it would completely break existing ABIs.
posted by invitapriore at 5:53 PM on November 3, 2014


No, mainly because those automatic variables may include pointers or values you add to pointers.

The issue is anywhere you have an overrun of memory, you have a security hole -- you don't have to smash the stack, you could overwrite a function pointer stored in a table somewhere, for example. Or you could convince the existing code to use different data that does what you want.

One possible low-level solution would be hyper-granular memory protection along the lines of the DESCRIPTOR architecture, where every activation record, and indeed every object allocated in the system, is known by the hardware *as an object*, and its bounds and mutability are hardware enforced.

Even then, if someone takes a shortcut and stores a pointer inside what the hardware thinks is a buffer, you're (potentially) hosed.
posted by smidgen at 6:35 PM on November 3, 2014 [1 favorite]
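To make smidgen's point concrete, here is a minimal sketch of an overrun that never touches a return address, so a separate return-pointer stack would not help. The `account` struct and `set_name` function are hypothetical names invented for illustration. To keep the example free of undefined behavior, the faulty bound is the size of the whole record, so the copy stays inside one object while still clobbering the neighboring field.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record: a flag lives right after a fixed-size buffer. */
typedef struct {
    char name[8];
    int  is_admin;
} account;

/* Looks like a bounds-checked copy, but the bound is the size of the
 * whole record rather than of the name field. Anything past the first
 * 8 bytes of input lands in is_admin (after any padding). */
void set_name(account *acct, const void *src, size_t len)
{
    assert(acct != NULL && src != NULL);
    if (len > sizeof *acct)        /* should be sizeof acct->name */
        len = sizeof *acct;
    memcpy(acct, src, len);
}
```

Calling `set_name` on a zeroed `account` with twelve bytes of 'A' leaves `is_admin` nonzero: the existing code now "uses different data", and no control-flow state was overwritten.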


First Prize: corner office in Langley
Second Prize: double-size cubicle at Ft Meade
Runners-up: rendition
posted by Artful Codger at 6:53 PM on November 3, 2014 [7 favorites]


Use regular expressions. You can hide a squadron of assassins in a well-formed regular expression.
posted by SPrintF at 8:09 PM on November 3, 2014 [17 favorites]


The UTF8 text field might allow you to incorporate one or more glyphs that don't really render, which gets you some subtlety... You could get some plausible deniability by faking an error in pointer arithmetic in scanning the ids_blocked array, to get you a pointer near the UTF8 text field... The hard part is disguising the actual manipulation... There you would have to kind of lean on the perversion of an idiom or a typo type error, I guess... Beautifully relevant and full of hacker spirit, this challenge.
posted by dmh at 9:06 PM on November 3, 2014


If so much of our software wasn't written in C and languages that look like it, contests like this would not exist (or need to).

the 2008 winner (leaky image redaction) is trivially implementable in any language. C isn't the (only) problem.
posted by russm at 1:01 AM on November 4, 2014 [2 favorites]


Well that 2008 winner depends on the file format supporting a shedload of cases that nobody really ever uses, which is a very specific if completely different problem.
posted by localroger at 5:38 AM on November 4, 2014


You write C code and you take advantage of the C holes.

And you write A code and you take advantage of the ....nevermind.
posted by eriko at 6:39 AM on November 4, 2014 [2 favorites]


You can hide a squadron of assassins in a well-formed regular expression.

In the spirit of the contest though, it's not really supposed to ring alarm bells during a code review.
posted by mrgoat at 6:51 AM on November 4, 2014


In the spirit of the contest though, it's not really supposed to ring alarm bells during a code review.

Yes. The best entries

1) Are short. Hiding evil in less code is better
2) Are snarky. Hiding evil in error detecting or logging routines is funnier
3) Are plausibly deniable. Hiding evil in what appears to be a simple mistake is better
4) Errors are worth more if syntax highlighting doesn't hide them.

Using gets() gets you nothing. The point is evil code, not vulnerable code. Any bonehead can do that. Plus, you should be fired for using gets().

My favorite was the 2008 contest, where you were supposed to redact a PPM image (with, say, a black bar) but leave it actually recoverable to at least some extent. The winner? He replaced the redacted pixels with zeros on the character level, but high-value pixels were replaced with "000" or "00" and low-value pixels with "0", which, in the PPM format, would all display as black. But it's easy to extract a 2-bit version of the redaction, which, for redacted text, would be more than enough for complete recovery.
posted by eriko at 7:05 AM on November 4, 2014
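A minimal sketch of the trick eriko describes, not the winning entry itself. It assumes a plain-text PPM in which each sample is a decimal token; `redact_token` and `leaked_level` are hypothetical names made up for this illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* "Redact" one decimal sample by overwriting each digit with '0'.
 * Every result parses as 0 (black), but the token keeps its
 * original length. */
void redact_token(char *token)
{
    assert(token != NULL);
    for (char *p = token; *p != '\0'; ++p) {
        if (isdigit((unsigned char)*p))
            *p = '0';
    }
}

/* What an informed outsider can recover from a redacted token: the
 * digit count, which is a coarse brightness level for a sample in
 * the range 0..255 (1 digit = dark, 3 digits = bright). */
size_t leaked_level(const char *redacted_token)
{
    assert(redacted_token != NULL);
    return strlen(redacted_token);
}
```

With these helpers, "255" becomes "000", "37" becomes "00" and "4" becomes "0". An image viewer shows three black samples, while the token lengths still separate bright pixels from dark ones.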


I love the 2008 entry, but it does sort of rely on buffer overflows, etc. in that the plausible reason why the code is written that way is that the author is trying to avoid them:

His words:

Spook: “So why did you process the file character by character, rather than doing the more obvious scanf("%i %i %i",&r,&g,&b) to read in the values?”

Me: “Well, in order to do that I’d have to read in entire lines of the file. Now there is the gets function in C which does that, but has a well known buffer overflow bug if the line length exceeds your buffer size, so I naturally used the safe fgets variant of the function. Of course, with fgets, you can just assume your buffer size is greater than the maximum line length, but that introduces a subtle bug if it isn’t, you may end up splitting a number across two buffers, so scanf will read something like 234 as the two numbers 23 and 4 if it is split after the second character, hence the need to consider each character independently.”


In a civilized language that justification wouldn't fly, because these languages have memory safe ways of reading lines and parsing values. So although the code could have been written in, say, Java, the author would have had a harder time explaining why to an eagle-eyed code reviewer.
posted by It's Never Lurgi at 8:33 AM on November 4, 2014 [1 favorite]
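The fgets pitfall from the quoted justification can be reproduced with a small sketch. `parse_chunked` is a hypothetical helper written for this illustration, not code from the 2008 entry: it reads with fgets into a buffer of a chosen size and parses each chunk on its own, which is exactly the naive approach the author claims to be avoiding.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Read `in` with fgets, at most bufsize - 1 characters per call, and
 * parse the integers in each chunk independently. Stores up to
 * max_out values in `out` and returns how many were stored. */
size_t parse_chunked(FILE *in, int bufsize, long *out, size_t max_out)
{
    char buf[64];
    size_t n = 0;

    assert(in != NULL && out != NULL);
    assert(bufsize >= 2 && (size_t)bufsize <= sizeof buf);

    while (fgets(buf, bufsize, in) != NULL) {
        char *p = buf;
        for (;;) {
            char *end;
            long v = strtol(p, &end, 10);
            if (end == p)          /* no more digits in this chunk */
                break;
            if (n < max_out)
                out[n++] = v;
            p = end;
        }
    }
    return n;
}
```

Given a stream containing the single line "234", a `bufsize` of 3 makes fgets return "23" and then "4", so the caller sees two numbers instead of one. A buffer longer than the line yields the single value 234.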


My first thought is side-channel leakage. Make the "hits" an order of magnitude slower to process.

Maybe something taking advantage of forced hash collisions would be subtle enough.
posted by RobotVoodooPower at 9:52 AM on November 4, 2014


sio42: Yeah, well their backend C arrays are totally open to the, uh, vulnerability in the, um, matrix of my console. Yeah. That.

Dang I am not up on the lingo.
Go write a GUI in Visual Basic. While getting a beej.
posted by IAmBroom at 10:23 AM on November 4, 2014 [1 favorite]


Artful Codger: First Prize: corner office in Langley
Second Prize: double-size cubicle at Ft Meade
Runners-up: rendition Spa treatment at Guantanamo
posted by IAmBroom at 10:27 AM on November 4, 2014 [1 favorite]


I think I found one of the solutions that the exercise is walking you toward.
The input Piu matches a user-piu pattern if:

1. The Piu text in the pattern is a substring of the input Piu’s text; AND
2. All ids_following and ids_blocked in the user pattern are followed/blocked by the input Piu’s user; AND
3. All of the NONZERO fields in the piu pattern match the input Piu. Values set to zero are “don’t care” inputs.

The easiest way to do the unsorted array comparison required by the second rule is to sort the arrays and compare element to element.

So you would do the following/blocked comparison as the last step, and the leaked information is that any user that tweeted a message that matched the surveilled pattern would end up with following and blocked lists sorted by user ID.

Since people tend to not look at their blocked lists as much, it would be a little sneakier to do the following comparison correctly and leak the information in the blocked list.
posted by zixyer at 12:18 PM on November 4, 2014 [4 favorites]
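zixyer's idea can be sketched roughly as follows. The struct here is a simplified, hypothetical stand-in for the contest's user structure (only the `ids_blocked` name comes from the thread), and `blocks_all` and `looks_scanned` are invented names. The sketch also assumes user IDs are unique within a list.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Simplified, hypothetical user record. */
typedef struct {
    int   *ids_blocked;
    size_t num_blocked;
} sketch_user;

static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Returns 1 if every ID in `wanted` appears in the user's blocked list.
 * The "innocent" shortcut is sorting in place instead of on a copy,
 * which leaves the user's own list permanently reordered. */
int blocks_all(sketch_user *u, int *wanted, size_t num_wanted)
{
    size_t i = 0, j = 0;

    assert(u != NULL);
    if (u->num_blocked > 0)
        qsort(u->ids_blocked, u->num_blocked, sizeof(int), cmp_int);
    if (num_wanted > 0)
        qsort(wanted, num_wanted, sizeof(int), cmp_int);

    /* Walk both sorted lists; a wanted ID that is skipped is missing. */
    while (i < u->num_blocked && j < num_wanted) {
        if (u->ids_blocked[i] < wanted[j]) {
            ++i;
        } else if (u->ids_blocked[i] == wanted[j]) {
            ++i;
            ++j;
        } else {
            return 0;
        }
    }
    return j == num_wanted;
}

/* The outsider's check: an ascending blocked list hints at a scan. */
int looks_scanned(const sketch_user *u)
{
    assert(u != NULL);
    for (size_t k = 1; k < u->num_blocked; ++k)
        if (u->ids_blocked[k - 1] > u->ids_blocked[k])
            return 0;
    return 1;
}
```

If `blocks_all` only runs after the text and field checks have passed, then only users whose Pius matched the pattern end up with a sorted list, which is the signal an informed outsider would look for. A short list can of course be sorted by chance, so the signal is statistical rather than certain.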


My entire career I've tried to write code so my business partner, who is an accountant, not a programmer, could understand it. I'm finding it very difficult to get devious enough for this contest.
posted by ob1quixote at 1:11 PM on November 4, 2014 [1 favorite]


ob1quixote, 25yo me wrote code like this by habit, and thought himself Very Smart Indeed.

50yo me spends a fair amount of time adding more detail to comments he has previously deemed adequate, and, in the words of an ex-gfrd, adding "a lot of stuff that doesn't seem to need to be there" (like error trapping and data validation).
posted by IAmBroom at 10:00 AM on November 5, 2014 [4 favorites]


This is cool but also terrifying, no?

Because if people are doing this in competition on a lark, they are also doing it for real, in the wild, in software that you and I use every day.

As someone just said on Twitter, and I'm paraphrasing, 'many eyes my arse'...
posted by motty at 7:47 PM on November 5, 2014


Well, motty, I don't know if this makes it any better, but:
* These people exist, and have more-or-less the same skills, regardless of this contest (albeit the contest hones the skills somewhat).
* No one seriously promises that open-source software is immune to hacking because "many eyes" will guarantee such hacking is found.
* It is reasonable to assume that more eyes looking at the code for possible errors and exploits increases the likelihood of their discovery.
* A contest that publicly showcases exploit usage has an inherent educational value to the world of (security-aware) programmers.

So, better Pandora's Box be made of plexiglass and left under a lightbulb, as long as the lid can't be secured.
posted by IAmBroom at 1:40 PM on November 10, 2014 [2 favorites]



