The mystery of the Duqu framework.
March 9, 2012 3:18 PM

The Kaspersky analysts over at Securelist uncovered some interesting things deep in the bowels of the code of a trojan. The hooks of the trojan are written using standard, well known languages and interfaces (C++, DLLs and such), but the payload, upon analysis, seems to be written using some heretofore unknown programming language. Can you figure out what language the Duqu trojan is written in? (via Lambda the Ultimate Programming Blog)
posted by symbioid (94 comments total) 26 users marked this as a favorite
 
I'll go for the longshot and say LOLCODE. If it actually turns out to be that, remember, you heard it here first.
posted by radwolf76 at 3:28 PM on March 9, 2012 [7 favorites]


I was really hoping it was going to be written in FORTRAN so I can finally DO something with that class I took at community college during Reagan's first term.
posted by birdherder at 3:30 PM on March 9, 2012 [17 favorites]


To be picky, I don't think you can "figure out" what language it was written in, but instead you would "recognize" it.
posted by benito.strauss at 3:30 PM on March 9, 2012 [1 favorite]


According to page 2, it is likely a custom object-oriented C framework. http://www.securelist.com/en/blog/667/The_Mystery_of_the_Duqu_Framework?page=2#comments

I was really hoping it was going to be written in FORTRAN so I can finally DO something with that class I took at community college during Reagan's first term.

If you had a serious interest in FORTRAN you could do some things with it. The language is still being advanced and is used all over the world.
posted by michaelh at 3:32 PM on March 9, 2012


LLVM? I'm no expert, but it smells like that kind of alternate universe thing.
posted by poe at 3:35 PM on March 9, 2012


To be picky, I don't think you can "figure out" what language it was written in, but instead you would "recognize" it.

You can rule some things out, sort of, like the way that they're pretty sure there are destructors that manage tearing the object's memory down. That's something that wouldn't appear in a garbage collection language (lisp, java, etc). The weirdness of how "this" is passed around rules out some well defined OO languages, like C++. The lack of API might rule out Delphi and so forth. You try to eliminate possibilities and see what you have left.
posted by RustyBrooks at 3:41 PM on March 9, 2012 [1 favorite]


I think you could "figure it out" by looking for signatures from the code generation from well-known compilers - if you didn't find them, you could probably more or less prove you weren't using that compiler.

But I don't know why there's any controversy over what language it is. Given that it's not a standard language, you have to think about the people who write these things. They don't have formal training in computer science so they aren't going to be using Scala or some academic language - additionally, they're always thinking about how to inject their code into someone else's machine language at an instruction level.

Why is it so difficult to imagine that it is just written in assembler? How could it be anything else?

Modern assemblers have a pretty serious macro capacity - if you had a huge library of assembler macros written by a diverse group of bright, undisciplined people I imagine the resulting programs would look very much like this.

On preview, the custom object-oriented C framework sounds plausible - but that does assume that no one's done their homework. I believe that you'd be able to detect most modern C compilers' output by their characteristic use of registers and code generation, so you'd know it was coming from C initially.

Custom C object-oriented frameworks have a long history - you usually do them with macros, too, but C macros - and if this is the solution then there isn't much "mystery" at all!
posted by lupus_yonderboy at 3:42 PM on March 9, 2012 [8 favorites]


Duqu got its name from the prefix "~DQ" it gives to the names of files it creates.

DQ...? Duqu...?
Dooku. Count Dooku.

Oh my god.

This is George Lucas' revenge for Red Tails bombing at the box office, isn't it?!
posted by Foci for Analysis at 3:42 PM on March 9, 2012


"That's something that wouldn't appear in a garbage collection language (lisp, java, etc)"

It certainly COULD. I just learned that in java, NIO buffers are allocated in direct memory, and cleaned up via finalizer, and that running out of direct memory does not cause garbage collection. Thanks, java.
posted by flaterik at 3:42 PM on March 9, 2012 [5 favorites]


(Oh, and I should add, to blithely say, "detect the compiler by looking at registers and code generation" hides many hours of hard work... I'm not saying these guys are slack!)
posted by lupus_yonderboy at 3:43 PM on March 9, 2012


On preview, the custom object-oriented C framework sounds plausible - but that does assume that no one's done their homework. I believe that you'd be able to detect most modern C compilers' output by their characteristic use of registers and code generation, so you'd know it was coming from C initially.

In the comments, they said that they always suspected custom OO C, but were hoping to find evidence of something that would have taken less developer time.
posted by michaelh at 3:44 PM on March 9, 2012


LAL sorry friends this w'snot suppost to slip into your Terra_
posted by 2bucksplus at 3:46 PM on March 9, 2012 [3 favorites]


> I just learned that in java, NIO buffers are allocated in direct memory, and cleaned up via finalizer,

And of course, since Java makes no guarantees that finalizers are ever called at all, this is a potential issue. Yes, I was shocked to find this out too, but Effective Java spells out the dire consequences of this at great length...

But we can eliminate any possibility of any language with garbage collection in it anywhere. There is no garbage collector in this code.
posted by lupus_yonderboy at 3:46 PM on March 9, 2012


In the comments, they said that they always suspected custom OO C, but were hoping to find evidence of something that would have taken less developer time.

Which seems funny because certainly there are private implementations floating around and there's no reason to believe that they didn't snag some weird one off the shelf, or even borrow one from someone they knew or that they'd developed in the past. There's no reason to believe they made one from scratch.

And yeah, it's not IMPOSSIBLE for a gc language to have destructors but it's not standard either.

One question I'm curious about is whether they believe that the code is *deliberately* obscure.
posted by RustyBrooks at 3:47 PM on March 9, 2012 [1 favorite]


Does the CIA have an internally used sekrit programming language? No?

Failing that, I'mma gonna guess Malbolge.
posted by kaibutsu at 3:47 PM on March 9, 2012


Don't people write new languages all the time? Why is this super-shocking?
posted by Cool Papa Bell at 3:50 PM on March 9, 2012


Why is it so difficult to imagine that it is just written in assembler? How could it be anything else?

But it's pretty unusual to think in terms of objects and methods when working in assembly. Obviously, you can do anything in assembly you can do in any other language, but if you're going to be working in abstractions of such a well-known type, why would you roll your own instead of using someone else's?

In other words, writing assembly code that looks like compiler output would be pretty silly, when you could just use a compiler instead.
posted by Malor at 3:51 PM on March 9, 2012 [6 favorites]


> In the comments, they said that they always suspected custom OO C, but were hoping to find evidence of something that would have taken less developer time.

Hah!

There's nothing you can do in C++ you can't do in C with a lot of macros. Yes, it's harder to write, yes, it's harder to debug, but I have to say that if I knew C well and C++ not at all and needed to do an OO thing in a rush, it'd be a LOT faster to use the macro system than to learn C++.

It really isn't that hard at all, and there are definitely some advantages to it over straight C++ OO stuff - for example, you can do "mix-ins" where you have "some methods from class A and some from class B". I would personally never do this in 2012, there's so much I'd miss from C++ that solving the double-dispatch problem would not be worth it, but I don't think it's an obvious non-starter by any means.
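
For anyone who hasn't seen the trick, here is a minimal sketch of macro-driven OO in C with a mix-in. Every name in it (`Animal`, `METHOD`, `NEW_ANIMAL`, `griffin`, `parrot_hound`) is made up for illustration; it is not taken from Duqu, mpg123, or any real framework.

```c
/* A "class" is a struct holding function pointers plus data. */
typedef struct Animal Animal;
struct Animal {
    const char *(*speak)(const Animal *self);
    int (*legs)(const Animal *self);
    const char *name;
};

/* Hypothetical macro: stamps out Class_method with an explicit self pointer. */
#define METHOD(ret, cls, name) static ret cls##_##name(const Animal *self)

METHOD(const char *, Dog, speak)  { (void)self; return "woof"; }
METHOD(int, Dog, legs)            { (void)self; return 4; }
METHOD(const char *, Bird, speak) { (void)self; return "tweet"; }
METHOD(int, Bird, legs)           { (void)self; return 2; }

/* Hypothetical macro: builds an instance, taking each method from any class. */
#define NEW_ANIMAL(nm, speak_cls, legs_cls) \
    { speak_cls##_speak, legs_cls##_legs, nm }

/* Mix-ins: some methods from Dog, some from Bird. */
const Animal griffin      = NEW_ANIMAL("griffin", Dog, Bird);
const Animal parrot_hound = NEW_ANIMAL("parrot_hound", Bird, Dog);
```

Calling `griffin.speak(&griffin)` goes through the pointer stored in the object, so the "class" of each method is chosen per instance at the point where the macro is expanded.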

mpg123, for example, is a very solid, modern library for reading mp3s, which I am a user of, and it employs more or less this technique.
posted by lupus_yonderboy at 3:53 PM on March 9, 2012


Must be intercal.
posted by MartinWisse at 3:54 PM on March 9, 2012


Clearly it was written in Objective C.
posted by Obscure Reference at 3:56 PM on March 9, 2012 [2 favorites]


> In other words, writing assembly code that looks like compiler output would be pretty silly, when you could just use a compiler instead.

My assumption from the initial conditions presented was that it did NOT look like compiler output - that this had been fairly thoroughly checked. If that is not true, then the assembly macros are less of a possibility - I'd go with the custom OO framework in C (though again, this is hardly "mysterious" given that numerous production libraries do just this).

I don't actually think it'd be so much more work as you think - if you had a large macro library and a lot of experience with the system. We don't know how long this code has been hanging around, do we?

But yes, given that we don't know for sure that this isn't the output of some C compiler, the C OO framework is the logical choice.
posted by lupus_yonderboy at 3:57 PM on March 9, 2012


They don't have formal training in computer science

Why would you assume that?
posted by empath at 3:58 PM on March 9, 2012 [9 favorites]


Yeah after reading everything I would guess either an obscure existing OO extension to C or a custom OO extension to C they wrote themselves. It's not that shocking that you could build simple classes in C, and if you were used to OO languages and had to write C code without any external libraries your code would probably look a lot like that code.
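
As a rough illustration of a simple class in plain C, here is a sketch with a hand-written constructor and destructor. The `Buffer` type and its function names are hypothetical, and this is just one common way to lay it out, not a claim about what the Duqu authors did.

```c
#include <stdlib.h>
#include <string.h>

typedef struct Buffer {
    char  *data;
    size_t len;
} Buffer;

/* "Constructor": allocates the object and its storage; NULL on failure. */
Buffer *Buffer_ctor(const char *text) {
    Buffer *self = malloc(sizeof *self);
    if (self == NULL) return NULL;
    self->len = strlen(text);
    self->data = malloc(self->len + 1);
    if (self->data == NULL) { free(self); return NULL; }
    memcpy(self->data, text, self->len + 1);  /* copies the terminator too */
    return self;
}

/* "Method": takes an explicit self pointer instead of an implicit this. */
size_t Buffer_length(const Buffer *self) { return self->len; }

/* "Destructor": nothing calls it automatically; the caller has to. */
void Buffer_dtor(Buffer *self) {
    if (self == NULL) return;
    free(self->data);
    free(self);
}
```

There is no garbage collector anywhere in this: if the caller forgets `Buffer_dtor`, the memory simply leaks.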

And yeah, it's not IMPOSSIBLE for a gc language to have destructors but it's not standard either.

I think if they had found evidence of there being a garbage collection system in the code they probably would have mentioned it. You can add destructors to a gc language but you can't generally take the gc out of a gc language.
posted by burnmp3s at 3:58 PM on March 9, 2012 [1 favorite]


> Clearly it was written in Objective C.

I'm almost sure that's not the case, because of the nonsense with the putative "this" pointer described in the original article.
posted by lupus_yonderboy at 3:58 PM on March 9, 2012


If it's a custom OO framework, it's unlikely that it was written for this one job. It was probably on-shelf when the bid came in to build the payload.

It can be surprisingly worthwhile to build something like this for yourself if it allows you to use a familiar tool (MSVC) in better ways. Say you want OO for coordination on a team project and code reuse, but you want to use MSVC because it's bulletproof, you know it, and it generates much leaner and familiar code than C++.

Also remember that if this is custom, it doesn't have to be generally bulletproof, only bulletproof to the extent of the author's needs. I can see something like this being built in a couple of months and more than paying for itself over the course of two or three projects.
posted by localroger at 3:59 PM on March 9, 2012


> > They don't have formal training in computer science

> Why would you assume that?

Well, none of the people I've met who did this sort of thing did. And if you did, you'd probably use C++ - it's the logical language for writing low-level attacks.

Generally, people who have formal training in computer science and are any good end up learning a lot about something specific, like graphics or error correction code, and never look back. Even in Eastern Europe, you can make a good living that way.

It could be that governments are hiring people to write these, and they come with credentials, but if so I'd expect to see something that was obviously the output of gcc.

But... you can't be sure.
posted by lupus_yonderboy at 4:04 PM on March 9, 2012 [1 favorite]


So - is there any particular reason to use a custom code? I could imagine some sort of obfuscation, but heuristics is one thing that would reduce that approach. Add in that it still uses the MSVC hooks, and it would eliminate any "benefit" to using a custom language?

Maybe it's just someone who likes to hack on languages and as a sideproject did this thing up to see if they could, with the language they wrote?
posted by symbioid at 4:06 PM on March 9, 2012


> it generates much leaner and familiar code than C++.

Your modern C++ compiler with optimization turned on can write much leaner code than almost any human ever could, even Jeff Dean.
posted by lupus_yonderboy at 4:07 PM on March 9, 2012 [1 favorite]


COBOL with a Visual Basic wrapper. Well when I say wrapper I mean more like stapled together.
posted by fallingbadgers at 4:08 PM on March 9, 2012 [1 favorite]


Delphi
posted by Ad hominem at 4:08 PM on March 9, 2012


BTW, Delphi can use like half a dozen different calling conventions so it can pass via stack or register.
posted by Ad hominem at 4:10 PM on March 9, 2012


Lex V.V
posted by moonbiter at 4:13 PM on March 9, 2012


Well, none of the people I've met who did this sort of thing did. And if you did, you'd probably use C++ - it's the logical language for writing low-level attacks.

Hackers can use some fancy-pants languages. Scala is not the right tool for the job for low-level stuff since it compiles to bytecode, but plenty of malware is written in other less common languages than C++. This guy wrote most of his malware in Scheme, and I've heard of OCaml as being generally thought of as a hacker language.
posted by burnmp3s at 4:15 PM on March 9, 2012


Hmm, looks like lupus_yonderboy might win this round of the internets- from the comment thread over at the link, from the author:

Igor Soumenkov
2012 Mar 10, 00:11
Re: Re: Other C/C++ compiler?

igorsk, thanks for the hint. It turns out that almost the same code can be produced by the MSVC compiler for a "hand-made" C class. This means that a custom OO C framework is the most probable answer to our question.
We kept this (OO C) version as a "worst-case" explanation - because that would mean that the amount of time and effort invested in development of the Framework is enormous compared to other languages/toolkits.
--

I don't actually know crap about any of this, but it's fascinating, and reminds me of old slashdot-lurking days, so yay!
posted by hap_hazard at 4:17 PM on March 9, 2012 [2 favorites]


yeah. delphi is OO without GC and objects use a manual Dispose . Supports register, pascal, cdecl, stdcall, and safecall calling conventions.

I am convinced until someone comes up with something better
posted by Ad hominem at 4:18 PM on March 9, 2012


What I'm saying, lupus_yonderboy, is that it looks like compiler output because of the way it's thinking, in terms of objects and methods. That's an extremely unusual model to create yourself in assembly. You can, but when so very many compilers have been debugged and battle-tested, why would you?

That's an assload of extra typing (or copying and pasting) for just about zero gain. Why go through all that pain when a compiler can just do it for you? So presuming that this is, in fact, compiler output of some type seems most reasonable. It's just a matter of determining which compiler.
posted by Malor at 4:18 PM on March 9, 2012


Your modern C++ compiler with optimization turned on can write much leaner code than almost any human ever could

Well maybe it can but that doesn't mean that it will, especially if you're not quite sure what you're doing with it.

True story: I have spent thousands of hours working with a particular niche embedded device which works very well, but has always been abysmally slow for its function and chipset. I have seen the source code (over the developer's shoulder) and yes, it's C++. I eventually resorted to siccing the disassembler on it to try and insert a hook for an unavailable function.

Inside I found the following operation being done repeatedly:
  1. Load size of object
  2. Multiply by object index
  3. Add to object base
  4. Add index of data within object
  5. Load or save data
  6. Load size of SAME object
  7. Multiply by SAME object index AGAIN
  8. Add to SAME object base AGAIN
  9. Add index of different data within object
  10. Load or save next data item
...on an 80186, where MUL is a rather time consuming instruction. After I alerted the manufacturer to this they sheepishly told me a few years later that they found a compiler switch that fixed that.

So, if you don't know about that compiler switch, and you do know how to tell your C compiler what to do, the C compiler definitely will generate leaner code.
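
In C terms, the pattern in that listing looks roughly like the first function below, and the fix looks like the second. This is a sketch with a made-up `Record` type, not the device's actual code. A modern optimizing compiler will usually turn both into the same thing; the point is what an unoptimized build may do literally.

```c
#include <stddef.h>

typedef struct Record { int a; int b; int pad[6]; } Record;

/* Naive form: each field access re-derives base + i * sizeof(Record),
   so without optimization the multiply can happen once per access. */
int sum_naive(const Record *base, size_t i) {
    return base[i].a + base[i].b;
}

/* Hoisted form: compute the element address once and reuse it. */
int sum_hoisted(const Record *base, size_t i) {
    const Record *r = &base[i];
    return r->a + r->b;
}
```

Both return the same value; only the amount of address arithmetic the compiler is invited to repeat differs.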
posted by localroger at 4:20 PM on March 9, 2012


There is plenty of OO c code. QT is a prime example.
posted by Ad hominem at 4:20 PM on March 9, 2012


>> Clearly it was written in Objective C.

> I'm almost sure that's not the case, because of the nonsense with the putative "this"
> pointer described in the original article.

I thought that was a "feature" of the decompiler they used. Also, what kind of hacker doesn't know to strip the symbols from their code?
posted by Obscure Reference at 4:24 PM on March 9, 2012


Turtlegraphics
posted by jfuller at 4:24 PM on March 9, 2012 [4 favorites]


Does the CIA have an internally used sekrit programming language?

No, you're thinking of the NSA.
posted by ceribus peribus at 4:27 PM on March 9, 2012 [1 favorite]


Be that as it may, no C compiler will ever beat Mel.
posted by fifthrider at 4:41 PM on March 9, 2012 [4 favorites]


As a veteran Mac programmer I immediately thought Object Pascal.
On Windows, and in this century, that means Delphi.
posted by w0mbat at 4:55 PM on March 9, 2012 [2 favorites]


based on my "expert analysys" of the asm, it is not handwritten asm.

Use of the somewhat obscure TEST opcode. It is actually an opcode meant to do a number of operations as an atomic group. In this case they are using it right before a jump on zero (JZ) so they are only checking the zero flag(ZF) so they are using TEST to simply perform a bitwise AND. The labels are obviously autogenerated and they are all short JMPs, so they are all within 127 bytes, nobody wants to deal with that shit.

If it is not OO style C it is Delphi, and me and w0mbat are correct.
posted by Ad hominem at 5:02 PM on March 9, 2012 [3 favorites]


Unless Embarcadero have rewritten Borland's Delphi compiler, this doesn't look like Delphi assembly code to me. Besides, any mainstream compiler/language has to be excluded because they all produce idiosyncratic assembly code that is easily identifiable, especially by people who reverse-engineer assembly code day in, day out.
posted by surrendering monkey at 5:09 PM on March 9, 2012


This stuff is like one level of expertise beyond me. I know enough to understand what's being said but not participate. That said, fifthrider's link to the story of Mel is pretty awesome. I'm going to have to find excuses to start using "most pessimum."
posted by JHarris at 5:11 PM on March 9, 2012


As a veteran Mac programmer I immediately thought Object Pascal. On Windows, and in this century, that means Delphi.

Not necessarily.
posted by JHarris at 5:13 PM on March 9, 2012


They don't have formal training in computer science ...written by a diverse group of bright, undisciplined people

Citizen - how do YOU know about a lack of formal computer science training or the level of discipline in this case?

Come over here to the DHS office to speak further on this matter if you please.
posted by rough ashlar at 5:26 PM on March 9, 2012


Maybe it's just someone who likes to hack on languages

Or someone's underloved Masters or PhD project.
posted by rough ashlar at 5:30 PM on March 9, 2012 [1 favorite]


On Windows, and in this century, that means Delphi

Is it possible to use Delphi without linking to any libraries? They rolled their own Windows API calls and whatnot. I didn't think Delphi was that lightweight.
posted by burnmp3s at 5:59 PM on March 9, 2012


Ain't exactly lolcode but ..

data LULZ a b c = WIN a | FAIL b | MOAR c deriving (Eq, Read, Show)

I've given isMOAR, fails, wins, etc. all the obvious definitions, but MOAR c takes the monad structure. And it's a comonad too, not that anybody cares about those.
posted by jeffburdges at 6:04 PM on March 9, 2012


Well, none of the people I've met who did this sort of thing did.

Yeah, but Stuxnet and Duqu are two of the most advanced malware programs ever written and were almost certainly created by state actors. This isn't a spam bot.
posted by empath at 6:08 PM on March 9, 2012 [1 favorite]


The labels are obviously autogenerated

I assumed the labels were produced by the disassembler that the Kaspersky Lab guys ran on it. If so, the labels can't tell us anything about how the code was written.
posted by stebulus at 6:25 PM on March 9, 2012 [2 favorites]


And it's a comonad too, not that anybody cares about those.

Edward Kmett resents that.
posted by kenko at 6:26 PM on March 9, 2012


Just because there wasn't any gc code in it doesn't mean that it wasn't written in a language that happens to offer gc (e.g. D).
posted by a snickering nuthatch at 6:31 PM on March 9, 2012




Member functions can be referenced by the object’s function table (like “virtual” functions in C++) or they can be called directly. In most object-oriented languages, member functions receive the “this” parameter that references the instance of the object, and there is a calling convention that defines the location of the parameter – either in a register, or in stack. This is not the case for the Duqu Framework classes – they can receive “this” parameter in any register or in stack.

I'm unqualified to judge, but this seems like the oddest characteristic of all to me. Even if you were rolling your own OO framework, you'd most likely pass the "this" parameter in a consistent stack position (probably first), and the compiler's calling convention would dump it in the same register or in the same stack position for every call. I can't imagine why you'd want it to work another way.
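
To make that concrete, here is a sketch of hand-rolled "methods" in C where the object pointer is an ordinary argument. The `Counter` type and both functions are hypothetical. Under the usual x86 conventions an argument's register or stack slot follows from its position in the parameter list, so if the author puts `self` in different positions, it lands in different places.

```c
typedef struct Counter { int value; } Counter;

/* Consistent style: self is always the first parameter. */
int Counter_add(Counter *self, int n) {
    self->value += n;
    return self->value;
}

/* Inconsistent style: self is the last parameter here, so the compiler
   will place it wherever its convention puts a second argument. */
int Counter_scale(int factor, Counter *self) {
    self->value *= factor;
    return self->value;
}
```

Nothing in C stops a framework from mixing both styles, which is one way a disassembly could show "this" moving around without any exotic compiler being involved.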
posted by Western Infidels at 7:00 PM on March 9, 2012


Haskell?
posted by A dead Quaker at 7:03 PM on March 9, 2012 [1 favorite]


Oh wouldn't it be funny if it was Ada.
posted by iamabot at 7:25 PM on March 9, 2012 [2 favorites]


With Delphi, you could do direct API calls, strip out many or all of the usual libraries, and thus compile a very lean application. (*) I did this a few times for some very small specialized applications. At least as of Delphi 5, this was possible. I haven't used any later versions in the decade+ since then.

But I can't imagine this is Delphi in this use case. Delphi's not nearly as popular as it used to be (and that's a shame) but its compiler output should be well known.

(*) not that it was bloated with the usual libraries left in. Delphi used to be fairly efficient. But at the tail end of the 90's, I had a couple of projects which the customer wanted to distribute via floppy, thus every byte was important.
posted by honestcoyote at 7:25 PM on March 9, 2012


Considering the resources the Russians and the Chinese put into their APT infrastructure, and considering that neither country uses a Latin alphabet as the primary form of writing, it's most likely something homegrown for the purpose of writing malware. I'd search for Cyrillic or Chinese Unicode strings in the executable to see which. Oh, Farsi, Vietnamese and Hangul, too.
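
A rough sketch of what such a search could look like, assuming the strings would be stored as UTF-16LE (the usual wide-string encoding on Windows) and looking only at the basic Cyrillic block. The function name and the run-length threshold are my own invention, and a real tool would need to handle other encodings and odd byte offsets.

```c
#include <stddef.h>
#include <stdint.h>

/* Counts runs of at least min_run consecutive UTF-16LE code units in the
   basic Cyrillic block (U+0400..U+04FF). Scans even offsets only.
   min_run is expected to be >= 1. */
size_t count_cyrillic_runs(const uint8_t *buf, size_t len, size_t min_run) {
    size_t runs = 0, current = 0;
    for (size_t i = 0; i + 1 < len; i += 2) {
        /* Assemble one little-endian 16-bit code unit. */
        uint16_t unit = (uint16_t)(buf[i] | (buf[i + 1] << 8));
        if (unit >= 0x0400 && unit <= 0x04FF) {
            current++;
        } else {
            if (current >= min_run) runs++;
            current = 0;
        }
    }
    if (current >= min_run) runs++;  /* run that reaches the end of the buffer */
    return runs;
}
```

A hit only says that Cyrillic-looking data is present; it says nothing about who put it there or why.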
posted by Slap*Happy at 7:39 PM on March 9, 2012


Actually, on second thought, I doubt they'd be that sloppy. I'd search for clues of Russian or Mandarin (or Farsi or Vietnamese or Korean) grammar in the code.
posted by Slap*Happy at 7:46 PM on March 9, 2012


Even if you were rolling your own OO framework, you'd most likely pass the "this" parameter in a consistent stack position (probably first), and the compiler's calling convention would dump it in the same register or in the same stack position for every call. I can't imagine why you'd want it to work another way.

To me this is exactly why it feels homegrown. There was probably a progression of techniques, from the easy to the more complex but versatile, and there were instances where the new didn't get adopted because it wasn't necessary. Consistency is only important when you are rolling your code out for others to use. This kind of inconsistency is the kind of thing you tolerate because it does what you need and you personally know where it will break if you push it.
posted by localroger at 7:48 PM on March 9, 2012 [1 favorite]


From the descriptions of object and class layouts, this sounds like something more serious than a simple C OO framework, since it would seem to need to involve modifying the compiler as opposed to bolting on a preprocessor or what have you. (Given that, I'm surprised the payload isn't more readily identifiable as LLVM output or some such.) I'd be curious to see an analysis of the optimization of the generated machine code, since my intuition would be that "home-grown compiler not using existing, mature, open compiler infrastructure" would also equate to "compiler development time spent on optimisation far less than that spent on correctness / validity."

Nonetheless, for some reason I kept thinking of non-OO C preprocessor C-Refine while I read this.
posted by whir at 7:56 PM on March 9, 2012


Considering the resources the Russians and the Chinese put into their APT infrastructure, and considering that neither country uses a Latin alphabet as the primary form of writing, it's most likely something homegrown for the purpose of writing malware. I'd search for Cyrillic or Chinese Unicode strings in the executable to see which. Oh, Farsi, Vietnamese and Hangul, too.

We know to a high degree of certainty that Stuxnet was specifically targeted at the Natanz centrifuges in Iran. The numbers in the live code line up to specifics of known configurations at Natanz that are just impossible to be accidental. And we know DuQu shares a common code base with Stuxnet. This is not just malware supported by a state at arm's length, it's state-designed & built with the support of intelligence & large scale funding for a PLC lab environment as a test bed. Israel & the US are the only candidates with anything close to a good fit for motivation to put these programs together & put them to use. I'm certain some of the NSA's next-gen whiz kids had a hand in it. It strikes me as likely that this framework was originally the result of an earlier project, perhaps a general purpose offensive cyber toolkit, that got repurposed for the Stuxnet/DuQu projects.
posted by scalefree at 8:23 PM on March 9, 2012 [1 favorite]


I posted it to Stack Overflow. Should have an answer for you soon.
posted by mattoxic at 8:31 PM on March 9, 2012 [5 favorites]


We're pretty certain Stuxnet was Dutch, Danish or German (lexical cues)... The US and Israel depend on hardware hacks. Great against air-defense, not so good at pwning more sophisticated stuff.

This... This is weird. It's someone's research language. Start going thru Eastern Bloc comp-sci papers.
posted by Slap*Happy at 9:10 PM on March 9, 2012 [2 favorites]


I assumed the labels were produced by the disassembler

yeah I would have assumed that too but some of the labels are things like class2_ctor which a disassembler couldn't have come up with. Maybe a human helped it.
posted by Ad hominem at 9:11 PM on March 9, 2012


and the compiler's calling convention would dump it in the same register or in the same stack position for every call

My take is that a very smart compiler did this as an optimization. Remember that stdcall and fastcall and every other calling convention were invented at a time when Borland, Microsoft and everyone else were competing to wring every bit of performance out of compiled binaries. I don't think a person could keep track if every call was slightly different.
posted by Ad hominem at 9:18 PM on March 9, 2012


By which I mean a compiler optimized each call passing by stack or register.
posted by Ad hominem at 9:25 PM on March 9, 2012


From the comments thread on securelist, it sounds like the frontrunner theory is an OO idiom on top of plain C. I'm not sure if that can explain all of the calling-convention variation, but maybe the compiler does clever interprocedural optimization.

However, given that we are living in the future, I'm going to place my chips on the two most likely scenarios:

1. Code generated by slicing modules out of the mind of an AI that has been trapped in some military research dept for twenty years

2. Alien compiler technology
posted by hattifattener at 11:12 PM on March 9, 2012 [5 favorites]


My guess is Logo, with the turtle serving as the viral vector.
posted by taro sato at 11:12 PM on March 9, 2012 [5 favorites]


Finally! It's the Year of SmallTalk!
posted by PenDevil at 11:29 PM on March 9, 2012 [5 favorites]


We might need this language for an Independence Day scenario
posted by moorooka at 1:15 AM on March 10, 2012


And we know DuQu shares a common code base with Stuxnet.

Just like the former USSR's Buran has a silhouette uncannily similar to NASA's space shuttles. Designs that have been proven in the field tend to get copied, even more so in programming. Doesn't mean the copier is necessarily the developer of the original, though they very well could be.
posted by radwolf76 at 1:24 AM on March 10, 2012 [1 favorite]


2. Alien compiler technology

Undoubtedly written by a cable TV tech on a Mac laptop.
posted by localroger at 5:33 AM on March 10, 2012 [2 favorites]


Somewhere, in a Tempest secured room, a few highly cleared individuals are sipping coffee and laughing out loud at these threads.
posted by Argyle at 6:02 AM on March 10, 2012 [1 favorite]


Somewhere, in a Tempest secured room, a few highly cleared individuals are sipping coffee and laughing out loud at these threads.

You realize that no one has Internet access in a classified facility, right?
posted by LightStruk at 6:19 AM on March 10, 2012


Am I too late to suggest Hypercard?
posted by Sutekh at 6:34 AM on March 10, 2012 [1 favorite]


You realize that no one has Internet access in a classified facility, right?

They could still be sitting in the SCIF laughing about what they read while off-duty.
posted by scalefree at 6:58 AM on March 10, 2012 [1 favorite]


C is just a bunch of assembler macros anyway...
posted by Devonian at 7:43 AM on March 10, 2012 [2 favorites]


yeah I would have assumed that too but some of the labels are things like class2_ctor which a disassembler couldn't have come up with. Maybe a human helped it.

I haven't worked with a disassembler in a while, but it's pretty common to be able to rename the variables in the disassembly output to names that make more sense once you figure out what the code is doing. A lot of reverse engineering is about gradually figuring out what various parts are doing and during that process you have to take a lot of notes.
posted by burnmp3s at 7:56 AM on March 10, 2012


but it's pretty common to be able to rename the variables in the disassembly output to names that make more sense once you figure out what the code is doing

This is exactly so. The freeware disassembler I use allows sections of code to be marked as to how they are to be disassembled, the labels for jumps and data access to be renamed, comments inserted, and everything wrapped up in a nice little bow and saved as a single database file.
posted by localroger at 12:39 PM on March 10, 2012


Compiled domain specific languages are not out of the ordinary. But this does look like a custom OO framework sitting on top of something like C.

Using embedded function tables is pretty classic OO in C, as it's both faster and more flexible than a separate vtable. You save a cache miss, and you can write objects that can change their own local method handlers. I've written game code that does this, and I can think of at least one example in our current code base.
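
A minimal sketch of the embedded function table idea described above, in plain C. The Monster type and every function name here are made up for illustration; nothing is taken from Duqu or from the Securelist write-up. The method pointer lives inside the object itself rather than in a shared per-class vtable, so a call needs only one load from the object, and the object can rebind its own handler at runtime.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical game object. The method slot is embedded in the object,
   not reached through a pointer to a separate vtable. */
typedef struct Monster Monster;
struct Monster {
    int hp;
    int (*think)(Monster *self);   /* embedded method slot */
};

/* Handler used once the monster is badly hurt. */
static int think_flee(Monster *self) {
    (void)self;
    return -1;
}

/* Default handler. It swaps the object's own method when hp runs low. */
static int think_attack(Monster *self) {
    if (self->hp < 10)
        self->think = think_flee;  /* the object rebinds its own handler */
    return 1;
}

/* Hand-written "constructor": fills in the data and the method slot. */
void monster_init(Monster *m, int hp) {
    assert(m != NULL);
    m->hp = hp;
    m->think = think_attack;
}

/* Dispatch: load the pointer from the object, then make an indirect call. */
int monster_update(Monster *m) {
    return m->think(m);
}
```

With a separate vtable the same call would first load the table pointer from the object and then load the function pointer from the table; that extra dependent load is the potential cache miss mentioned above. The cost of the embedded approach is one pointer per method in every instance.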

The differing locations for passing the object address make me suspect that it's not rigorously defined by the compiler, although there may be some bizarre parameter ordering rule in action. If it's C then the parameter locations, register or stack, should follow some sort of pattern, but if the object address parameter is added manually, it could be in any position, or overflowed to the stack. It's possible that there's a mechanical obfuscation layer that's shuffling parameter location randomly. It may even be a heavily optimising compiler that's free to change register locations based on usage.

I suspect it's not mechanical though, but manual, and it's the result of having more than one developer, with individual writing styles. This is not at all uncommon amongst high-power programmers who are more focussed on getting shit done, than conforming to style-guides intended to coddle the weak.

This also makes me think it's not C, but a macro assembler, where you're hand defining the calling structure. In which case it could be just one developer working over a long period of time, changing their style over the years.

Think about it. Some three letter organisation has, deep in its bowels, the guy who writes these things. He's old school, he's a specialist, he knows what the matrix looks like. He doesn't write the transport, he leaves that to the kids. He writes the payload. He has a set of custom tools he's been refining for a long time now. He's a pro.

I don't have a punchline.
posted by inpHilltr8r at 12:38 PM on March 11, 2012 [3 favorites]


He's a pro.

I don't have a punchline.


He's a pro is the punchline.
posted by localroger at 2:28 PM on March 11, 2012 [2 favorites]


There's a lot of "What?" but not much "Why?"

What if the language used was actually mandated by the internal group or organization that created this virus? Assuming it was a state-level intelligence agency, you'd assume they have reasons beyond "Bob wrote his own compiler! Awesome, let's use it!" I suppose if this group has been working on viruses since the early 90s, having their own toolkit they use would make sense from an institutional knowledge level.

But some other possibilities exist. The first possibility is that they wanted to obfuscate the tools and source as much as possible, to prevent tracing back to the source. So a custom toolchain was the best way of doing that. But the uniqueness of this also makes it easier to identify future attacks by this group.

I'm also instantly reminded of Ken Thompson's Reflections on Trusting Trust. Maybe some governments have knowledge that popular compiler toolchains have been compromised, and this knowledge has been formalized in specific rules and regulations about what tools can be used internally.
posted by formless at 10:56 AM on March 12, 2012


If you had a serious interest in FORTRAN you could do some things with it. The language is still being advanced and is used all over the world.

michaelh, to what degree is the usage dictated by a serious interest in FORTRAN? That is, is there any reason on this green earth to choose to code in modern FORTRAN, instead of coopting existing code and doing further development in later-gen languages, aside from the programmers being FORTRAN-ingrained?
posted by IAmBroom at 11:34 AM on March 12, 2012


> In the comments, they said that they always suspected custom OO C, but were hoping to find evidence of something that would have taken less developer time.

Hah!

There's nothing you can do in C++ that you can't do in C with a lot of macros. Yes, it's harder to write, yes, it's harder to debug, but I have to say that if I knew C well and C++ not at all and needed to do an OO thing in a rush, it'd be a LOT faster to use the macro system than to learn C++.
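
For a sense of what "OO in C with macros" can look like, here is a small sketch. The EXTENDS and SEND macros and the Shape, Rect, and Square types are hypothetical names invented for this example, not anything recovered from the Duqu code. It relies on the C rule that a pointer to a struct can be converted to a pointer to its first member.

```c
#include <stddef.h>

/* Hypothetical base "class": a single virtual method slot. */
typedef struct Shape Shape;
struct Shape {
    int (*area)(const Shape *self);
};

/* Hypothetical macros. EXTENDS must be the first member of the struct
   so a pointer to the derived object is also a valid pointer to the base. */
#define EXTENDS(Base) Base base
#define SEND(obj, msg) (((const Shape *)(obj))->msg((const Shape *)(obj)))

typedef struct { EXTENDS(Shape); int w, h; } Rect;
typedef struct { EXTENDS(Shape); int side; } Square;

/* "Overrides": each one downcasts the base pointer to its own type. */
static int rect_area(const Shape *s) {
    const Rect *r = (const Rect *)s;
    return r->w * r->h;
}

static int square_area(const Shape *s) {
    const Square *q = (const Square *)s;
    return q->side * q->side;
}

/* Hand-written constructors wire up the method slot. */
Rect rect_new(int w, int h) {
    Rect r = { { rect_area }, w, h };
    return r;
}

Square square_new(int side) {
    Square q = { { square_area }, side };
    return q;
}
```

Given `Rect r = rect_new(3, 4);`, the expression `SEND(&r, area)` dispatches through the pointer stored in the object and calls rect_area. Nothing checks that the object really starts with a Shape, which is part of why this style is harder to debug than C++.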


lupus_yonderboy, I can say the same of Assembler versus C and C++. I don't see how "they could have done it; it just would have taken longer unless they didn't know C++" refutes the argument you quoted.
posted by IAmBroom at 11:36 AM on March 12, 2012


If you are doing engineering problems with complex numbers or matrices, FORTRAN beats just about anything else because it has native types and intrinsic functions for a lot of that stuff, plus a bunch of wicked libraries that were all well debugged when computers still ran on steam and Hollerith cards.

I know of a manufacturer that rolled out a product with some complex damping and positional adjustment algorithm that they pretty much copied from an advanced ME book without quite understanding how it worked. Since they were using C++ they had to hand-code all the matrix operations, and deep in the bowels of a low-level routine they missed a minus sign. End result, first delivery of USD$500,000 worth of product to a major manufacturer would not take a field calibration. Took them three weeks to find the bug.

That sort of thing doesn't happen to engineers who use FORTRAN.
posted by localroger at 12:59 PM on March 12, 2012 [1 favorite]


I've occasionally read that FORTRAN compilers still produce better compiled code for that kind of heavy numerical stuff than C compilers do, though I don't know if that's still true today.
posted by hattifattener at 7:56 PM on March 12, 2012


I've read that because FORTRAN lacks some of the fancy language features that C does, compiler writers can do many optimizations/parallelizations automatically, without having to do sophisticated analysis to check that they aren't forbidden in a particular piece of code.

Non-explicit aliasing of variables is the area I heard of; there may be more.
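
A rough illustration of the aliasing point, written in C since that is the comparison language here. The function names are made up. In the first version the compiler has to allow for out, in, and k overlapping, so it cannot safely keep *k in a register or reorder the loop. The C99 restrict qualifier in the second version is the programmer's promise that they do not overlap, which is roughly the assumption a Fortran compiler is allowed to make about dummy arguments that a procedure modifies.

```c
#include <stddef.h>

/* Pointers may alias: a store to out[i] might change *k or a later in[j],
   so the compiler must be conservative about reloading and reordering. */
void scale_may_alias(float *out, const float *in, const float *k, size_t n) {
    for (size_t i = 0; i < n; i++)
        out[i] = in[i] * *k;
}

/* restrict promises no overlap, so *k can be loaded once and the loop is
   easier to vectorize. Calling this with overlapping arrays is undefined
   behaviour. */
void scale_no_alias(float *restrict out, const float *restrict in,
                    const float *restrict k, size_t n) {
    for (size_t i = 0; i < n; i++)
        out[i] = in[i] * *k;
}
```

Calling scale_may_alias with all three pointers into the same array is legal, and the multiplier then changes partway through the loop. That possibility is exactly what the compiler has to preserve when restrict is absent.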
posted by benito.strauss at 9:52 AM on March 13, 2012


localroger, that makes sense, although nowadays most engineers I've worked with use MATLAB/Mathcad/Mathematica, and export (usually to C), for complex math manips.

benito.strauss: Non-explicit aliasing of variables does seem to be relevant.

Learned something new... Still, I'll bet 90% of the FORTRAN programming is done because the white-haired or half-bald engineer only knows FORTRAN and PASCAL and Assembler.

/gray-haired and balding engineer. But I can write VBA/C/C++!
posted by IAmBroom at 10:15 AM on March 13, 2012


I know of people using Fortran in their PhD work in the last ten years (for numerical solutions of differential equations arising in fluid dynamics).

The first possibility is that they wanted to obfuscate the tools and source as much as possible, to prevent tracing back to the source. So a custom toolchain was the best way of doing that. But the uniqueness of this also makes it easier to identify future attacks by this group.

I don't quite get this. When you say they wanted to "prevent tracing back to the source", do you mean they wanted to prevent people from finding the authors of the malware? If so, a custom toolchain — or any other highly unusual tools or methods — seems like exactly the wrong thing to do. Better to be superbland, to write the kind of code that anybody could and would write, using popular tools that everybody has access to.

(Relevant Holmes quote: "Singularity is almost invariably a clue. The more featureless and commonplace a crime is, the more difficult it is to bring it home." And I think someone on Mefi recently mentioned that whatsername from Dragon Tattoo bought Ikea furniture when she went into hiding, exactly for this reason.)
posted by stebulus at 6:58 AM on March 15, 2012


Solved.
posted by a snickering nuthatch at 9:44 AM on March 19, 2012 [6 favorites]


