

Next step, X-ray specs!
May 13, 2004 12:35 AM

Opacity no match for technology! A CS grad student comes up with a technique for restoring words that have been blacked out in classified documents.
posted by nomis (12 comments total)

 
What I find interesting is that this technique still boils down to picking the best candidate from a potential bunch of solutions. Surely that limits the applicability?

For example, the text gives a test case in which "Egyptian" is picked as the best fit, but I can't help thinking that "unofficial" would be just as good a fit.

And what if the censored portions were people's names? They wouldn't be in an electronic dictionary.
posted by nomis at 12:36 AM on May 13, 2004


And what if the censored portions were people's names? They wouldn't be in an electronic dictionary.

It doesn't take much more computing power to add in a list of names, or even to list all possibilities and then filter out never-used letter combinations, leaving only phonetically possible words.

I find it funniest that it only works now that the state department mandated a shift from a monospace font to a proportional font. You'd think someone in charge should have thought of this.
posted by Space Coyote at 12:42 AM on May 13, 2004


I see an easy solution to this. If a classified document needs to be released, all the blacklist candidates should be normalized to, say, 24 characters, by prefixing and suffixing some standard strings.
posted by Gyan at 1:15 AM on May 13, 2004
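
A minimal sketch of the padding idea Gyan describes, in Python. The function name `pad_for_redaction`, the `#` filler, and the sample terms are assumptions for illustration; only the 24-character target comes from the comment.

```python
def pad_for_redaction(term, width=24, filler="#"):
    """Pad `term` on both sides with `filler` so every result has the same length.

    Hypothetical helper: the filler character and the error handling are
    illustrative choices, not part of any real redaction tool.
    """
    if len(filler) != 1:
        raise ValueError("filler must be a single character")
    if len(term) > width:
        raise ValueError(f"{term!r} is longer than {width} characters")
    # str.center adds the filler before and after the term.
    return term.center(width, filler)


# Terms of different lengths all come out 24 characters long.
padded = [pad_for_redaction(t) for t in ["Egyptian", "unofficial", "Smith"]]
lengths = {len(p) for p in padded}
```

Equal character counts only give equal printed widths in a monospace font. In a proportional font the padding would have to be measured in rendered width instead.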


There was a way to do this through a flaw in PDF documents some time ago.
posted by angry modem at 4:43 AM on May 13, 2004


The flaw in PDF docs was because they were just overlaying a big black square on top of whatever Acrobat was supposed to display. People just removed the black block from the PDF and poof.

This involves actually reversing a fully blocked out section, based on its width. It's really quite brilliantly done -- and as Space Coyote points out, it's made a thousand times easier with proportional fonts -- given characters of 5, 3.3, 4, 2.2, 7.1, and 4.6mm, there's only one set of characters that can yield a blot out of 16.1mm, and very few relevant words that can be spelled with those characters. Wow. We should have known.

To be fair, I noticed the original announcement that State was changing its fonts, and I didn't even imagine that there might be a security implication in that. Definite "obvious in retrospect".

Regarding blacklisting, changing spacing of a document is enormously tricky -- adjust something on one page, and everything else repaginates.
posted by effugas at 5:00 AM on May 13, 2004
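
A rough Python sketch of the width-matching idea effugas describes: keep only the dictionary words whose rendered width equals the width of the blot. The glyph widths, the word list, and the function names are all made up for illustration; real values would come from the metrics of the font actually used, and this is not the researchers' implementation.

```python
# Hypothetical glyph widths in tenths of a millimetre (not real font metrics).
PROPORTIONAL = {"i": 10, "l": 10, "t": 13, "r": 15, "a": 20, "e": 20,
                "n": 22, "o": 22, "w": 30, "m": 33}

# In a monospace font every glyph has the same width.
MONOSPACE = dict.fromkeys(PROPORTIONAL, 20)

# Stand-in for an electronic dictionary.
WORDS = ["lit", "tin", "ram", "ton", "win", "mat", "ten"]


def word_width(word, widths):
    """Sum the glyph widths of every character in `word`."""
    return sum(widths[ch] for ch in word)


def candidates(words, blot_width, widths, tolerance=0):
    """Return the words whose width is within `tolerance` of the blot width."""
    return [w for w in words
            if abs(word_width(w, widths) - blot_width) <= tolerance]


# Proportional font: the blot width narrows the list sharply.
proportional_hits = candidates(WORDS, 57, PROPORTIONAL)

# Monospace font: every three-letter word fits a three-glyph blot.
monospace_hits = candidates(WORDS, 60, MONOSPACE)

# A tolerance models measurement error and lets more candidates through.
loose_hits = candidates(WORDS, 56, PROPORTIONAL, tolerance=1)
```

With these invented widths, the proportional font leaves a single candidate while the monospace font rules out nothing, which is Space Coyote's point about the font change.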


But when you blot out a word manually with a marker pen, surely you don't block out precisely the width that the characters take up. Isn't the black mark likely to be slightly longer than the actual word blocked out -- and a random amount longer, at that? How does this technique take account of that?
posted by chrismear at 5:14 AM on May 13, 2004


everything on the same line of text should be equally spaced, chrismear - measure the distance between the end of the preceding word and the start of the following word, subtract two lots of space, and there you have the exact length. although of course this fails if you marker pen over two or more consecutive words. then it gets tricky.
posted by nylon at 5:28 AM on May 13, 2004
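
nylon's measurement can be written down as simple arithmetic. A small Python sketch, assuming positions are measured along the line in any consistent integer unit; the function name, the `words` parameter, and the sample numbers are hypothetical.

```python
def hidden_word_width(prev_word_end, next_word_start, space_width, words=1):
    """Estimate the total glyph width hidden under a blot.

    The gap between the visible neighbours holds the hidden word(s) plus
    the spaces around and between them: one word means two spaces,
    two words mean three spaces, and so on.
    """
    if words < 1:
        raise ValueError("words must be at least 1")
    width = next_word_start - prev_word_end - (words + 1) * space_width
    if width <= 0:
        raise ValueError("gap is too small for the assumed words and spaces")
    return width


# Preceding word ends at 412, following word starts at 489, spaces are 10 wide.
single = hidden_word_width(412, 489, 10)
# If two words are hidden, only their combined width is known.
double = hidden_word_width(412, 489, 10, words=2)
```

With more than one hidden word the split between them is unknown, which is why consecutive redacted words are so much harder.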


Much of the redacting I've seen is long strings of words, very commonly entire paragraphs. This technique is of limited use.

However, this brings up a point that's bothered me since around 9/11: Shouldn't there be a cabinet-level IT department in the U.S. Government? Not that I'd want the job, but it seems like there's all kinds of efficiency, cost, and, especially, national security advantages to the government using up-to-date, interoperable technology.

Every time I see evidence of the tech savvy of the U.S. government, compared to that of the typical 14-year-old, I wince. Redacting classified info with Sharpies is just the tip of the iceberg.
posted by luser at 6:21 AM on May 13, 2004


If the researchers were in the US, they'd probably be answering questions from a judge, right about now.
posted by signal at 6:33 AM on May 13, 2004


When I worked as a bookbinder / preservationist at BU Mugar Libraries, we would sometimes receive books for the African Studies Library from South Africa or Rhodesia (this was the '70s) that had blacked out passages, using thickened india ink. Almost impossible to remove. We tried all kinds of solvents. Occasionally we succeeded in getting a bit of the ink off. Usually it was a reference to a "banned" person, Steve Biko in one case we worked on.

It's interesting that right-wing governments would black out passages in published books, while in communist countries the solution was much easier - they simply banned the book and jailed the writers (at least here in Hungary.)

However, this brings up a point that's bothered me since around 9/11: Shouldn't there be a cabinet-level IT department in the U.S. Government?

There should, but the present administration is so politically partisan they wouldn't allow it. Americans need security in times of threat. Halliburton doesn't. Bad for business.
posted by zaelic at 7:07 AM on May 13, 2004


"[...] it seems like there's all kinds of efficiency, cost, and, especially, national security advantages to the government using up-to-date, interoperable technology."

Which is precisely the reason they should not have a cabinet-level IT department, of course.
posted by spazzm at 10:04 AM on May 13, 2004


All I can say is: version one of the software, folks. Attach it to a grid, cross-reference a new dictionary as well, and voila, instant best-guess decryption. Delicious!
posted by fluffycreature at 6:20 PM on May 13, 2004




This thread has been archived and is closed to new comments