The ZXX typeface: Zalgo meets Captcha to prevent OCR
June 22, 2013 7:50 AM

During my service in the Korean military, I worked for two years as special intelligence personnel for the NSA, learning first-hand how to extract information from defense targets. Now, as a designer, I am influenced by these experiences and I have become dedicated to researching ways to “articulate our unfreedom” and to continue the evolution of my own thinking about censorship, surveillance, and a free society.
ZXX is a disruptive typeface designed by an ex-Korean intelligence officer to prevent automated text processing. ZXX Type Specimen Video. Project site offers a free download (.zip, 77 KB).
posted by Foci for Analysis (42 comments total) 35 users marked this as a favorite
 
I LOVE this. Simple. Elegant. Effective.
posted by Benny Andajetz at 8:06 AM on June 22, 2013


Oh yay! It's like encryption, without the encryption! (sobs quietly)
posted by phooky at 8:12 AM on June 22, 2013 [3 favorites]


For some reason, I feel certain that someone could write OCR software that could read ZXX. Especially, if the software used a dictionary to guess words rather than just letters. I say that as a software engineer with some image processing experience.
posted by ill3 at 8:18 AM on June 22, 2013 [17 favorites]
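
The dictionary idea in the comment above can be sketched in a few lines. This is only an illustration, not how any real OCR engine works: the tiny word list, the `correct` helper, and the 0.8 similarity cutoff are all assumptions made for the example. It uses the standard library's `difflib` to snap a misread token onto the closest known word.

```python
import difflib

# Hypothetical miniature vocabulary; a real system would load a full dictionary.
WORDS = ["privacy", "surveillance", "typeface", "encryption"]


def correct(token, vocabulary=WORDS, cutoff=0.8):
    """Return the closest vocabulary word, or the token unchanged if none is close."""
    matches = difflib.get_close_matches(token.lower(), vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else token


# A per-letter recognizer misreads one glyph ("l" as "1"); the word-level
# lookup recovers the intended word anyway.
print(correct("surveil1ance"))
```

The point of the sketch is that a recognizer does not need every glyph right, only enough of them for the word to be the nearest dictionary entry.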


Honestly I think this only requires a better captcha guessing robot. I'd still buy a "Good morning Mr Orwell" tshirt though.
posted by ishrinkmajeans at 8:27 AM on June 22, 2013 [2 favorites]


In fact, the more I think about this, the dumber I think it is. Let's say that I'm wrong and it is impossible for a computer to ever successfully parse ZXX, could we at least agree that it would be possible for a computer to detect that something like ZXX is being used? In that case wouldn't using this font scream out, "I'm doing something I don't want to be machine read!" This would draw immediate and extra attention to the very thing you are trying to have fly under the radar. The machine could kick it to a human to read, or they could even automate the "translation" by using something like Amazon's Mechanical Turk. Wouldn't it be smarter to use a 500 year old technology, steganography? Create a document where your words are composed of every third letter of every word in the document. Or use the white space and kerning in your document to hide your message.
posted by ill3 at 8:28 AM on June 22, 2013 [4 favorites]
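
One possible reading of the "every third letter" scheme above, as a rough sketch: the hidden message is spelled by the third letter of each word in the cover text, and words shorter than three letters are skipped. The `reveal` function and the skipping rule are assumptions for illustration; the comment does not spell out the details.

```python
import re


def reveal(cover_text, position=3):
    """Collect the letter at `position` (1-based) from every word long enough to have one."""
    words = re.findall(r"[A-Za-z]+", cover_text)
    return "".join(w[position - 1] for w in words if len(w) >= position)


# "ash" contributes "h" and "ski" contributes "i"; the short words are ignored.
print(reveal("An ash on a ski."))
```

Writing natural-sounding cover text that satisfies the constraint is the hard part, and the sketch says nothing about that.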


The capital B is very confusing to me, for some reason (far more confusing than the large-letter-small-letter trick, strangely).
posted by pipeski at 8:29 AM on June 22, 2013


Um, can't you just write OCR software that specifically recognizes ZXX glyphs? There's only a handful of them and he's given them away for free.
posted by scrowdid at 8:29 AM on June 22, 2013 [12 favorites]


This is stupid and pointless. You can easily build an OCR to deal with any regular typeface.
posted by rr at 8:41 AM on June 22, 2013 [8 favorites]


Wouldn't it be smarter to use a 500 year old technology, steganography?

If a human can read it, a machine can read it. It's just a matter of putting enough processing power behind it, and the NSA has lots of processing power.

It will be smarter to avoid the use of electronic communications altogether. The days of privacy on the wire - if such a thing ever existed - are long over, never to return. Get over it.

Just make sure the bastards don't follow your courier.......
posted by three blind mice at 8:42 AM on June 22, 2013 [1 favorite]


Dude, if you want to create documents that aren't easy for the government or big brother to scan and use OCR to track your thoughts, just write it by hand. I mean, have you seen that spiky, curly shit from the 1800s? Only nerdy historians can read that shit.
posted by teleri025 at 8:43 AM on June 22, 2013 [7 favorites]


This is some amazing design work.
posted by Ironmouth at 8:51 AM on June 22, 2013


just write it by hand.

Not good enough unless you have a medical doctor do the writing. 7000 deaths per year due to sloppy, illegible handwriting can't be wrong!
posted by three blind mice at 8:52 AM on June 22, 2013 [3 favorites]


Holy crap this is going to disrupt the whole damn charade
posted by Teakettle at 8:57 AM on June 22, 2013


This is beautiful.
posted by Sreiny at 8:58 AM on June 22, 2013


This is great for annoying the cyberspies, though. Write your message with this font, then zip it and include it as an attachment. Just an extra middle-finger salute.
posted by Benny Andajetz at 8:58 AM on June 22, 2013 [1 favorite]


Wouldn't it be smarter to use a 500 year old technology, steganography?

If a human can read it, a machine can read it. It's just a matter of putting enough processing power behind it, and the NSA has lots of processing power.


Yes, if a human can read it, so can a machine, provided the machine knows that there is something to look for. However, that is the reason to use steganography rather than encryption or some obfuscated font: it hides the fact that there is even something worth investigating or processing in the first place. For instance, let's say I sent an email one time, where I used commas to represent "dashes" and periods to represent "dots", and I encoded a message within a message using morse code. If I wanted to be extra careful I could make my own morse-like code. Or rot-13 the message afterwards. Ok, now perhaps the NSA's computers are smart enough to have thought of this scheme and they can decode this. Now let's extend the idea so that rather than using commas and periods for dashes and dots, we pick one letter for dot and one letter for dash, say "n" for dot, and "e" for dash. We could also have a rule that after x number of appearances of dots and dashes the letters that depict dot and dash change. Sure, the NSA could write processing algorithms that could decode this and check the results against word dictionaries, but for reasonably short messages the number of false positives would be huge, especially if they are looking at ALL messages as if they may contain this type of secret message. Now I'm only describing one scheme I just thought of, imagine there are 1,000s more multiplied by billions of messages. There would be too much noise vs. signal to make this useful to anyone trying to find secret messages.
posted by ill3 at 9:05 AM on June 22, 2013 [2 favorites]
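
The first variant described above (periods as dots, commas as dashes) might look roughly like this. It is a sketch under stated assumptions: the function names are made up, and a space stands in for the boundary between letters, since the comment does not say how boundaries would be hidden inside a real cover message.

```python
# International Morse code for the 26 Latin letters.
MORSE = {
    "A": ".-", "B": "-...", "C": "-.-.", "D": "-..", "E": ".", "F": "..-.",
    "G": "--.", "H": "....", "I": "..", "J": ".---", "K": "-.-", "L": ".-..",
    "M": "--", "N": "-.", "O": "---", "P": ".--.", "Q": "--.-", "R": ".-.",
    "S": "...", "T": "-", "U": "..-", "V": "...-", "W": ".--", "X": "-..-",
    "Y": "-.--", "Z": "--..",
}
REVERSE = {code: letter for letter, code in MORSE.items()}


def to_punctuation(message):
    """Encode letters as Morse, writing dots as periods and dashes as commas."""
    codes = [MORSE[ch] for ch in message.upper() if ch in MORSE]
    return " ".join(code.replace("-", ",") for code in codes)


def from_punctuation(stream):
    """Invert to_punctuation: one space-separated group per letter."""
    return "".join(REVERSE[group.replace(",", "-")] for group in stream.split())


print(to_punctuation("SOS"))
print(from_punctuation(to_punctuation("hello")))
```

Swapping the two punctuation marks for a pair of letters that rotates after a set count, as the comment goes on to suggest, would only change the two `replace` calls.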


Never underestimate the power of the wrench.
posted by scruss at 9:11 AM on June 22, 2013 [6 favorites]


Um, can't you just write OCR software that specifically recognizes ZXX glyphs? There's only a handful of them and he's given them away for free.

Yeah, if you can specifically add these glyphs to your OCR program this will be even easier for it to read than the usual bot-defeating approach with random noise and distortion, or crappy old scans that it has to "guess" at.
posted by jason_steakums at 9:12 AM on June 22, 2013
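
To make the "just add the glyphs" argument concrete, here is a toy nearest-template classifier. The 3x3 bitmaps are invented for the example (they are not ZXX glyphs), and Hamming distance is only one simple way to compare them. The idea is that a fixed, published glyph set gives the recognizer an exact template for every character.

```python
# Hypothetical 3x3 templates; "1" is ink and "0" is background.
TEMPLATES = {
    "O": ("111", "101", "111"),
    "L": ("100", "100", "111"),
    "T": ("111", "010", "010"),
}


def hamming(a, b):
    """Count the pixels that differ between two same-sized bitmaps."""
    return sum(pa != pb for row_a, row_b in zip(a, b) for pa, pb in zip(row_a, row_b))


def classify(bitmap):
    """Return the character whose template is closest to the bitmap."""
    return min(TEMPLATES, key=lambda ch: hamming(TEMPLATES[ch], bitmap))


# An "L" with one stray pixel of scanner noise is still closest to the L template.
print(classify(("100", "110", "111")))
```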


Yeah, we're not going to crack handwriting for quite a while. It's like an entirely different problem, compared to recognizing typeset fonts.
posted by pmv at 9:22 AM on June 22, 2013


I have done some thinking about this. Like many posters, at first I assumed zxx would be trivial to ocr. Just add the regular glyphs right?

Still, if you look at the glyphs he made them very noisy in the sense that they appear to have ocr noise embedded in them. Upon reflection, I think these characters are harder to ocr simply because it will be very difficult to distinguish scanner noise from the glyphs algorithmically.

Furthermore it is pretty obvious these characters are going to scan poorly, and the scanned image will have tons of artifacts which will also make computer processing more difficult. However, with a high enough quality scanner and the right mathematics these can definitely be ocred.

Still, he's got the right idea, someone needs to take this a step farther and make a program which generates the glyph noise and font files uniquely each time, and this would be basically impossible for computers to parse for at least a few years...

...which is to say real privacy is dead and never coming back so I think we are going to have to find some other way to deal with this problem. I think we are moving towards a post-secrets world, a true information age where literally every document known to man will be available to anyone with an internet connection.

Trying to force the genie back into the bottle has yet to work for anything ever
posted by jalitt at 9:33 AM on June 22, 2013 [2 favorites]


I think the USPS machines have been reading handwriting for a while.
posted by pajamazon at 9:37 AM on June 22, 2013 [4 favorites]


OCR for handwriting is called ICR, and it's been around for a while already. I occasionally work with one of the products on that list, and it captures handwriting surprisingly well - not near as well as OCR, but well enough to give you an idea of what has been written.
posted by me & my monkey at 9:45 AM on June 22, 2013 [2 favorites]


Machines might be able to read this, but I can't. My eye condition effectively drops out various bits of the visual field - many quite small and randomly placed. So my internal OCR is badly compromised, and quite variable depending on font, alignment and other issues that are both complex and difficult to properly characterise.

It's certainly beaten by this.

I can still read Comic Sans, which is an extra burden.
posted by Devonian at 10:05 AM on June 22, 2013 [1 favorite]


Also worth noting: whatever limited practical utility this has, it has none "over the wire". OCR is not being used to capture your emails, IMs, or blog posts; there you can just grab the characters directly.

This is basically an art project. And a pretty neat one.
posted by vibratory manner of working at 10:32 AM on June 22, 2013 [4 favorites]


Yeah, I think probably machine learning would be more than capable of detecting that ZXX was being used. But as a design project, as a way of protesting and calling attention to a specific type of surveillance, and even as a proof of concept I think it's awesome. The decoy letters and the noise effects are pretty brilliant.

I also think it's probably true that if there were a way of algorithmically generating these glyphs that it might actually succeed at its aims, at least for a while. And as long as you have a copy of the OCR software you're trying to defeat, you could almost certainly design something a human can read (if not easily) that the machine wouldn't be able to. Humans can pick up on a much broader range of contextual clues than, say, an SVM trained on some corpus of letters and numbers with some dirt and scanner noise thrown in.
posted by en forme de poire at 11:16 AM on June 22, 2013 [1 favorite]


Julian Assange, Suelette Dreyfus, and Ralf Weinmann worked on a wrench-resistant encryption scheme a number of years ago, called Rubberhose in reference to the Rubber-hose cryptanalysis technique that XKCD doesn’t credit:
In cryptography, rubber-hose cryptanalysis is the extraction of cryptographic secrets (e.g. the password to an encrypted file) from a person by coercion or torture, in contrast to a mathematical or technical cryptanalytic attack. … In practice, psychological coercion can prove as effective as physical torture.
Rubberhose works by layering secrets, providing a suspect with the ability to encrypt many levels of secret so there’s always something trivial to reveal, and making it impossible to tell whether additional secrets exist on a storage device.

This font is largely symbolic, but symbols matter too. I like it.
posted by migurski at 11:53 AM on June 22, 2013 [3 favorites]


This font is largely symbolic, but symbols matter too.

OK, I didn't get that impression. It just seemed stupid to me, like the creator had seen CAPTCHA schemes, but didn't quite understand how and why they (sometimes) work. Release this into the wild and computers will be reading it in no time at all.
posted by straight at 12:06 PM on June 22, 2013 [1 favorite]


If it were a Postscript font it could dynamically obscure each letter differently on the fly.
posted by miyabo at 12:23 PM on June 22, 2013 [2 favorites]


I like to think that someone will get really excited and campaign to have Unicode code points assigned to these symbols so that people can use them more easily.
posted by benito.strauss at 12:23 PM on June 22, 2013


Sigh. loldesigners, am I right?
posted by sonic meat machine at 12:39 PM on June 22, 2013 [1 favorite]


So what happens when I use this font to advertise that I have a bike for sale and the fractalised space between the letters causes the dimensions to rupture and Azathoth spills through to consume all reality? Should I offer a discount for cash sales?
posted by fallingbadgers at 12:59 PM on June 22, 2013 [3 favorites]


Seeing who thinks this is useful or all that interesting is a great shibboleth to point out people who just don't get secrecy and crypto.
posted by cellphone at 1:45 PM on June 22, 2013 [2 favorites]


> Um, can't you just write OCR software that specifically recognizes ZXX glyphs? There's only a handful of them and he's given them away for free.

The interesting part isn't the glyphs he's created in his example, but the knowledge of how to disrupt OCR software. 'Noise' adds random bits to a letter until OCR software doesn't recognize it. There's no reason it needs to be the same random bits every time you want to use an 'm'. You can generate new noise to cover over letters, so there's no shortcut to recognizing a piece of text masked in this way. As a programmer building OCR software I would have to move one level up and try to figure out word patterns instead of recognizing characters. This is doable, but I've then just massively increased the number of possible correct guesses from 26 * 2 letters + 10 digits + punctuation to the millions of words in languages that use the Latin alphabet.
posted by Space Coyote at 1:47 PM on June 22, 2013 [2 favorites]
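
The "new noise every time" idea above can be sketched as follows. The 5x5 bitmap for "m", the `add_noise` helper, and the pixel-flipping model are assumptions for illustration; the comment does not say how the noise would actually be drawn.

```python
import random

# Hypothetical 5x5 bitmap of a lowercase "m"; "1" is ink and "0" is background.
GLYPH_M = ("10001", "11011", "10101", "10001", "10001")


def add_noise(glyph, flip_probability, rng):
    """Return a copy of the glyph with each pixel flipped with the given probability."""
    noisy_rows = []
    for row in glyph:
        noisy_rows.append("".join(
            ("1" if px == "0" else "0") if rng.random() < flip_probability else px
            for px in row
        ))
    return tuple(noisy_rows)


# Each call draws fresh noise, so two renderings of the same letter
# generally will not match pixel for pixel.
rng = random.Random()
for _ in range(2):
    print("\n".join(add_noise(GLYPH_M, 0.15, rng)), end="\n\n")

# The per-character search space cited in the comment, before punctuation.
ALPHABET_SIZE = 26 * 2 + 10
```

With per-character templates no longer reliable, the recognizer is pushed toward word-level guessing, which is the jump from `ALPHABET_SIZE` candidates to a whole vocabulary that the comment describes.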


ill3: You are going to way too much work. Just get a set of good dice, roll them a bunch of times and create a one-time pad. (Make sure you actually got a random sequence). If you want to be really fancy you could sample the noise off a random system (The time between hits on a gamma-ray detector or something like that). Then each person gets a copy of the one time pad. As I understand it, this system still can't be broken, even by brute force, though it is really obvious you are using encryption.

Alternatively, if you've got a small number of messages you know you will want to communicate in advance, just pre-determine the contents.

For example, suppose I want to tell you to pick up something at the store, and don't want anyone knowing. We, in person, in a park, with no electronics around, make a list of all the things I could want you to get. Then later, when I need something I send you an email saying "Hey, check out this X post I saw today": If X is Metafilter, then I need eggs. If it is Ask MeFi, then I need milk, if it is MeTalk then I need perogies, and if it is Facebook it means the stars are right and Cthulhu is rising, so don't bother going to the store.

Anyway, you get the idea. The kind of dumb thing is that anyone actually doing something really bad could work these out pretty easily, so what is the point of looking at the people obviously using PGP or such?
posted by Canageek at 4:18 PM on June 22, 2013 [1 favorite]
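
A minimal sketch of the one-time pad mentioned above. One substitution to note: the comment suggests dice or a physical noise source, while this example draws the pad from the operating system's random generator via `secrets.token_bytes`. The function names are made up for the example. The pad must be at least as long as the message, shared in advance, and never reused.

```python
import secrets


def make_pad(length):
    """Generate `length` random bytes to serve as the shared pad."""
    return secrets.token_bytes(length)


def xor_with_pad(data, pad):
    """XOR data with the pad; the same call both encrypts and decrypts."""
    if len(pad) < len(data):
        raise ValueError("pad must be at least as long as the message")
    return bytes(d ^ p for d, p in zip(data, pad))


message = b"need eggs"
pad = make_pad(len(message))
ciphertext = xor_with_pad(message, pad)
print(xor_with_pad(ciphertext, pad))  # recovers the original message bytes
```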


Someone trying to ride the whole NSA thing for 5 minutes of fame? I completely agree with odinsdream
posted by nostrada at 4:57 PM on June 22, 2013


Canageek: Yeah, I know all about one time pads, etc. They are the gold standard of crypto; of course, you need to have pre-arranged out of band communication to share the pads. I work in computer security/encryption for a living. The point I was trying to make was that if the goal is to make things harder to electronically digest this backfires in that I think it calls more attention to any text than if you just left it in a regular font. The cool thing about steganography and things like rubberhose (they also obviously require pre-arrangement) is that it hides the fact that there is something hidden. PGP is great and all, I implemented and sold an enterprise version for a long time, but no one is sure how quickly groups like the NSA can crack any given algorithm. All that being said, I agree with many here who think this is an interesting "art project", however, I'm not convinced that was the creator's goal.
posted by ill3 at 5:21 PM on June 22, 2013


The technology used in this two year old defcon presentation on beating captcha looks like it would have no trouble with these typefaces.
posted by laptolain at 8:07 PM on June 22, 2013 [1 favorite]


Now, as a designer, I am influenced by these experiences and I have become dedicated to researching ways to “articulate our unfreedom”

This is a typeface about digital surveillance rather than one designed to combat it in real world situations.

But of course, this is the internet and silly hipster designers must die.
posted by quosimosaur at 8:58 PM on June 22, 2013 [1 favorite]


Silly Hipster Designers Must Die

That movie's a cult classic, man.
posted by RobotHero at 10:36 PM on June 22, 2013 [1 favorite]


This is a typeface about digital surveillance rather than one designed to combat it in real world situations.

Maybe you can forgive us for not getting that since the designer says the exact opposite in the first paragraph of the main link.

But of course, this is the internet and silly hipster designers must die.

But of course, this is the internet, so if I point out someone seems to be mistaken about how OCR and digital surveillance are actually used, I'm actually calling for them to be killed, or worse, accusing them of being a hipster.
posted by straight at 1:21 AM on June 23, 2013 [1 favorite]


ill3: "Wouldn't it be smarter to use a 500 year old technology, steganography? Create a document where your words are composed of every third letter of every word in the document. Or use the white space and kerning in your document to hide your message."
What do you think Instagram is for?

Post a zillion inane pictures of your boring everyday life. Use something like Stegano.NET to hide text content in some jpegs. Agree on a scheme with the intended receiver, e.g. if you are wearing a yellow shirt the image has encoded content. Encrypt the content using public/private keys that have been exchanged in person beforehand.

Instagram is the numbers station of the 21st Century.
posted by brokkr at 9:50 AM on June 23, 2013 [7 favorites]
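
Hiding text in an image, as suggested above, is often illustrated with least-significant-bit embedding. The sketch below works on a raw sequence of pixel byte values rather than a real image file, and it is not a description of what Stegano.NET does. The 2-byte length header and the function names are assumptions for the example. Note that lossy JPEG recompression would likely destroy bits hidden this naively.

```python
def embed(pixels, message):
    """Hide a 2-byte length header plus the message in the low bit of each pixel byte."""
    payload = len(message).to_bytes(2, "big") + message
    bits = [(byte >> shift) & 1 for byte in payload for shift in range(7, -1, -1)]
    if len(bits) > len(pixels):
        raise ValueError("cover data too small for message")
    out = bytearray(pixels)
    for i, bit in enumerate(bits):
        out[i] = (out[i] & 0xFE) | bit  # overwrite only the lowest bit
    return out


def extract(pixels):
    """Read the length header, then that many message bytes, from the low bits."""
    def read_bytes(start_bit, count):
        result = bytearray()
        for b in range(count):
            value = 0
            for k in range(8):
                value = (value << 1) | (pixels[start_bit + b * 8 + k] & 1)
            result.append(value)
        return bytes(result)

    length = int.from_bytes(read_bytes(0, 2), "big")
    return read_bytes(16, length)


cover = bytes(range(256))       # stand-in for raw pixel values
stego = embed(cover, b"eggs")
print(extract(stego))           # recovers the hidden bytes
```

Each pixel value changes by at most one, which is why the altered image looks the same to the eye. Encrypting the message before embedding, as the comment recommends, would be a separate step.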


From above:

Seeing who thinks this is useful or all that interesting is a great shibboleth to point out people who just don't get secrecy and crypto.

From TFA:

As a former contractor with the US National Security Agency (NSA), these issues hit especially close to home. During my service in the Korean military, I worked for two years as special intelligence personnel for the NSA, learning first-hand how to extract information from defense targets. Our ability to gather vital SIGINT (Signal Intelligence) information was absolutely easy. But, these skills were only applied outwards for national security and defense purposes — not for overseeing American citizens. It appears that this has changed. Now, as a designer, I am influenced by these experiences and I have become dedicated to researching ways to “articulate our unfreedom” and to continue the evolution of my own thinking about censorship, surveillance, and a free society. (emphasis mine)

Strikes me that seeing who doesn't think this is useful or all that interesting is a great shibboleth to point out people who just don't get at least one key aspect of secrecy and crypto.

Of course it won't work for plaintext. At no point anywhere does Sang Mun suggest otherwise, so far as I can see. It's a font. Duh.

As a font, it's not going to work for HTML email or for PS or PDF documents or indeed for anything other than text embedded as pixels in an image, and even then it will only work as long as a human is not looking at it, and even then only as long as extra code just to handle this specific font and combinations of it thereof is not built into the OCR software.

It's not about that.

It's about 'articulating our unfreedom'.

Strikes me that right now, 'articulating our unfreedom' has quite a hefty bit to do with secrecy and crypto. That is in itself - sure - the beginning and end of its usefulness, but it is also pretty damn useful, and as such strikes me as fairly interesting. Arguably, it is also pretty important.

The shibboleth here is that some people - even very bright people who otherwise get all kinds of clever stuff about secrecy and crypto - still just don't get the usefulness, importance or interestingness around 'articulating our unfreedom'.

That makes it even more important to articulate it. Loud and long.
posted by motty at 6:05 PM on June 23, 2013 [5 favorites]

