

Knock, knock. Who's there? Banana. Banana who?
March 18, 2013 8:02 AM

"While playing around with the Nmap Scripting Engine (NSE) we discovered an amazing number of open embedded devices on the Internet. " After completing the scan of roughly one hundred thousand IP addresses, we realized the number of insecure devices must be at least one hundred thousand. Starting with one device and assuming a scan speed of ten IP addresses per second, it should find the next open device within one hour. The scan rate would be doubled if we deployed a scanner to the newly found device. After doubling the scan rate in this way about 16.5 times, all unprotected devices would be found; this would take only 16.5 hours. Additionally, with one hundred thousand devices scanning at ten probes per second we would have a distributed port scanner to port scan the entire IPv4 Internet within one hour.

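A quick sanity check of the arithmetic in that quote, as a rough sketch. The constant and function names are made up for illustration; the only inputs taken from the quote are one hundred thousand devices and ten probes per second, and the full 2**32 address space is assumed with no reserved ranges removed.

```python
import math

IPV4_SPACE = 2 ** 32        # every possible IPv4 address, reserved ranges included
OPEN_DEVICES = 100_000      # figure quoted from the paper
PROBES_PER_SECOND = 10      # per-device scan rate quoted from the paper


def doublings_needed(target_devices):
    """How many times one scanner must double to reach target_devices."""
    return math.log2(target_devices)


def seconds_to_scan(addresses, scanners, rate):
    """Time for `scanners` devices, each probing `rate` addresses per second."""
    return addresses / (scanners * rate)


print(round(doublings_needed(OPEN_DEVICES), 2))  # 16.61
minutes = seconds_to_scan(IPV4_SPACE, OPEN_DEVICES, PROBES_PER_SECOND) / 60
print(round(minutes, 1))                         # 71.6
```

So "about 16.5" doublings checks out, and the full scan comes out a little over an hour under these assumptions; skipping reserved ranges would bring it down.
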
Don't miss the browsable Hilbert map.
posted by jquinby (63 comments total) 37 users marked this as a favorite

 
And then Skynet goes online. Thanks a lot, guys.
posted by mek at 8:06 AM on March 18, 2013 [2 favorites]


After completing the scan of roughly one hundred thousand IP addresses, we realized the number of insecure devices must be at least one hundred thousand.

So, that's all of them?

(I assume that's a typo -- still, funny & apt.)
posted by chavenet at 8:07 AM on March 18, 2013


Yeah, I was just about to say the same thing. I guess they analyzed 100k IPs, and based on estimates from that (I would guess a few dozen positives?) they could make a whole-internet estimate of 100k open devices?
posted by mathowie at 8:11 AM on March 18, 2013


if you sample, you can extrapolate.

But I think it looks like they "spread": scan, try a trivial login, upload the scanner, rinse, repeat. Coverage of the whole IPv4 address space is interesting. Amazed there's that many open devices (because it implies that many folks have extra addresses just lying around, rather than using internal IPs).
posted by k5.user at 8:14 AM on March 18, 2013


There are some pretty interesting figures towards the end of the paper. This one, in particular, jumped out at me:

Port 9100 Tcp, ~244 Thousand IP addresses
~200 Thousand identifiable printers


Who in the hell puts their printer on the Internet?
posted by jquinby at 8:18 AM on March 18, 2013


Who in the heck embeds an IRC server? (4K Dancer ircds, per article)

Rather amazing feat, this IPv4 scan. The author has to report anonymously... else it's Gitmo for 'em.
posted by Jubal Kessler at 8:21 AM on March 18, 2013


Who in the hell puts their printer on the Internet?

Personally I need to be able to print my emails so I can cut out the URLs. That way it's much easier to retype them into Google.
posted by oulipian at 8:24 AM on March 18, 2013 [20 favorites]


Who in the heck embeds an IRC server? (4K Dancer ircds, per article)

People who run botnets
posted by motorcycles are jets at 8:24 AM on March 18, 2013 [6 favorites]


Who in the hell puts their printer on the Internet?

My department (at a university) alone has 4 printers on the internet...makes them _so_ much easier to use in a group setting. So I'd guess, businesses, university departments, etc. (Especially the latter since they have so much IP address space, there's no downside.)
posted by advil at 8:25 AM on March 18, 2013 [3 favorites]


Don't miss the browsable Hilbert map.

Hoo boy! Am I glad I didn't miss that! Why, that's the most browsable Hilbert map I've ever seen. Or, um, I mean, they made a Hilbert map, browsable!? Well dog my cats and call me Tim Berners-Lee! Or, no, wait...they mapped the browsable Hilberts!? Hol-eee, moley!

O.K., I have absolutely no idea what the hell that is supposed to be showing me.
posted by yoink at 8:25 AM on March 18, 2013 [9 favorites]


This report should be printed to all the printers left open on the internet.
posted by sammyo at 8:27 AM on March 18, 2013 [22 favorites]


"As a rule of thumb, if you believe that 'nobody would connect that to the Internet, really nobody', there are at least 1000 people who did. Whenever you think 'that shouldn't be on the Internet but will probably be found a few times' it's there a few hundred thousand times. Like half a million printers, or a Million Webcams, or devices that have root as a root password."
posted by jjwiseman at 8:28 AM on March 18, 2013 [4 favorites]


From what I can tell this is the actual map of an idea from xkcd in 2006?
posted by mek at 8:29 AM on March 18, 2013 [1 favorite]


O.K., I have absolutely no idea what the hell that is supposed to be showing me.

I assumed it was like this browsable Q*bert map, but with more than one hill.
posted by oulipian at 8:29 AM on March 18, 2013


(Especially the latter since they have so much IP address space, there's no downside.)

Other than the obvious one of exposing an embedded device to the public Internet. What happens if there's a remotely exploitable vulnerability in the printer's firmware? How easy is it to detect that it's been compromised? How easy is it to repair? Does the manufacturer even offer that kind of support?

I think I'd sleep better if it was at least behind a VPN.
posted by RonButNotStupid at 8:33 AM on March 18, 2013 [1 favorite]


What happens if there's a remotely exploitable vulnerability in the printer's firmware?

That is exactly their point. Also, there are tons of services -- enabled by default! -- on these printers that don't need a sploit to work. Just hit the IP address and click on the handy-dandy web interface! It's horrifying to me.
posted by wenestvedt at 8:37 AM on March 18, 2013 [1 favorite]


What about terminal concentrators that run old BusyBox versions? Connect to one of those and root it, and you can get to a lights-out port on all the machines it serves. Eeeek!
posted by wenestvedt at 8:39 AM on March 18, 2013


What about terminal concentrators that run old BusyBox versions? Connect to one of those and root it, and you can get to a lights-out port on all the machines it serves. Eeeek!

Find them all with the click of a mouse!
posted by jquinby at 8:51 AM on March 18, 2013 [1 favorite]


...I mean, like this.
posted by jquinby at 8:56 AM on March 18, 2013



What about the growing number of embedded devices that receive commands and updates from The Cloud by reverse-connecting to the manufacturer such as new Linksys routers and the Belkin WeMo?
posted by RonButNotStupid at 8:57 AM on March 18, 2013


Who in the hell puts their printer on the Internet?

modem---------->unsecured home router---------->networked home printer
posted by Thorzdad at 8:59 AM on March 18, 2013 [1 favorite]


This report should be printed to all the printers left open on the internet.

Better that than those companies who spam FAX machines with ads about cheap vacations and unwanted professional business services.
posted by filthy light thief at 9:00 AM on March 18, 2013 [1 favorite]


Who in the hell puts their printer on the Internet?

It's not like these security mishaps are all intentional, you know.
posted by odinsdream at 9:03 AM on March 18, 2013 [1 favorite]


A Hilbert map means that any sequence of consecutive IPs will form an unbroken line that snakes through the space. An entire IP range will form a continuous region on the map which contains addresses from inside the range and none from outside of it.

Basically it's just a good way of making a 2-dimensional 'map' of IP addresses (which don't have a natural map) that matches our intuitions of how that kind of map should work.
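
For the curious, here is a minimal sketch of the index-to-coordinate step, using the standard iterative Hilbert curve conversion. The function name d2xy and the choice of a 16x16 grid with one cell per /8 block (as in the xkcd map) are assumptions for illustration; the linked map may use a different resolution and orientation.

```python
def d2xy(order, d):
    """Map index d to (x, y) on a Hilbert curve filling a 2**order square."""
    n = 1 << order
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        # Rotate or flip the quadrant so consecutive pieces join end to end.
        if ry == 0:
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


# The four cells of the smallest curve, in visiting order.
print([d2xy(1, d) for d in range(4)])  # [(0, 0), (0, 1), (1, 1), (1, 0)]

# Hypothetical use: place each /8 block by its first octet on a 16x16 grid.
grid_cell = {octet: d2xy(4, octet) for octet in range(256)}
```

Neighbouring indices always land on neighbouring cells, and any aligned run of 4, 16, 64... indices fills a square, which is why an address range shows up as one compact region.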
posted by atrazine at 9:06 AM on March 18, 2013 [1 favorite]


The problem isn't what hackers can do to the remote printer (shut it down, make it do funny things). The problem is they can install a worm and form a botnet of 1 million printers. Those 1 million printers can then send out 56k bit streams of packets all at once, spread out between Amazon, CNN and a few other big sites. That would route the traffic through the major backbones and peering points, thus effectively shutting down large portions of the Internet. IOW, the Internet goes dark and people could potentially die. It's really only a matter of time. Conficker could have done this, but the bot master never employed it for anything truly nefarious (other than some pedestrian spamming). Yet they still might: Conficker is still out there, seemingly idle, waiting for a command.
posted by stbalbach at 9:11 AM on March 18, 2013 [4 favorites]


IOW, the Internet goes dark and people could potentially die.

This summer... PRINTERGEDDON.

Out of paper. Out of toner. Out of time.
posted by dephlogisticated at 9:45 AM on March 18, 2013 [57 favorites]


Who in the hell puts their printer on the Internet?

I had a security engineer brag about his design which put a significant chunk of his network out in the open on the internet - since IPv6 is coming in 5-10 years, the distinction between inside and outside networks was going away.

There are actually paid professionals who believe this. I guess we need to tell our building architect not to put in a driveway, since we'll all have flying cars in 5-10 years.

In the meantime, put that shit on an RFC 1918 network, wouldya? Defense in depth, kids... the firewall can't do it all, no matter what the vendor tells you.
posted by Slap*Happy at 9:50 AM on March 18, 2013 [1 favorite]


The point is unfortunate - there just aren't enough people who understand what they are doing to handle pedestrian tasks like "putting a printer on the network" so it's sloughed off on people who are simply happy if they get it working at all.

And no blame accrues to them. I assume they're mostly poorly paid secretaries, gofers and office assistants who are doing this - I'll bet in most of the cases they aren't even aware there's a possible issue.

It's not like a car. When you get in a car, you are very aware of what failure is like. Scraping the mailbox, minor failure; running over the cat, major failure; colliding with another car at high speed, critical failure. It's very intuitive and no one says, "I had no idea hitting that car was a bad thing to do."

But if you set up the printer so it's sitting there on the net, virginal with its ports wide open, there is no negative feedback at all - indeed, lots of positive feedback, the printer beeps and says, "Connected," and when you try to print, it indeed prints. Your boss tells you, "Good," and you have no idea that you've done the wrong thing - it's likely the printer might print hundreds of thousands of pages and eventually be decommissioned and no one will ever realize that there was a problem at all.

The fault is entirely and 100% with the manufacturers. When they first started putting consumer devices on the Internet, they should have resolved this issue before they started. At the very least, some sort of passwording or fingerprinting should have been mandatory - it should have basically been impossible to set up an open printer or other device without having to click through several threatening messages: "Your printer will now be exposed to the Internet and anyone anywhere in the world could print whatever they liked. Is this really what you wanted (moron)?"

And I'm sure I know why - because the technical people pointed this out and the pointy-haired bosses said, "We can't make money from that - that's not a sexy feature."

And even that is bullshit - the truth is that the PHBs simply didn't understand the issue enough to realize what a selling point it could be.

Picture the following ad... You see the "Brand X" printer start to print really fast (perhaps making a bit too much noise) and a motherly-looking secretary walks over and picks up a piece of paper to see what it is... REACTION SHOT of complete horror (i.e. grandma sees 2 girls 1 cup, link SFW if you can suppress your laughter). And then the tag, "Secure printing(tm) - only at [...]"

People want to do a good job - but even more, they want to avoid critical, job-terminating fuckups. This would have been such a good strategy, and all the other companies would have had to automatically secure their printers just to compete.
posted by lupus_yonderboy at 9:53 AM on March 18, 2013 [11 favorites]


> I had a security engineer brag about his design which put a significant chunk of his network out in the open on the internet - since IPv6 is coming in 5-10 years, the distinction between inside and outside networks was going away.

Bah!

Now, if your security were properly designed, everything could be naked on the Internet without any security risk but you'd still put it behind a firewall because these things are cheap and useful for other purposes and the cost of a serious break-in can be huge.

This is an engineering idea called belt and suspenders, where you have redundant protections, any one of which should suffice to get the job done. (And this even applies to debugging, our previous technical thread here. It's a good idea to try to fix any serious bug you encounter "twice" - fix the immediate bug, but then attempt to fix the overall system to make creating such bugs impossible in the future - and any serious engineering firm will have a post-mortem after any failure to do exactly that.)
posted by lupus_yonderboy at 9:58 AM on March 18, 2013 [1 favorite]


I don't understand why any manufacturer would ship a router or STB which accepts any kind of login on the WAN Interface by default anyway. The consumer is not going to control it from that side. It'd be nice to have a name-and-shame program that identified the manufacturers who do shit like this.
posted by George_Spiggott at 9:59 AM on March 18, 2013


the truth is that the PHBs simply didn't understand the issue enough to realize what a selling point it could be.

Except, as you said yourself, "it's likely the printer might print hundreds of thousands of pages and eventually be decommissioned and no one will ever realize that there was a problem at all."

The problem isn't pornographic images spewing from the world's printers, it's that they're vulnerable to malware that is engineered to stay unobtrusive and not trigger an immune response from the host while it's busy damaging the ecosystem (sending spam, etc.).

A fix for a problem that doesn't seem to be hurting you may not be so easy to sell. But if it's too important to let market forces work out, then it should be a legislative requirement or an ethical obligation or at least a point of pride--"We're shipping a device that is difficult to put on the open internet because it helps the internet."
posted by jjwiseman at 10:12 AM on March 18, 2013


A worse problem with printers is that it's an easy way to get inside the network. If you get on the printer, you can do tons of attacks that are only possible when you're on the same local network as the victim.
posted by ymgve at 10:33 AM on March 18, 2013 [2 favorites]


oulipian: "O.K., I have absolutely no idea what the hell that is supposed to be showing me.

I assumed it was like this browsable Q*bert map, but with more than one hill.
"

@!#?@!
posted by chavenet at 10:46 AM on March 18, 2013 [3 favorites]


It was just a bad pun (Hilbert map / Q*bert map)
posted by oulipian at 11:07 AM on March 18, 2013


The point is unfortunate - there just aren't enough people who understand what they are doing to handle pedestrian tasks like "putting a printer on the network" so it's sloughed off on people who are simply happy if they get it working at all.

The other big problem is that people who might have some clue often lack useful perspective. More than once I've encountered someone who was willing to leave a service open to the Internet because they didn't think anyone would "find" it. For them, their presumed attacker is a guy sitting in an abandoned warehouse somewhere repeatedly trying to guess passwords on a green phosphor display, not a distributed portscanner that probes every single publicly accessible IP address in the world.

At the very least, some sort of passwording or fingerprinting should have been mandatory

Or public-private keys. Lots of devices have USB ports now. At first power-up (or after a configuration reset) you should be able to connect a USB drive to receive a newly generated keypair that's required to access any remote administration features. People like tangible things, and requiring that a physical "key" be plugged into the computer shouldn't be that difficult to get across.
posted by RonButNotStupid at 11:27 AM on March 18, 2013 [1 favorite]


jquinby: "Who in the hell puts their printer on the Internet?"

People who installed a printer in 1998 and needed a remote system to be able to print to it? I know it seems bizarre in this day and age, but oftentimes equipment is not reconfigured for decades after being put into service.

Slap*Happy: "In the meantime, put that shit on an RFC 1918 network, wouldya? Defense in depth, kids... the firewall can't do it all, no matter what the vendor tells you."

RFC1918 gives you a false sense of security. Many (most at one time, probably down to many now) routers have bugs that allow a remote attacker to connect to hosts in the 1918 space. And that's not even considering UPnP as a way to open ports remotely in a lot of consumer-grade routers with broken firmware. Using private addressing provides basically zero security. The firewall built in to the router, on the other hand, provides said security by not allowing incoming connections. It makes no difference how the internal network is addressed.

As a rule of thumb for people who have no clue about networking it's good enough advice now that most routers sold retail can be secured, but it completely mis-states where the security benefit is coming from for the sake of ease of understanding by laypeople.
posted by wierdo at 11:44 AM on March 18, 2013


Who in the hell puts their printer on the Internet?

People whose networks aren't in private ranges, and for whom putting a printer on the network is therefore synonymous with putting it on the Internet.
posted by one more dead town's last parade at 11:57 AM on March 18, 2013


Alright, now I'm feeling intensely paranoid... which is probably a good thing.

I have my home printer - er, I mean, I know a real dope who put his home printer open on his wifi, so both he and his roommate (and any guests) who had access to the wifi could print at will.

Is it safe to assume that the printer is not accessible from outside the private wifi network? WPA2, PSK, and I trust my friend's encoding password to be complex "enough".
posted by IAmBroom at 12:02 PM on March 18, 2013


Re: Hilbert map - ICMP ping and open ports do not inherently mean "security risk".
posted by pashdown at 12:17 PM on March 18, 2013


I liked the bit at the end:

You may ask yourself who we are and why we did what we did.

In reality, we is me. I chose we as a form for this documentation because its nicer to read, and mentioning myself a thousand times just sounded egotistical.

The why is also simple: I did not want to ask myself for the rest of my life how much fun it could have been or if the infrastructure I imagined in my head would have worked as expected. I saw the chance to really work on an Internet scale, command hundred thousands of devices with a click of my mouse, portscan and map the whole Internet in a way nobody had done before, basically have fun with computers and the Internet in a way very few people ever will. I decided it would be worth my time.

posted by Sebmojo at 1:21 PM on March 18, 2013 [1 favorite]


Holy hell, I hope he covers his tracks. The hacker who did a not-crazily-dissimilar thing just went down for three years.
posted by Sebmojo at 1:29 PM on March 18, 2013


IAmBroom, your WiFi's security is a whole different angle of attack than what this guy did. If you're concerned about the type of attack detailed here, what you need to worry about is how the router responds when someone who isn't on your local network, wired or wireless, nonetheless tries to connect to your printer or whatever. Telling us about the security that prevents someone from connecting to the WiFi doesn't give us any information about how safe you are from this kind of attack.
posted by LogicalDash at 2:28 PM on March 18, 2013


Well, if the printer is only connected via Wifi and is on a private network (pretty hard not to do this with consumer hardware) then we can pragmatically answer the question. And nobody please talk about "totally secure if done properly" and then "defense in depth." You cannot totally secure a network if people use it and it is a network. Best never to give anyone the impression that they can totally secure anything, hence you perform risk analysis and build defense in depth. Thinking you have absolute control over the security of a network is dangerous and false.
posted by lordaych at 6:05 PM on March 18, 2013


This is an amazing accomplishment... you've gotta give him props. I'm sure his data is very valuable to certain people....
posted by ph00dz at 6:05 PM on March 18, 2013


To attempt to answer the WiFi printer question pragmatically, setting up a WiFi printer on your home router is not the same thing as connecting it directly to the internet, even if it supports some forms of "internet printing." Connecting it directly to the internet means that it has a public IP address and can be reached without any solicitation on the printer's part: "oh hi there's a printer on the internet, I shall use it" as opposed to a private address, which on typical home networks will begin with 192.168.
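
If you want to check which kind of address a device has, a small sketch like the following works with Python's standard ipaddress module. The helper name is_rfc1918 is made up for illustration. Note that this only tells you what kind of address it is, not whether your router actually keeps outsiders away from it.

```python
import ipaddress

# The three private IPv4 ranges set aside by RFC 1918.
RFC1918_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_rfc1918(address):
    """Return True if the IPv4 address falls in one of the RFC 1918 ranges."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in RFC1918_NETWORKS)


print(is_rfc1918("192.168.1.20"))  # True: typical home network address
print(is_rfc1918("18.0.0.1"))      # False: publicly routable
```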

There are all kinds of nice holes that printers and routers will create to make things more convenient, as in the case of my WiFi printer that "phones home" periodically to see if I've sent any email attachments to the printer, which really means "I sent them to HP's cloud service via email, which the printer connects to and pulls documents from." But the printer is going out and finding stuff to do and is basically "soliciting" an internet service for data to print; the stuff isn't actually being "pushed" to the printer. And sure, there are a bazillion points along the way where this whole setup can still be compromised without ever actually needing to "find" my printer in the way you might using a port scanner and that's a whole 'nother ball of wax.

With consumer WiFi gear it's pretty hard to accidentally expose your WiFi gear to the public internet if you have decent WPA security, your router doesn't allow remote administrative access, etc. WPS should always be disabled; it sucks and is horrible. Router firmware should be updated when possible but if your printer is behind a router and you haven't taken steps to make your printer accessible through the internet by setting up NAT or PAT, you're not the "hapless user" being described here.

But the nature of the beast is that determined attackers will find a way in if the reward is high or the effort is low enough, so it's a matter of balancing risk with cost and effort and time and blah blah blah and not assuming you're small potatoes and nobody will "find" you.
posted by lordaych at 6:18 PM on March 18, 2013


So - did anybody notice the "You are here" arrow in the map?
posted by symbioid at 7:43 PM on March 18, 2013 [2 favorites]


Could someone give a summary of what this article is saying in layperson's language? And explain why it's a problem if a printer is connected to the internet? It seems like this article might be interesting, maybe?
posted by medusa at 7:51 PM on March 18, 2013


In a nutshell - this person managed to find several thousand devices connected to the internet. He then used those devices to find even more devices, and then was able to turn all of them into a gigantic scanner, capable of scanning every single (public) IPv4 address in a matter of hours. The fruits of that effort are the various and sundry graphs on that page.

The mechanics of it, however, are what's pretty staggering: locating a large number of Things (printers, embedded devices, routers and so on) into which he (or she) was able to install his code, managing the flow of that much data, controlling all of it, and so on. And mind you, this was for - allegedly - benign research, shits, and giggles.

He's sitting on - and offering to share via torrent - a fair amount of interesting data, and unquestionably broke quite a few laws in carrying it out. Scanning is sort of a shady thing anyway, but installing code onto things you don't own is firmly on the side of Do Not Do This, even if, as he claims, the software deletes itself after a certain amount of time.
posted by jquinby at 8:13 PM on March 18, 2013 [1 favorite]


...the printer bit was, frankly, my surprise that so many are visible and out there on the public Internet (I understand how it comes to pass; it just never ceases to surprise me). The problem with them being out there has been pointed out above, but in brief, printers often come with little web servers on them to make it possible to manage them from afar. These were built in for user convenience and are generally not very well secured. Folks have long ago learned how to compromise this sort of thing and make the printer do things it ought not do, like serve as a stepping stone to other parts of your network.

The real problems are that many people don't know about these extra little servers, and that there generally isn't an easy way to amend the security problems. So they persist out there for years. I hesitate to point this out, but a large amount of medical equipment (like imaging devices, drug control cabinets and more) is equally exposed and vulnerable.
posted by jquinby at 8:19 PM on March 18, 2013


but installing code onto things you don't own is firmly on the side of Do Not Do This, even if, as he claims, the software deletes itself after a certain amount of time.

Especially with such precedents as the Morris worm, which managed to accidentally shut down a significant proportion of the (at the time, very small) internet.
posted by BungaDunga at 8:36 PM on March 18, 2013


lordaych, thank you. That's as I suspected, but I was never quite sure.
posted by IAmBroom at 8:54 PM on March 18, 2013


Speaking of thanking lordaych, thanks also for the reminder about devices phoning home, I keep meaning to firewall in embedded devices like printers and such. Finally got around just now to defining a subnet for them, giving them static leases within it and blocking WAN access from that subnet. If any of them need a firmware update I'll drop the rule, do that and put it back. Unlikely anyway since most consumer devices are abandonware about a week after you take them out of the box.

Treasonous printer firmware is pretty minor in the overall hierarchy of risks, but I don't like them things talking behind my back anyway. I just know they're mocking me.
posted by George_Spiggott at 9:59 PM on March 18, 2013 [1 favorite]


The maps and animations are really beautiful. The world map of the 460 million IP addresses is gorgeous, and the Hilbert map is an excellent way to picture the internet (thanks Randall).

MIT uses its 18.0.0.0/8 block in widely separated, regularly spaced chunks, so it comes out as a sparse grid filling the whole block, unlike any other /8 block. Comcast's 73.0.0.0/8 block pulses very strongly with day and night, and all in synch.

The Hilbert map brings home once again just how much is happening in Asia and Latin America. APNIC, Asia, has filled up almost all its blocks and is threatening to burst its bounds, and Asia and Latin America (LACNIC) are riots of color and energy. North America, ARIN, has lots of space, just like it does with land, but large stretches look rather sparse and empty, also like the land.

Lots to see here.
posted by kadonoishi at 1:17 AM on March 19, 2013


Makes me want to go through the settings on my router again and make sure everything is set the way it should be.
posted by vibratory manner of working at 3:29 AM on March 19, 2013


He's sitting on - and offering to share via torrent - a fair amount of interesting data, and unquestionably broke quite a few laws in carrying it out. Scanning is sort of a shady thing anyway, but installing code onto things you don't own is firmly on the side of Do Not Do This, even if, as he claims, the software deletes itself after a certain amount of time.

The thing is that his method is so banal - no tough exploits or anything, just default username and password combinations - that I can't really blame him. If you want people to stop abusing wide open routers, the right response is to force hardware companies into not creating the wide open routers in the first place.
posted by ymgve at 3:47 AM on March 19, 2013 [1 favorite]


The thing is that his method is so banal - no tough exploits or anything, just default username and password combinations - that I can't really blame him. If you want people to stop abusing wide open routers, the right response is to force hardware companies into not creating the wide open routers in the first place.

He will quite possibly go to prison for years if he is caught.
posted by Sebmojo at 4:49 PM on March 19, 2013


jquinby, thanks for explaining this. I appreciate it. Are you (or someone else) willing to explain a bit more? I'm wondering about this:

He then used those devices to find even more devices, and then was able to turn all of them into a gigantic scanner, capable of scanning every single (public) IPv4 address in a matter of hours.

Can you say a bit more about scanning and what you get from that? Is that the process of trying to connect to all the different devices?
posted by medusa at 8:07 PM on March 19, 2013


So to really break down the details of what's going on here:

The dominant scheme for addressing computers on a network is IPv4, or Internet Protocol version 4. This allows for 4,294,967,296 different possible addresses, from 0.0.0.0 to 255.255.255.255 and everything in between. Some of the addresses are reserved for different purposes and you don't need to check those ones, but there's still a lot.
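
To make the numbers concrete: an IPv4 address is just a 32-bit number written as four bytes. Here's a short sketch using Python's standard ipaddress module; the helper name nth_address is made up for illustration.

```python
import ipaddress

# The whole IPv4 space as one network: 2**32 addresses.
total = ipaddress.ip_network("0.0.0.0/0").num_addresses
print(total)  # 4294967296


def nth_address(n):
    """Dotted-quad form of the n-th IPv4 address, counting from zero."""
    return str(ipaddress.IPv4Address(n))


print(nth_address(0))          # 0.0.0.0
print(nth_address(total - 1))  # 255.255.255.255
```

Walking n from 0 to total - 1 is all that "working through every address" means, minus whatever reserved ranges you decide to skip.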

You can pick an IP address at random and send different kinds of data to that address. The network will attempt to route the data to that address and deliver it. If there's a computer at that address, it may or may not respond to your probe, in different ways, depending on how it's configured and what you sent to it.

'scanning' here refers to the process of working through all four billion or so IP addresses and probing them in different ways, to see if you get a response, and if so, what kind. For this project, six different types of data were collected.

The first kind used here is ICMP ping, which is a protocol that just says "if you're there, send a response so I know you're there", more or less. It'll also tell you how long the roundtrip from your computer, to the other, and back again takes. If you get a response to this, it means that there's a computer of some kind at that address, but if you don't get a response it doesn't mean the opposite. There might be a computer that just doesn't respond to pings for whatever reason. This is what you see in the second link, if you don't change the settings once you get there.

The second is reverse DNS, which isn't actually a probe of the IP addresses themselves. DNS is the system that translates from domain names to IP addresses, e.g. www.metafilter.com to 50.22.177.14. Reverse DNS is doing the opposite, taking an IP address and asking the DNS system if there is a domain name that corresponds to that IP address. Not all of them are as pretty as metafilter.com. For example, a computer somewhere in between me and the metafilter servers has an IP address 68.86.86.190 and a domain name of 'pos-1-3-0-0-pe01.111eighthave.ny.ibone.comcast.net'
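
As a rough sketch of what a reverse lookup involves: the IP address is rewritten into a special name under in-addr.arpa, and that name is what gets looked up. The helper names below are made up for illustration, and whatever the lookup returns today may differ from the example names above.

```python
import ipaddress
import socket


def ptr_name(address):
    """Name that a reverse DNS query for this address actually asks about."""
    return ipaddress.ip_address(address).reverse_pointer


def reverse_lookup(address):
    """Ask the system resolver for a hostname; None if there is no answer."""
    try:
        hostname, _aliases, _addresses = socket.gethostbyaddr(address)
        return hostname
    except OSError:  # covers socket.herror and socket.gaierror
        return None


print(ptr_name("50.22.177.14"))  # 14.177.22.50.in-addr.arpa
```

Calling reverse_lookup needs a working resolver and network access, so it is left uncalled here.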

The third group is a couple different probes using nmap, a general utility for this sort of thing.

Quick extra background: In addition to sending data to an IP address, you can specify a specific port at that address. Different types of programs on a computer listen for incoming data on different ports; it's how a computer running a bunch of different possible services keeps track of what data is intended for what thing.

So with nmap, it looks like first they did a couple probes to see if there was a computer at that IP address: an ICMP ping, which I already discussed; a probe to port 80, which is where http services listen; a probe to port 443, which is where https services listen; and another kind of ICMP probe, similar to ping. If they got a response to any of those, then they probed a large number of different ports at that IP address, to see what ports the computer was listening on and would respond to. For some of these they were able to tell what kind of computer it was by looking at how the responses it gave were put together. I'm not sure what 'IP ID sequence' is, but they got some of that too.

The probes to different ports in that step were generic. In the fourth group they list on the website, they sent out probes tailored to each port's assigned use. Different ports correspond to different services/protocols, and this sends data appropriate to the service assumed to be listening at that port. The data they collected distinguished between five different kinds of responses to this:

1: Open = Probed port was open and returned data in response to a service probe
2: Open/Reset = Probed port was open, but the connection was reset without sending data
3: Open/Timeout = Probed port was open, but the connection timed out without sending data
4: Closed/Reset = Probed port was closed or connection was reset by a firewall
5: Timeout = Probed port gave no response at all
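
Those five outcomes map fairly naturally onto what an ordinary TCP connection attempt can do. Here is a rough sketch using Python's socket module. It is my own approximation, not the project's probe code: the probe function and its parameters are made up, it treats a clean close with no data the same as a reset, and it lets other network errors (such as an unreachable network) propagate as exceptions. Only point it at machines you're allowed to probe.

```python
import socket

def probe(host: str, port: int, payload: bytes = b"", timeout: float = 1.0) -> str:
    """Try one TCP connection and classify the result into the five states."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.settimeout(timeout)
    try:
        try:
            s.connect((host, port))
        except socket.timeout:
            return "Timeout"                 # no response at all
        except (ConnectionRefusedError, ConnectionResetError):
            return "Closed/Reset"            # closed port, or a firewall reset
        try:
            if payload:
                s.sendall(payload)           # service-specific probe data
            data = s.recv(1024)
        except socket.timeout:
            return "Open/Timeout"            # connected, but nothing came back
        except ConnectionResetError:
            return "Open/Reset"              # connected, then reset
        # Assumption: an empty read (peer closed without sending) counts as a reset.
        return "Open" if data else "Open/Reset"
    finally:
        s.close()
```

For example, probe("127.0.0.1", 80, b"GET / HTTP/1.0\r\n\r\n") would report how a web server on your own machine, if any, reacts.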

The fifth kind of data they collected is with traceroute, a program designed to show you how data is being routed in between the computer you're on and the computer you're sending data to. My computer sends the data to computer A which looks at the address on the envelope and sends it to computer B, and on down the line until it ends up getting to the address it's going to. Traceroute tells you the IP addresses and domain names for each computer in the chain. They didn't do a whole lot of this one.
posted by vibratory manner of working at 4:53 AM on March 20, 2013


vibratory manner of working pretty much said it all. About all I can do is try to analogize a bit further.

It's as if he wanted to knock on every single door in the US to see who or what would answer the door. You need a really big crowd for that.

So he found 1000 people, and each of them found 1000 more people to help knock on doors. Some of the newest recruits were dogs, who could at least run to a door and bark. Others were kids who could do more than a dog, but only by a little bit.

Once he had a sufficiently large number of door-knockers, he turned them all loose and was thus able to knock on every single door in the US - no matter where it was - in just a matter of hours.

The crowd of knockers is now so large that it's tough to manage the flow of information - what doors to knock on next, as well as the flow of reports back from the field. So he turned quite a few of the knockers into go-fers, who could shuffle reports and instructions to and fro.

(some people liken port scanning to trying the doorknob instead of just knocking which is why some folks consider it somewhat shady; real-world analogies get sort of brittle online)

He couldn't knock on the interior doors - just the ones on the outside. Occasionally, you knock on a door with a big window that lets you see everything going on inside because someone's forgotten to shut the blinds.

In any case, to strain it just a bit more: he did it all without anyone - even the door knockers - necessarily knowing it was happening. And when it was over, they all forgot about it and he had his giant map of every door in the US - where it was, who/what answered, and so on - just like the Census he used in the title of the paper.
posted by jquinby at 5:19 AM on March 20, 2013 [1 favorite]


This is a great post. Certainly a lot of ethical considerations here, but damn if that isn't some fine, fine work that literally wouldn't otherwise exist.

That's a fantastic dataset, and a great write-up of how it was achieved.
posted by odinsdream at 10:35 AM on March 20, 2013


Some of the newest recruits were dogs, who could at least run to a door and bark.

This is the best part of your analogy.
posted by vibratory manner of working at 11:45 AM on March 20, 2013 [1 favorite]


Thanks vmow and jquinby, that is an incredibly helpful explanation.

And this part reminds me of a ground game for a political election:

So he found 1000 people, and each of them found 1000 more people to help knock on doors. Some of the newest recruits were dogs, who could at least run to a door and bark. Others were kids who could do more than a dog, but only by a little bit.

Once he had a sufficiently large number of door-knockers, he turned them all loose and was thus able to knock on every single door in the US - no matter where it was - in just a matter of hours.


SO THAT'S HOW OBAMA WON HE CONTROLS ALL THE DOGS
posted by medusa at 12:39 PM on March 20, 2013


vibratory manner of working: "I'm not sure what 'IP ID sequence' is, but they got some of that too."

In this context, I believe they're referring to the initial TCP sequence number and how said number is incremented as the connection progresses. Most hosts randomize their initial sequence numbers now to prevent IP spoofing and other types of attack. Most aren't truly random, though, so identifying the algorithm in use can help identify what OS the host is running.

To explain further, when you attempt to connect to a server, your computer sends a TCP SYN packet, to which the server (hopefully) responds with a SYN-ACK, which carries a sequence number of the server's choosing. Your computer then responds with an ACK packet that acknowledges the sequence number received from the server, letting the server know that its packet was received and need not be retransmitted. Further packets have different sequence numbers so they can be individually identified and retransmitted if necessary.

If that sequence of sequence numbers can be guessed reliably, an attacker can relatively easily pretend to be any host on the Internet even though s/he never sees the reply packets (because they're routed to the real thing). So if, for example, a mail server is configured to forward mail without further authentication for certain IP addresses, an attacker could use that server to send spam by simply pretending to have one of those addresses.

With good enough randomization, the attacker can't guess the sequence numbers so the server ignores their wrong-sequence-number-having packets. It doesn't have to be perfect, just difficult to guess without knowledge of the internal state of the host.
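
To make the number-passing concrete, here is a toy model of the opening handshake. In standard TCP the server's reply carries both the SYN and ACK flags, and each acknowledgment number is the other side's sequence number plus one, wrapping at 2**32. The handshake function and the dictionary layout are purely illustrative; nothing here touches the network.

```python
import secrets

MOD = 2 ** 32  # sequence numbers are 32 bits and wrap around

def handshake(client_isn=None, server_isn=None):
    """Return the three segments of a TCP three-way handshake as dicts."""
    # Each side picks its own initial sequence number (ISN), ideally unpredictably.
    x = secrets.randbits(32) if client_isn is None else client_isn
    y = secrets.randbits(32) if server_isn is None else server_isn
    syn = {"flags": "SYN", "seq": x, "ack": None}
    syn_ack = {"flags": "SYN-ACK", "seq": y, "ack": (x + 1) % MOD}
    ack = {"flags": "ACK", "seq": (x + 1) % MOD, "ack": (y + 1) % MOD}
    return [syn, syn_ack, ack]

for segment in handshake():
    print(segment)
```

A spoofing attacker never sees the SYN-ACK, so to send a valid final ACK they would have to guess y. That is why an unpredictable server ISN matters.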
posted by wierdo at 4:10 PM on March 21, 2013



