US Mappers Screw a Family in South Africa
January 9, 2019 1:52 PM

The visitors started coming in 2013. The first one who came and refused to leave until he was let inside was a private investigator named Roderick. He was looking for an abducted girl, and he was convinced she was in the house. John S. and his mother Ann live in the house, which is in Pretoria, the administrative capital of South Africa. ... The outline of this story might sound familiar to you if you’ve heard about this home in Atlanta, or read about this farm in Kansas, and it is, in fact, similar: John and Ann, too, are victims of bad digital mapping. There is a crucial difference though: This time it happened on a global scale, and the U.S. government played a key role.

Despite appearances, John and Ann are not criminals. ... They just happen to live in a very unfortunate location, a location cursed by dimwitted decisions made by people who lived half a world away, people who made designations on maps and in databases without thinking about the real-world places and people they represented.

A long read from Gizmodo by Kashmir Hill on how approximate geolocation information really is, how various individuals and families have been victimised by commercial companies that map IP addresses inaccurately (IP addresses generally say little about precise location), and how the National Geospatial-Intelligence Agency (part of the US government) wrongly used the coordinates of this South African home in its database, which is used by the US government and by others.

As the author notes, "the free tools available to us online can give us an unwarranted sense that we know more, and to a higher degree of certainty, than we actually do." Moreover, it is not just the US government that makes such mistakes.

When I emailed the company’s founder Thomas Mather, back in 2016, asking why it had associated so many IP addresses with the Kansas farm, he’d been incredibly candid with me, explaining that the company had picked a default digital location for the United States basically at random without realizing it would cause problems for the person who lived there. He asked me what the company should do to rectify the situation. “Do you have a sense of how far away we should locate these lat/lons from a residential address?” he emailed me back. “Do we also need to locate the lat/lon away from business/commercial addresses?”

I was a little stunned at the time to have the CEO of a company ask me for that kind of very basic advice about his own business. The company wound up changing the default location for the U.S. from Joyce Taylor’s farm to a lake nearby. Taylor and the residents of the farm later sued MaxMind; the case settled out of court.
posted by Bella Donna (49 comments total) 19 users marked this as a favorite
 
“It’s almost with religious zeal that these people come, thinking their goodies are in my yard,” John told me. “The Apple customers seem to be the worst.”

I chuckled at that.
posted by pipeski at 1:57 PM on January 9 [9 favorites]


The problem is that MaxMind provides a service (IP geolocation) that they know is more often than not defective, but:
MaxMind seemed to realize it had messed up badly. At the beginning of 2018, it said it was removing longitudes and latitudes from the free database it offered online, but according to a blog post last April, customers complained, so MaxMind decided to leave them.
Yeah - they provide bad info because their customers demand it. They need to be sued into the ground, and the smoking crater left as an example.
posted by NoxAeternum at 1:58 PM on January 9 [16 favorites]


It is puzzling to me that the complaints of "customers" about a free database carry more weight than the actual damage done to the people who are victimised by bad mapping, but hey, I guess I don't have the entrepreneurial mindset. That Apple line made me laugh as well, pipeski. Ruefully, as I read it on my Mac.
posted by Bella Donna at 2:05 PM on January 9 [5 favorites]


Honestly, the lede is misleading. The only thing the US did here was have a cartographer, on a lazy day, set the default location for Pretoria to a residence instead of a landmark (and when apprised of the matter, they fixed the location readily). The core problem is that MaxMind is allowed to put out faulty data that causes real harm with no repercussions for doing so. That's the problem that needs to be fixed.
posted by NoxAeternum at 2:13 PM on January 9 [10 favorites]


NoxAeternum: They need to be sued into the ground, and the smoking crater left as an example.

And whoever takes their place on the market can set the smoking crater as the default digital center of the U.S.!
posted by tzikeh at 2:32 PM on January 9 [9 favorites]


The old nostrum "Garbage In, Garbage Out" really pulled its running shoes on with the advent of the internet and big data ...
posted by cstross at 2:35 PM on January 9 [8 favorites]


Ummm...'allow null' is a thing.
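j_curiouser's point, sketched in Python under the assumption of a hypothetical in-memory lookup table (the `Location` type, `lookup`, `describe` and the sample addresses are illustrative, not MaxMind's API): when only the country is known, the coordinates stay `None` instead of being filled with a default point, so every caller has to handle "unknown" explicitly.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Location:
    country: str
    # None means "we do not know the coordinates", not "somewhere in the middle".
    lat: Optional[float] = None
    lon: Optional[float] = None


# Hypothetical sample data: one IP with a city-level fix, one with country only.
_TABLE = {
    "198.51.100.7": Location("ZA", -25.75, 28.19),
    "203.0.113.9": Location("ZA"),
}


def lookup(ip: str) -> Optional[Location]:
    """Return what is known about an IP, or None if nothing is known at all."""
    return _TABLE.get(ip)


def describe(ip: str) -> str:
    loc = lookup(ip)
    if loc is None:
        return "unknown"
    if loc.lat is None or loc.lon is None:
        # No coordinates: report the country rather than inventing a point.
        return f"somewhere in {loc.country}"
    return f"near {loc.lat:.2f}, {loc.lon:.2f} ({loc.country})"
```

The caller is forced to branch on the missing case; there is no house that silently absorbs every "don't know".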
posted by j_curiouser at 2:36 PM on January 9 [11 favorites]


I was a little stunned at the time to have the CEO of a company ask me for that kind of very basic advice about his own business.

So the article is much bigger than issues at a single company or the merits/limits of a single CEO, but this probably deserves a longer detour.

These days I'm more surprised when a C-level occupant actually turns out to have a detailed understanding of domain knowledge than when they don't. I'm sure it's helpful to have, but as far as I can tell detail work is not what CEOs are selected for. And I'm inclined to see MaxMind/Mather's candor and query of the journalist as a point of merit rather than one of shame. There's a hell of a lot of CEOs who'd spin and wouldn't ask any questions that would risk the impression they need to learn anything.

The problematic thing is that this can (and often does) mean that details fall through the cracks. A low-to-mid-resolution understanding in management means management doesn't know exactly where to aim the people who are paying attention to details. They can mitigate this by empowering detail-attending folks with time and resources they can aim themselves, but only if time and resources aren't scarce, and there is some level of resolution at which they always are. Even if you're a person who has put effort into detailed domain mastery and made it your job to pay attention to details, there's always some limit to your attention, and therefore always some corner cases that will fall through.

And that's what we're seeing across several organizations in this article.

The only plausible solution is to have some process in place for detection and feedback and then adaptation. But a lot of organizations are terrible at that too because of the incentive structures they run on.
posted by wildblueyonder at 3:00 PM on January 9 [2 favorites]


And I'm inclined to see MaxMind/Mather's candor and query of the journalist as a point of merit rather than one of shame.

I'm sorry, but no. This is not "detailed domain knowledge", but a CEO going, when it's pointed out that the false data his company is pumping out is causing real harm, asking the reporter "so, how should we fix it?"

If we were talking about toxic waste being dumped, I don't think you'd be treating this as a lack of knowledge. And the reality is that this false location info is a sort of digital toxic waste, and like the meatspace stuff, it's harmful. But the problem is that we don't treat it as such.
posted by NoxAeternum at 3:17 PM on January 9 [7 favorites]


MaxMind doesn't seem to be to blame. They're providing an area that the IP address could be in; other apps using their data are discarding the radius and using only the center.

That's a huge mistake, changing it from "your car is in Boston" to "the mayor of Boston has your car" isn't what MaxMind intended, and they can't quite control what people do with the data.
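A small Python sketch of the distinction explosion is drawing, assuming a hypothetical record with an accuracy radius in kilometres (field and function names are illustrative, not any vendor's actual schema). The same estimate can be rendered honestly, as a centre plus a radius, or flattened to a bare point; only the flattened form points at one particular front door.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeoEstimate:
    lat: float
    lon: float
    accuracy_radius_km: float  # the true location is somewhere within this radius


def render_honest(est: GeoEstimate) -> str:
    # Keeps the uncertainty attached to the coordinates.
    return f"within {est.accuracy_radius_km:g} km of ({est.lat}, {est.lon})"


def render_flattened(est: GeoEstimate) -> str:
    # What a careless downstream consumer does: the radius is silently discarded.
    return f"({est.lat}, {est.lon})"
```

Nothing in the data forces the second function to exist; it is a choice made downstream, which is the crux of the disagreement in the comments that follow.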
posted by explosion at 3:17 PM on January 9 [3 favorites]


They're very much at fault, explosion. To provide coordinates without a forced radius of uncertainty is very irresponsible of them. And I can't believe that it's the same damn company as pulled the Kansas farm thing.
posted by scruss at 3:24 PM on January 9 [1 favorite]


MaxMind doesn't seem to be to blame. They're providing an area that the IP address could be in; other apps using their data are discarding the radius and using only the center.

That's a huge mistake, changing it from "your car is in Boston" to "the mayor of Boston has your car" isn't what MaxMind intended, and they can't quite control what people do with the data.


I was wondering when this argument would come up, because it's a bad penny in these discussions. If you're providing a result with a massive margin of error, then you are providing bullshit, and should not be doing so. MaxMind is very much at fault here, because they're the ones putting out this false info, and they damn well know that their end users aren't looking at the MoE.
posted by NoxAeternum at 3:24 PM on January 9 [3 favorites]


They are providing the radius of uncertainty. Other companies are deliberately misusing their API.

I could see an argument that MaxMind should cut off companies that misuse it, however. Ideally their TOS would reflect this.
posted by thefoxgod at 3:28 PM on January 9 [2 favorites]


Yeah, this is 100% their fault. If they can't be accurate within a very small margin, they should not return results or return randomly distributed ones. This is not rocket science. I know why they don't want to take this route, but that doesn't absolve them.
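For reference, "randomly distributed" results would mean sampling a point uniformly inside the uncertainty circle instead of always returning its centre. A minimal Python sketch, using a flat local x/y plane as a simplifying assumption (a real implementation would have to work on the sphere); the square root on the radius is what makes the samples uniform over the disc's area rather than bunched near the centre.

```python
import math
import random
from typing import Tuple


def random_point_in_disc(
    cx: float, cy: float, radius: float, rng: random.Random
) -> Tuple[float, float]:
    """Sample a point uniformly from the disc of the given radius around (cx, cy).

    Planar coordinates only. Taking sqrt(u) compensates for the area of a ring
    growing with its distance from the centre, so outer rings are not under-sampled.
    """
    r = radius * math.sqrt(rng.random())
    theta = rng.uniform(0.0, 2.0 * math.pi)
    return cx + r * math.cos(theta), cy + r * math.sin(theta)
```

About a quarter of the samples land within half the radius, matching the area ratio. As the next comment argues, this spreads the harm around rather than removing it.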
posted by tocts at 3:30 PM on January 9 [3 favorites]


I'm not sure randomly distributed ones would be better. Then police would be showing up at random houses with warrants instead of the same house all the time. It would spread the pain around but it wouldn't solve the problem and it would make the problem even less noticeable since there wouldn't be a clear pattern of a single place being targeted.

IIRC, a lot of cities use city hall as the place where their elevation is measured and where distance signs on highways are measured to -- using a location like that for the default mapping data would at least mean the harassment was being directed at a government entity that had some resources to deal with it, rather than some random private citizen.
posted by jacquilynne at 3:46 PM on January 9 [9 favorites]


They are providing the radius of uncertainty. Other companies are deliberately misusing their API.

Sorry, but no. When the MoE is so high as to make the response worthless, the ethical thing to do is not provide the information, because it's worthless.
posted by NoxAeternum at 4:01 PM on January 9 [5 favorites]


MaxMind's default location should be their own corporate headquarters, or at least the facility where they do QA. Justified vengeful joking (or not) aside, it's the cheapest way to close the loop on their debugging efforts. If someone didn't already suggest it internally, they need better people. If someone did and the idea was rejected because the consequences were obvious to the people in the room, they definitely need to be overrun by the parties they've misled.
posted by qbject at 4:08 PM on January 9 [9 favorites]


Worthlessness is relative. Knowing that an IP address is within a given city or country has practical uses. Taking the centre of a large circle and misrepresenting it as a precise location for profit is where the actual deceit occurs, surely?

The fault lies both with the company misusing the data, and with the company that is blithely selling data knowing it will be misused.
posted by pipeski at 4:17 PM on January 9 [5 favorites]


Sorry, but no. When the MoE is so high as to make the response worthless, the ethical thing to do is not provide the information, because it's worthless.

Worth is not a binary condition. There are a lot of purposes for which a response of 'somewhere in a radius of 10 km around this exact point' is useful. If you need to target an emergency alert to people in a specific area or want to target advertising to people in a specific city, a point and a radius is entirely useful information. If you give people the point and the radius and they ignore radius, that's not necessarily your fault.
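jacquilynne's targeting example can be sketched in Python: with a centre and a radius you can ask whether an estimate is *definitely* inside a target area or only *possibly* inside it, a question a bare point cannot answer. Both the estimate and the target are modelled as circles on a sphere using the haversine distance; the `(lat, lon, radius_km)` tuple layout, the function names, and the rough Pretoria and Cape Town coordinates are assumptions for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; adequate for a sketch


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two lat/lon points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # min() guards against rounding pushing the argument slightly above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


# Both an estimate and a target area are (lat, lon, radius_km).
def definitely_inside(estimate, area) -> bool:
    """The whole uncertainty circle lies within the target circle."""
    d = haversine_km(estimate[0], estimate[1], area[0], area[1])
    return d + estimate[2] <= area[2]


def possibly_inside(estimate, area) -> bool:
    """The uncertainty circle at least overlaps the target circle."""
    d = haversine_km(estimate[0], estimate[1], area[0], area[1])
    return d <= estimate[2] + area[2]
```

A 10 km estimate centred on Pretoria is definitely inside a 50 km target around the city; a 100 km estimate with the same centre is only possibly inside it, which is the answer an emergency-alert or ad system actually needs.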

MaxMind and other providers could quite possibly be doing more to shut off the pipes to users who are misusing the API data, but it's simply wrong to say that the way they are providing data is inherently incorrect or valueless.
posted by jacquilynne at 4:18 PM on January 9 [2 favorites]


If they can't be accurate within a very small margin, they should not return results or return randomly distributed ones.

MaxMind's customers are not individuals, they're companies with whole engineering teams. They aren't just misusing this information because they don't know better. MaxMind provides the data for a circle. IPinfo, to pick an example, takes that data, and renders an API that includes a point. There is no disclaimer about the accuracy of that point, as there currently is on MaxMind's page. There is no accuracy radius. There is no note that this information is not rendered at an accuracy suitable to identify a household, again as MaxMind currently does, even on a page that's really marketed towards developers. IPinfo just provides a point, and some nice marketing copy about how they provide fast, accurate location data.

IPlocation.net then takes that data, and renders it out to consumers on a nice page that also, when I turned adblock off, tried to sell me Cox services and a new Nissan and a VPN service called HideMyAss with a cartoon donkey. There's some information about accuracy at the bottom. After the ads. They, I would argue, do have some responsibility to realize that consumers are not going to scroll down to read the disclaimers and the terms and everything else.

But MaxMind doesn't sell a product to ordinary people. They sell a product to engineering teams. Engineering teams who would know full well what this meant, and then made deliberate decisions to use that information in ways other than intended. Those people, the first people to receive data with a circle and pass on data with a single point? Those are the ones who I think need to get sued. IPinfo and other such resellers of this information had reason to know that the information they were providing would be used in misleading ways, and did it anyway, because it makes them look better.
posted by Sequence at 4:22 PM on January 9 [17 favorites]


To provide coordinates without a forced radius of uncertainty is very irresponsible of them.

I agree, but this is super common.

I mean, it's common enough that I have a basically off-the-shelf rant about it, and how fucking dangerous it is, but it still happens all the time. And frankly this example is actually one of the less serious ones that I can think of.

Here's a just-barely-hypothetical example: you map a route through a minefield. You use your GPS-enabled device to helpfully record your track, so that other people who are similarly allergic to small pieces of high-velocity shrapnel can traverse the minefield afterwards. But because your GPS device, or the device receiving information from the GPS, was engineered by either lazy or stupid people (hard to say which), it doesn't bother recording the uncertainty. It reduces your path from a sort of "probability cloud" of where you might have been, to a series of infinitely-small vertices and infinitely-thin lines, creating a dangerously false sense of precision. And when you make a sharp turn, or worse yet a U-bend to get around something, it becomes impossible to tell for sure where you went.

Or, equally just-barely-hypothetical: you do the same thing, but in an area where people like to set up explosives under bridges and in culverts. Because you are not stupid, you don't go over the bridges or culverts, you go around them. But your track, recorded at (say) 10 second intervals or whatever, doesn't show this; it might show one dot on one side of the river, and another dot on the other side of the river, and connect the two with a straight line, which basically every map-based UI will display as such. There's no clue to suggest that you didn't go over the bridge. And the poor sod who follows the track, thinking "well hey, this guy made it okay, so if I follow the same route I should be, too"... gets to go home less a few pieces.
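The bridge problem can be made concrete with a little geometry. If two consecutive fixes A and B are dt seconds apart and the traveller moves no faster than some maximum speed v, any intermediate position P must satisfy |AP| + |PB| <= v * dt, i.e. lie inside an ellipse with foci A and B. A minimal Python sketch in a flat local frame in metres (an assumption; the coordinates and speed in the note below are made up):

```python
import math


def reachable_between(a, b, p, dt_s: float, max_speed_mps: float) -> bool:
    """True if point p could have been visited between fixes a and b.

    a, b, p are (x, y) in metres in a local planar frame. The traveller moves
    at no more than max_speed_mps for dt_s seconds, so the path through p
    (a to p, then p to b) must fit within that distance budget.
    """
    budget = max_speed_mps * dt_s
    return math.dist(a, p) + math.dist(p, b) <= budget
```

With fixes 40 m apart on either side of a river, 10 s apart, and an assumed top speed of 6 m/s, both the straight crossing at (20, 0) and a detour through (20, 20) are feasible, while (20, 30) is not. The two dots and the line between them cannot distinguish the bridge from the detour.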

It's hard to decide who to blame with this, because too many GIS tools are based around the concept of "points" with infinitely-small radii as their most basic geospatial feature, with higher-order features (lines, areas, etc.) constructed from them. But "points" do not exist in the real world. The only time they exist is when we project abstract stuff, like political boundaries, onto the real world. In the real world, all points have radii, all lines have thickness, all areas have boundaries with transitional zones. (And actually 'radius' is not really the proper way of thinking about it; really you want a sort of probability density function about a particular point, or along a line, etc. But a radius would be fine for the purposes of, say, a navigation UI, if drawn at the 75% probability line or something. Use case dependent.)
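As a sketch of the "draw it at the 75% probability line" idea: if one assumes the position error is a circular (isotropic) 2-D Gaussian with per-axis standard deviation sigma, the error distance follows a Rayleigh distribution, P(r <= R) = 1 - exp(-R^2 / (2 sigma^2)), so the radius containing probability p is R = sigma * sqrt(-2 ln(1 - p)). The Gaussian model is an assumption here; real GPS error is often neither circular nor Gaussian.

```python
import math


def radius_for_probability(sigma: float, p: float) -> float:
    """Radius (same units as sigma) containing probability p of the true position.

    Assumes a circular 2-D Gaussian error with per-axis standard deviation
    sigma, so the error distance is Rayleigh-distributed.
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    return sigma * math.sqrt(-2.0 * math.log(1.0 - p))


def probability_within(sigma: float, radius: float) -> float:
    """Inverse of radius_for_probability: P(error distance <= radius)."""
    return 1.0 - math.exp(-(radius ** 2) / (2.0 * sigma ** 2))
```

For example, with sigma = 5 m the 75% circle has a radius of about 8.3 m.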

But even if you start out trying to do things the right way, if you start using off-the-shelf GIS tools, it won't be long before your data gets stripped of the "radius" term. That's because, charitably, they are designed for doing abstract stuff like political boundaries, and not really for modeling the real world. But that's why there's so much bad data around: it's hard to do things the right way, even if you want to. All it takes is one developer somewhere using a Point class when really they should be using a more complex implementation and boom (no, literally, that's the land mine again).
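One way to keep a plain Point class from quietly discarding the uncertainty is to make the uncertainty part of the type: there is no way to construct a position without it, and comparisons between positions have to account for it. A minimal Python sketch; `UncertainPoint` and its methods are hypothetical names, and the planar metre coordinates are a simplifying assumption.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class UncertainPoint:
    """A position that cannot be constructed without its uncertainty."""

    x: float  # metres in a local planar frame (simplifying assumption)
    y: float
    radius: float  # uncertainty radius in metres; deliberately has no default

    def __post_init__(self) -> None:
        if self.radius < 0:
            raise ValueError("radius must be non-negative")

    def distance_bounds(self, other: "UncertainPoint") -> Tuple[float, float]:
        """Smallest and largest true distance consistent with both radii."""
        d = math.hypot(self.x - other.x, self.y - other.y)
        slack = self.radius + other.radius
        return max(0.0, d - slack), d + slack

    def may_coincide(self, other: "UncertainPoint") -> bool:
        """True if the two uncertainty discs overlap at all."""
        return self.distance_bounds(other)[0] == 0.0
```

Asking for "the distance" between two such points returns a range rather than a single number, which keeps the false precision from creeping back in.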

There needs to be a widespread understanding of this, or it's going to get worse rather than better.
posted by Kadin2048 at 4:38 PM on January 9 [28 favorites]


I'm sorry, but no. This is not "detailed domain knowledge", but a CEO going, when it's pointed out that the false data his company is pumping out is causing real harm, asking the reporter "so, how should we fix it?"

Did you already know about the situations described in Hill's articles AND understand all the moving pieces involved in how they came to be before reading work like hers?

If not, and that's where you got the information that's the basis of your comment, why shouldn't the CEO also be interested in Hill's research and the opinions formed in the process? I mean, hell, if simply reading her work invests you with the power to pronounce apt judgments on the CEO's limitations, surely the person who wrote it could give him a dose of valuable perspective and asking her questions would be a good move.

If you already knew everything in the article... maybe you should consider you're in possession of detailed domain knowledge.
posted by wildblueyonder at 4:40 PM on January 9


Did you already know about the situations described in Hill's articles AND understand all the moving pieces involved in how they came to be before reading work like hers?

If not, and that's where you got the information that's the basis of your comment, why shouldn't the CEO also be interested in Hill's research and the opinions formed in the process?


Because it's acceptable for me, a layman who is interested in (and alarmed at) how companies that acquire and distribute data do so in the unregulated wilderness of the internet, to learn about the problems caused by the lack of regulation through the work of an investigative journalist.

It is not acceptable for the CEO of a company engaged in that very activity to do so. These are literally basic aspects of their operation. And as the response by the US government illustrates, the issues that MaxMind's careless mapping has caused are not that hard to figure out (as was pointed out, when the government is looking to create reference points for potential diplomatic problem spots like Jerusalem, they are very careful as to where they get placed).

The reality is that these issues are actually not that hard to figure out - as long as you're paying attention. The problem, as we keep seeing over and over, is that the tech industry doesn't.
posted by NoxAeternum at 4:57 PM on January 9 [2 favorites]


Two different problems are sort of being conflated here. First, there's the issue of having a default location be some physical spot that's randomly chosen. This is an extremely bad design choice because it confuses "yes I have an answer for that, it's X,Y +/- Z" with "I don't know", so that the end-user can't tell if you REALLY don't know or if the correct response to the query is a Kansas farm. That one's all on MaxMind, and yeah, I'd be pissed if I owned that Kansas farm too.

The second problem is that in general IP geolocation is not an exact science. In fact there's no guaranteed relationship between an IP and its user's location at all, due to things like VPNs and the fact that some computers can be moved, never mind more gradual changes like reassignment of IPs at a service provider level. I'm much more inclined to give MaxMind a pass on this, because there are a lot of legitimate uses for this kind of information even if it's inaccurate, and really the people using a geolocation API should be mindful of that (I don't only mean end-users, though that would be nice, but lots of people are ignorant of the nuances of IP mapping). I don't understand the people calling for MaxMind to be shuttered because someone took their admittedly inaccurate data and presented it as gospel. That's on the middle man, there, in my mind.
posted by axiom at 5:11 PM on January 9 [2 favorites]


What they actually know is that the IP address originated within South Africa, rather than that it originated within a certain radius of this house in Pretoria; is there some compelling reason that the coördinate/radius pair couldn't be replaced with a simple text string saying "this IP address originated in the Republic of South Africa"?
posted by bracems at 5:41 PM on January 9 [1 favorite]


Any radius centered on Pretoria large enough to contain all of South Africa would contain most of four other countries, so if they actually know the origin is in South Africa, that would be a massively inaccurate way to say so.
posted by bracems at 5:45 PM on January 9


" or at least the facility where they do QA."

Hay, whoaa... The QA undoubtedly logged this bug, it's not their fault.*

* I have done QA.
posted by el io at 5:51 PM on January 9 [3 favorites]


The fundamental problem here is that Maxmind is being asked by its customers to answer a question you cannot answer: what is the latitude and longitude of a city or a country? A city or a country isn't a single geographic point but a polygon on a map. If your clients have systems that only work with latitudes and longitudes, polygons won't work. So either your database says "sorry, we have no idea," or you do what Maxmind did: send an approximation and say just how much of an approximation it is.
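chrominance's "send an approximation and say just how much of an approximation it is" can be sketched in Python: take a region's outline, pick a representative point (here simply the average of the vertices, a deliberate simplification rather than a true centroid), and report the radius needed to cover every vertex. Since a disc is convex, covering every vertex covers the whole polygon. Coordinates are planar and the rectangle in the note below is invented.

```python
import math


def vertex_centre(polygon):
    """Average of the polygon's vertices (not the true area centroid)."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return sum(xs) / len(xs), sum(ys) / len(ys)


def covering_radius(polygon, centre):
    """Smallest radius around `centre` that contains every vertex."""
    return max(math.dist(centre, v) for v in polygon)


def shoelace_area(polygon):
    """Area of a simple polygon given as a list of (x, y) vertices."""
    n = len(polygon)
    s = 0.0
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0
```

For a 100 x 10 rectangle, the vertex average is (50, 5) and the covering radius is about 50.2, so the honest circle's area is nearly eight times the rectangle's: the same spill-over bracems describes for a circle around Pretoria.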

It is perhaps irresponsible of Maxmind to take the latter approach, knowing what we know now about how people tend to use this information. But I'm guessing that when these databases were first constructed, no one knew that IP geolocation would be abused in the ways faced by the victims in the article. You think you're providing better service by not immediately rejecting a query you cannot answer, and giving your best guess instead (here's a location that is inside the region you're asking for). It turns out that this has significant consequences, which stem almost entirely from clients misrepresenting the data being served: their customers are misled into thinking the data is way more accurate than it is, and then make very poor decisions based on it.
posted by chrominance at 6:17 PM on January 9


What they actually know is that the IP address originated within South Africa, rather than that it originated within a certain radius of this house in Pretoria; is there some compelling reason that the coördinate/radius pair couldn't be replaced with a simple text string saying "this IP address originated in the Republic of South Africa"?

This is in no way meant to excuse MaxMind's blithe approach, but in 99.9% of the cases, it's a computer trying to target an ad or provide a service that is buying the data, and if you give a cheerful sentence instead of a lat/long pair, the customer's computer will crash. To go a step further, suppose you know that an IP address originated somewhere in the region between Taipei and Taoyuan, or between Sevastopol and Simferopol: what does your simple text string say now?
posted by Homeboy Trouble at 6:45 PM on January 9


I'm a cartographer (which...the writer seems to think NGA's political geographers and cartographers are the same thing? that's annoying to me but ultimately neither here nor there) and lat/long accuracy is a real pickle. I would have guessed, based on the lede, that the problem was precision in the lat/long fields - two decimal places will identify a village. Three will identify a campus. Four, a parcel of land. Five, a tree. All of those numbers are meaningless unless they're tied to exactly what you intend them to be tied to - even if you are using the appropriate precision, that's still gonna be a dot on the map. That's still a specific place, no matter what level of lat/long precision NGA designates for a city. That's why accuracy matters!!! I'm both surprised and totally unsurprised that NGA doesn't have those coordinates set to something more than a generic rough center, but someone would have to manually go through every populated place. Probably would have been a good idea to do it for capitals at the very least. Though who knows how long these datasets have been around and in use?
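The decimal-place rule of thumb follows from the size of a degree: one degree of latitude is roughly 111 km, so each extra decimal place divides the grid spacing by ten. A short Python sketch (111.32 km per degree is the usual equatorial approximation; east-west spacing also shrinks with the cosine of the latitude, shown here at roughly Pretoria's latitude):

```python
import math

KM_PER_DEGREE = 111.32  # approximate length of one degree at the equator


def grid_spacing_m(decimal_places: int, latitude_deg: float = 0.0) -> tuple:
    """Approximate (north-south, east-west) spacing in metres of a lat/lon
    grid rounded to the given number of decimal places."""
    step_deg = 10.0 ** -decimal_places
    ns = step_deg * KM_PER_DEGREE * 1000.0
    ew = ns * math.cos(math.radians(latitude_deg))
    return ns, ew


for places in range(2, 6):
    ns, ew = grid_spacing_m(places, latitude_deg=-25.75)
    print(f"{places} decimal places: ~{ns:,.1f} m N-S, ~{ew:,.1f} m E-W")
```

Two places gives roughly a kilometre (a village), four roughly 11 m (a parcel), five roughly a metre (a tree). None of that precision says anything about whether the point is the right one.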
posted by everybody had matching towels at 6:46 PM on January 9 [9 favorites]


Here's the thing. Maxmind sells a database, one which is more typically used for purposes like routing traffic to nearby datacenters or determining what kind of local advertising to display: things where being wrong 1 percent of the time is acceptable, unlike locating stolen goods. (Although one wonders what the standard for probable cause is.)

And in 2016, when this story originally broke, I don't think the Maxmind CEO realized there was another set of customers using their data by basically putting a bunch of ads and fancy graphics around their database sales demo. I don't know that anyone did. At this point the journalism could practically write itself: buy a copy of the Maxmind data, find some abnormally clustered IPs, adjust for population density, and those small farms in Kansas should stand out like beacons.
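pwnguin's proposed investigation is easy to sketch, assuming you have rows of (IP block, latitude, longitude) from some geolocation database; the rows in the test are invented. Count how many blocks map to each exact coordinate pair and flag coordinates whose count is far above the typical one. The population-density adjustment is left out, and the threshold factor is an arbitrary illustrative choice.

```python
from collections import Counter
from statistics import median


def suspicious_coordinates(rows, factor: float = 10.0):
    """Return coordinates that far more IP blocks map to than is typical.

    rows: iterable of (ip_block, lat, lon). A coordinate is flagged when its
    block count exceeds `factor` times the median count per coordinate.
    Results are sorted with the most over-represented coordinate first.
    """
    counts = Counter((lat, lon) for _, lat, lon in rows)
    typical = median(counts.values())
    return sorted(
        (coord for coord, n in counts.items() if n > factor * typical),
        key=lambda c: -counts[c],
    )
```

Counts here are per exact coordinate pair; real data would probably need rounding to a grid first, and a default location would show up as the top of the list.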
posted by pwnguin at 6:47 PM on January 9


My final three years in the Navy were spent at NGA. I did not work in the parts of the agency described in the article (my job was writing worldwide navigation warnings and editing Navy navigation publications), but I have three years of familiarity with the agency, and this article seems accurate in everything it says. At the time, NGA was 95% civil service and 5% active military, and I assume those numbers are roughly the same six years later. A lot of smart people worked there, and I am not surprised they fixed it once the problem was brought to their attention. I will guess (from my own experience with FOIA requests on the agency's worldwide database of shipwrecks) that the month-long delay was to get permission to make the changes and have legal and other departments look at it. The actual fix, from notification to verification to talking to a boss to formalizing it, wouldn't have taken more than a morning's worth of work. I'd be happy to answer any other questions about general NGA topics if you have them. To me, the scariest and truthiest-sounding part of the article was this: "Olivier said there's likely no one in the world who really understands how all these databases interact."
posted by seasparrow at 7:15 PM on January 9 [6 favorites]


MaxMind's default location should be their own corporate headquarters

Yup. This is 100% the answer for many technology companies and products today, and it makes debugging easier when data comes up with the company location (which can't be right, red flag) vs some other location that may well seem correct at first glance.
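A sketch of why that helps debugging, assuming a hypothetical fallback constant standing in for the vendor's own address (the coordinate below is a placeholder, not any real company's location): answers equal to the fallback are trivially separable in logs and tests, so "we didn't really know" can't pass for a plausible answer.

```python
# Hypothetical fallback coordinate standing in for the vendor's own office.
# Placeholder value only; not a real company's location.
FALLBACK_COORD = (12.3456, -65.4321)


def split_results(results):
    """Separate real answers from fallback answers.

    results: list of (ip, (lat, lon)). Returns (real, fallback_ips).
    """
    real, fallback_ips = [], []
    for ip, coord in results:
        if coord == FALLBACK_COORD:
            fallback_ips.append(ip)
        else:
            real.append((ip, coord))
    return real, fallback_ips
```

A QA check can then assert that the fallback share stays below some agreed level, rather than discovering it through police visits.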
posted by davejay at 7:20 PM on January 9 [1 favorite]


I keep seeing these instances where lip service is paid to obscuring the location or identity of someone. In this case it's someone who specifically expressed legitimate anonymity concerns to the author of an article that's about this very topic! And yet the author provides two separate methods of easily locating their home. One method takes about 45 seconds and the other takes about 10 seconds.

It pisses me off because I frequently receive requests to not give away precise locations of homes in the video projects I work on. It's not at all difficult to do this if you actually treat it as a priority. In such cases I plan how to get exterior shots that exclude recognizable landmarks in the background. I sometimes just find shots that give a sense of the general vicinity without actually showing the facade of the residence in its entirety. In the edit I make multiple passes with editors and coworkers with fresh eyes, looking for signs, license plates, and other details to blur. People with important stories to tell are often reluctant to go on camera for very good reasons, so when somebody like that opens their home and their life to me, you better believe I'll do everything I can to respect this simple request.

On a side note, contrary to the NGA spokesperson's claim, this house is nowhere near Pretoria's city center or the Union Buildings, which is what I assume she's referring to when she says "the capital [sic]".
posted by theory at 10:01 PM on January 9 [3 favorites]


Hay, whoaa... The QA undoubtedly logged this bug, it's not their fault.*


A QA engineer walks into a bar.
Orders a beer.
Orders 0 beers.
Orders 99999999999 beers.
Orders a lizard.
Orders -1 beers.
Orders a ueicbksjdhd.

First real customer walks in and asks where the bathroom is.
The bar bursts into flames, killing everyone.
posted by DreamerFi at 2:18 AM on January 10 [8 favorites]


For a while there, Apple Maps confused the regional town of Mildura with the geographic center of the Mildura region, which happens to be 70km away in the middle of a desert national park.

Directions proved misleading.
posted by nickzoic at 3:50 AM on January 10 [2 favorites]


I'm not enough of a mathematician, but wouldn't there be a way of expressing the radius as a probability (or confidence level)?

That is, it could be at that precise point, or it could be at another point inside the circle. If the circle is small, the probability that it is at that point is high — if the circle is large, the probability is lower.

So the coordinates could come with a confidence level of 80% or 0.0008% as a measure of how likely they are an accurate match.
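Under the simplest model, that the true location is equally likely to be anywhere inside the reported circle (an assumption; real error is rarely uniform), the chance that it falls within a small target around the centre is just the ratio of areas, (r/R)^2. A quick Python sketch with made-up sizes shows how small that gets:

```python
def prob_near_centre(target_radius_m: float, circle_radius_m: float) -> float:
    """Probability that a uniformly distributed location inside a circle of
    radius circle_radius_m lies within target_radius_m of its centre."""
    if target_radius_m >= circle_radius_m:
        return 1.0
    return (target_radius_m / circle_radius_m) ** 2


# A 20 m "house" at the centre of a 100 km uncertainty circle.
print(prob_near_centre(20.0, 100_000.0))
```

That works out to 4 x 10^-8, about 0.000004%, which is the kind of confidence figure rochrobbb describes attaching to the coordinates.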
posted by rochrobbb at 5:06 AM on January 10


I'm not enough of a mathematician, but wouldn't there be a way of expressing the radius as a probability (or confidence level)?

Is it possible? Absolutely yes.

Is it something that the company providing this service would ever "waste" time implementing? Absolutely not, because the product manager never got a customer request for the "don't lie about how precise our results are" feature.
posted by tobascodagama at 5:58 AM on January 10


Additionally, axiom and bracems are right here. This isn't a case of "the GPS fix was weak but the mapping service obscured that". It's a case where the IP-to-location translation is returning GPS coordinates that it has no business returning, because it's literally not possible to accurately convert an IP to a GPS location in like 99% of circumstances. Furthermore, the majority of this service's customers would probably be satisfied with "Pretoria, South Africa" as a result anyway, as long as it was formatted in a machine-readable way.
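A machine-readable answer doesn't need coordinates at all. A minimal Python sketch of a response that names the regions it is confident about and leaves every finer level empty; the field names are hypothetical, though ISO 3166-1 alpha-2 country codes such as `ZA` are a real standard. Because the output is still JSON with a fixed shape, it also answers Homeboy Trouble's worry that a free-text sentence would break the customer's code.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class RegionAnswer:
    country_code: str  # ISO 3166-1 alpha-2, e.g. "ZA"
    subdivision: Optional[str] = None  # province/state, only if actually known
    city: Optional[str] = None  # only if actually known


def to_json(answer: RegionAnswer) -> str:
    # Unknown levels are emitted as null rather than guessed.
    return json.dumps(asdict(answer), sort_keys=True)
```

A country-only answer serialises with `null` for the city and subdivision, so a consumer can tell "Pretoria" from "somewhere in South Africa" without anyone inventing a point.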
posted by tobascodagama at 6:10 AM on January 10 [2 favorites]


I'm not enough of a mathematician, but wouldn't there be a way of expressing the radius as a probability (or confidence level)?

Yes...ish. A rough method would be to add a couple of additional fields to the data set: a field indicating which of a number of standard probability distributions fits most closely, and one or a small number of fields for the parameters associated with that distribution. That would probably be a reasonable solution in most cases. Much of applied statistics uses a nice finite list of standard probability distributions, so if the uncertainty data a mapping app uses is this sort of "cloud" type of uncertainty (relevant, e.g., for uncertainty in how coordinates are reported from GPS measurements), it quite possibly would already be reported as one of these standard distributions. (+)
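In Python, a "distribution type plus parameters" record might look like the sketch below. The enum values, the field names and the two error models (a uniform disc and a circular Gaussian) are illustrative assumptions, not a published schema. The point is that a client can ask "how likely is it that the device is within r metres of this point?" instead of reading two numbers as a pin.

```python
import math
from dataclasses import dataclass, field
from enum import Enum


class ErrorModel(Enum):
    UNIFORM_DISC = "uniform_disc"  # equally likely anywhere inside radius_m
    GAUSSIAN_2D = "gaussian_2d"    # circular Gaussian, sigma_m per axis


@dataclass
class UncertainLocation:
    lat: float
    lon: float
    model: ErrorModel
    params: dict = field(default_factory=dict)  # model-specific parameters

    def prob_within(self, radius_m: float) -> float:
        """Probability that the true position is within radius_m of (lat, lon)."""
        if radius_m < 0:
            raise ValueError("radius_m must be non-negative")
        if self.model is ErrorModel.UNIFORM_DISC:
            r0 = self.params["radius_m"]
            # Fraction of the disc's area covered by the smaller circle.
            return min(1.0, (radius_m / r0) ** 2)
        if self.model is ErrorModel.GAUSSIAN_2D:
            s = self.params["sigma_m"]
            return 1.0 - math.exp(-(radius_m ** 2) / (2.0 * s ** 2))
        raise ValueError(f"unsupported model: {self.model}")
```

With a uniform disc of 100 km radius, a 1 km circle around the centre holds only (1/100)² = 0.01% of the probability. That is roughly the situation of a "default" country-level location being read as a street address.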

But that's not necessarily the type of uncertainty data that would be available.

Instead, the uncertainty may be "this IP address is somewhere in this country or city or sub-region of the telecom network". Those sorts of situations are better modeled not as points and clouds in a Euclidean coordinate system (as someone else pointed out above, South Africa is not circular) but as topological regions. As the video in that link talks about, some GIS programs already use topological data (*), so there are some standard formats already that could be used in databases and apps.
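When the answer is a region rather than a cloud, the natural client-side question becomes "is this place inside the region?" rather than "how far is it from the point?". Below is a minimal sketch using the standard even-odd ray-casting test, not any particular product's method. It assumes the region arrives as a single polygon ring of vertices and that the area is small enough to treat longitude/latitude as flat coordinates. Real GIS code would use a library and handle holes, multipolygons and the antimeridian.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (lon, lat), treated as flat x/y for a small region


def point_in_polygon(pt: Point, ring: List[Point]) -> bool:
    """Even-odd (ray-casting) test: is pt inside the polygon whose vertices are ring?

    The ring may or may not repeat its first vertex at the end. Points exactly on an
    edge may be reported either way.
    """
    x, y = pt
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through pt can be crossed.
        if (y1 > y) != (y2 > y):
            # x coordinate where this edge meets that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:  # the crossing lies on the ray going right from pt
                inside = not inside
    return inside
```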

(+ For even more accuracy, you'd need a way to convey the specific function that a probability distribution follows. A function is determined by infinitely many points, which of course cannot be represented exactly on a computer, so in practice it would more commonly be approximated by a truncated list of Fourier coefficients.)

(* In the background, the regions are still represented as polytopes (regions bounded by lines between vertex points) because of the whole computers-are-finite issue. But it's a different use of points.)
posted by eviemath at 6:16 AM on January 10 [3 favorites]


In other words, there is already a standard way to report region data like "Peoria, South Africa" in most GIS programs, and MaxMind and its clients could absolutely switch to using that standard if they so chose. It would require reformatting their entire database. But topological GIS systems also allow specific points to be specified, so some backwards compatibility could be worked out.
posted by eviemath at 6:21 AM on January 10 [3 favorites]


Furthermore, the majority of this service's customers would probably be satisfied with "Peoria, South Africa" as a result anyway, as long as it was formatted in a machine-readable way.

Which in many cases, they would then turn into a point so they could calculate some kind of distance from it.

Treating locations like points or circles is the mapping equivalent of the spherical cow. It may not be correct, but there are a whole pile of calculations that it makes a lot easier, so people do it anyway.
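To make the spherical-cow point concrete, here is the usual haversine great-circle distance, which itself treats the Earth as a sphere (mean radius 6,371 km). It is paired with a hypothetical helper that keeps the circle as a circle and reports a range of possible distances instead of a single number. Both are sketches, not how any particular service computes distances.

```python
import math
from typing import Tuple

EARTH_RADIUS_M = 6_371_000.0  # mean radius: the Earth itself as a spherical cow


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two lat/lon points (degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(min(1.0, a)))


def distance_range_m(center_lat: float, center_lon: float, radius_m: float,
                     lat: float, lon: float) -> Tuple[float, float]:
    """Nearest and farthest possible distance from (lat, lon) to a circular location."""
    d = haversine_m(center_lat, center_lon, lat, lon)
    return max(0.0, d - radius_m), d + radius_m
```

If the circle has a 50 km radius, a customer about 111 km from its centre could be anywhere from about 61 km to about 161 km away. That is a very different claim from "111 km".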
posted by jacquilynne at 6:56 AM on January 10 [1 favorite]


If the CEO of a company whose business is selling on the internet does not understand the difference between an IP address and a real postal address, then he deserves to be taken to court for everything he owns and to lose his business.
Besides, ignorance of the law is no excuse when breaking it.
posted by Burn_IT at 8:05 AM on January 10 [2 favorites]


I was wondering when this argument would come up, because it's a bad penny in these discussions. If you're providing a result with a massive margin of error, then you are providing bullshit, and should not be doing so. MaxMind is very much at fault here, because they're the ones putting out this false info, and they damn well know that their end users aren't looking at the MoE.

You're being too harsh. They're not providing the data with a "massive margin of error," they're trying to indicate a general region by drawing a circle around it.

The problem comes when someone decides to interpret the circle as a point.

How else would you suggest they do that? The only alternative I can think of is that they keep a database of geographic shape files of different regions, and return a reference to one with the result (e.g. the laptop is in Toronto, Canada; here are its borders).
posted by cosmic.osmo at 8:31 AM on January 10 [1 favorite]


I used to work for a group that provided a consumer-facing website for customers working in the agriculture industry, and I used the MaxMind database product a few times. For some things, it's insanely useful: when quickly putting together a map of where people recently accessed your website globally, it'd give you the rougher level of accuracy you want. When trying to guess the location of a user to provide some defaults on a list of available products, it'd also be a good fit. Both of these things are guesses that are good enough at a county level or even a state level, and because of the industry we were in, we knew how inaccurate things could be outside of urban areas.

The problems outlined in the article are a mismatch between the service offered and the use cases that developers used the data for. Developers, or analysts, creating products and APIs have created apps and websites that expect the ability to take an IP address and return a lat/long (and maybe, if there is some foresight, a radius). That is not at all what the data provides, and it shouldn't be used to drop a pin on a map. Maybe a big shaded circle, or maybe a highlighted city or country, because sometimes a radius is useless. And that requires not a single number for radius, but maybe a field indicating confidence level: do I return a point location, a center and radius, or a city/state/country? Or do I return nothing at all?
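The sketch below shows one way to express that "what kind of answer is this?" field. The granularity levels follow the options listed above. The thresholds (`point_max_km`, `circle_max_km`) and all the names are made-up assumptions that would need tuning for a real product. A coordinate that arrives without any radius is deliberately never promoted to a pin.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Granularity(Enum):
    POINT = "point"    # precise enough to drop a pin
    CIRCLE = "circle"  # centre plus radius
    REGION = "region"  # named city/state/country only
    NONE = "none"      # we honestly don't know


@dataclass
class GeoAnswer:
    granularity: Granularity
    lat: Optional[float] = None
    lon: Optional[float] = None
    radius_km: Optional[float] = None
    region: Optional[str] = None  # e.g. "Toronto, Canada"


def answer_for(lat: Optional[float], lon: Optional[float], radius_km: Optional[float],
               region: Optional[str], point_max_km: float = 0.1,
               circle_max_km: float = 50.0) -> GeoAnswer:
    """Return the most specific answer the uncertainty actually justifies."""
    has_estimate = lat is not None and lon is not None and radius_km is not None
    if has_estimate and radius_km <= point_max_km:
        return GeoAnswer(Granularity.POINT, lat, lon, radius_km)
    if has_estimate and radius_km <= circle_max_km:
        return GeoAnswer(Granularity.CIRCLE, lat, lon, radius_km)
    if region:
        # Too vague for a circle to mean anything: fall back to the named region.
        return GeoAnswer(Granularity.REGION, region=region)
    return GeoAnswer(Granularity.NONE)
```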

Then you have a mismatch between perceived effort and actual effort. People think geolocation is magic! What do you mean you need to implement a spread of possibilities? Just give me a point on a map! So we end up with a bunch of crap that gives you points on maps that are harmful.

For what it's worth, "a farm in Kansas" would have probably been a completely viable or even *expected* result on the site I worked on. But you don't drive to that place assuming it's someone who looked at your website, because that's creepy and wrong.
posted by mikeh at 8:55 AM on January 10 [4 favorites]


My addendum would be that MaxMind and others sell (or did at the time) a very general product, and the intelligence on top of that data is a second-tier product. People think they can do mapping themselves, and some can, but for most *this is a lie*. You need more logic above the data, because rules and data are two separate things and you need both.
posted by mikeh at 8:58 AM on January 10 [2 favorites]


I'm not enough of a mathematician, but wouldn't there be a way of expressing the radius as a probability (or confidence level)?

Yes, there are a variety of ways to do this.

The simplest, and most traditional, method is just to not store the position itself with more precision than you actually have. If you don't know your position to 8 decimal places in degrees, well duh, don't store that, or at least don't provide it to the user. But then somebody sticks that value into a database, and it pops out the other end with the full faux-precision of decimal degrees stored as FLOAT (5.6 ft!) or the overachiever's special, the 8-byte DOUBLE (microns!), aaaaaaand now you have a problem. (Separately, you can get nasty rounding errors doing floating-point math on lat/lons. Friends don't let friends do this.) In terms of stuff you can do right now, though, this is probably the best solution: don't ever store a "naked" lat/lon or use the datatype to imply precision. Use an array or array-like object to store the value and the precision, and put in realistic values for the precision based on the source of the measurement. Better yet, use GIS extensions if your database has them.
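Here is a sketch of the "never store a naked lat/lon" advice: keep the accuracy next to the coordinates and derive the number of displayed decimal places from it. The class and field names are hypothetical. The conversion uses the approximate figure of 111,320 m per degree of latitude. A degree of longitude shrinks with the cosine of latitude, so using the latitude figure for both is the conservative choice.

```python
import math
from dataclasses import dataclass

METERS_PER_DEG_LAT = 111_320.0  # approximate length of one degree of latitude


@dataclass(frozen=True)
class Position:
    lat: float
    lon: float
    accuracy_m: float  # realistic uncertainty, taken from the measurement source

    def __post_init__(self) -> None:
        if not self.accuracy_m > 0:
            raise ValueError("accuracy_m must be positive; unknown is not zero")

    def decimals(self) -> int:
        """Decimal places of degrees that the accuracy actually supports (0 to 8)."""
        step_deg = self.accuracy_m / METERS_PER_DEG_LAT
        return max(0, min(8, math.floor(-math.log10(step_deg))))

    def display(self) -> str:
        d = self.decimals()
        return f"{self.lat:.{d}f}, {self.lon:.{d}f} (±{self.accuracy_m:g} m)"
```

For example, `Position(43.651070, -79.347015, 10.0).display()` gives `43.6511, -79.3470 (±10 m)`. The same coordinates with `accuracy_m=25_000` print with no decimals at all.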

As for more general solutions, I personally like the shamelessly-stolen-from-physics spherical probability density function method, but the implementations (and I've started to whiteboard it out a few times) get admittedly ugly fast. But it is quite flexible, and covers cases like asymmetric uncertainty, which is common in real-life applications. (E.g. when you have range error that's different from the azimuth error, leading to a sort of oblong "probability cloud" in X/Y.)
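Below is a small sketch of that oblong cloud. It uses a first-order (linearised) approximation, which is my choice and not necessarily what is meant above. A range error `sigma_r_m` and a bearing error `sigma_theta_rad` at range `r_m` become a 2×2 covariance matrix in local east/north metres, and its eigenvalues give the semi-axes of the 1-sigma error ellipse. The names are hypothetical, and here the bearing is measured counter-clockwise from east.

```python
import math
from typing import List, Tuple


def range_bearing_covariance(r_m: float, theta_rad: float, sigma_r_m: float,
                             sigma_theta_rad: float) -> List[List[float]]:
    """2x2 covariance (local x=east, y=north, metres) of a range/bearing fix.

    First-order approximation: the along-range error is sigma_r_m and the cross-range
    error is r_m * sigma_theta_rad; theta is measured counter-clockwise from east.
    """
    c, s = math.cos(theta_rad), math.sin(theta_rad)
    var_along = sigma_r_m ** 2
    var_cross = (r_m * sigma_theta_rad) ** 2
    # Rotate diag(var_along, var_cross) into x/y: R @ D @ R.T
    sxx = c * c * var_along + s * s * var_cross
    syy = s * s * var_along + c * c * var_cross
    sxy = c * s * (var_along - var_cross)
    return [[sxx, sxy], [sxy, syy]]


def error_ellipse(cov: List[List[float]]) -> Tuple[float, float, float]:
    """(semi-major, semi-minor, angle of major axis from east in radians), 1-sigma."""
    sxx, sxy, syy = cov[0][0], cov[0][1], cov[1][1]
    mean = (sxx + syy) / 2.0
    spread = math.hypot((sxx - syy) / 2.0, sxy)  # half the eigenvalue gap
    major = math.sqrt(mean + spread)
    minor = math.sqrt(max(0.0, mean - spread))
    angle = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    return major, minor, angle
```

With a 10 m range error and a 1 mrad bearing error at 1 km, the ellipse is 10 m long along the line of sight and only 1 m wide across it. A single "radius" would throw that shape away.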

GPS devices generally provide a "Dilution of Precision" (DOP) measurement, which is an attempt to condense uncertainty in various dimensions down into a manageable, scalar value. It's a good idea, originating with the Loran system, although I've found that it's often the first thing to get tossed out by software (in theory they can be computed from the lat/lon, time, and ephemeris, but in practice this is rarely done). Plus, while they are good at representing the uncertainty introduced by satellite triangulation, and should always be preserved if you are dealing with GPS-sourced positions, they're only quantifying one source of positional error. HDOP and VDOP don't tell you, for instance, about error due to multipath. Functionally, they are an upper bound on the precision of the measurement, and a starting point in determining the overall uncertainty.
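Roughly, the textbook relation is that the 1-sigma error from satellite geometry is the DOP multiplied by the user equivalent range error (UERE). The sketch below encodes that, plus a root-sum-of-squares combination with other error sources such as multipath, which assumes those sources are independent. The UERE and multipath figures are inputs you would have to estimate. Nothing here reads them from a real receiver, and the function names are hypothetical.

```python
import math


def dop_error_m(dop: float, uere_m: float) -> float:
    """1-sigma position error attributable to satellite geometry: DOP x UERE.

    uere_m (user equivalent range error) must be estimated separately; this is a
    best case, since DOP says nothing about multipath and similar effects.
    """
    if dop <= 0 or uere_m < 0:
        raise ValueError("need dop > 0 and uere_m >= 0")
    return dop * uere_m


def combined_error_m(*sigmas_m: float) -> float:
    """Root-sum-of-squares of error terms, valid only if they are independent."""
    return math.sqrt(sum(s * s for s in sigmas_m))


if __name__ == "__main__":
    geometry = dop_error_m(1.5, uere_m=4.0)   # 1.5: an HDOP reported by the receiver
    total = combined_error_m(geometry, 5.0)   # 5 m: a hypothetical multipath guess
    print(f"geometry-only: {geometry:.1f} m, with multipath: {total:.1f} m")
```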

None of this stuff is exactly cutting edge, by the way. I'm sure cartographers and GIS people have solved this problem within their domains, but the problem is at the edges: GPS devices in particular have proliferated much faster than the domain knowledge about how to use the data properly. Anyone can fire up Android Studio or Node.js and start doing stupid stuff with latitude/longitude pairs, and they do.
posted by Kadin2048 at 2:19 PM on January 10 [3 favorites]


Yes, cartographers and GIS people have (at least somewhat) solved this problem, and the description is in my post/link above! It's even part of the standard data structures/representations that Google Maps uses or can work with.
posted by eviemath at 2:53 PM on January 10 [2 favorites]


 They are providing the radius of uncertainty. Other companies are deliberately misusing their API.

So they're returning what could be as few as three floating point numbers. These are, of course, misused by half-bright programmers who think they know what they're doing and assume that two of the numbers constitute a point.

MaxMind should, of course, be returning an object that defines the polygon centred on the point and buffered by the uncertainty distance. If the programmer insists on taking the centroid of that polygon, or even worse using the first point on the polygon's perimeter, then the programmer has done something wilfully wrong and is at fault.
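Here is a sketch of "the polygon centred on the point and buffered by the uncertainty distance". It assumes a spherical Earth and uses the standard destination-point formula, returning a closed ring of `(lon, lat)` pairs in the order GeoJSON uses. It doesn't split rings that cross the antimeridian or handle circles that contain a pole, which a real implementation (or a GIS library) would need to do. The function name is hypothetical.

```python
import math
from typing import List, Tuple

EARTH_RADIUS_M = 6_371_000.0  # spherical Earth, mean radius


def buffer_point(lat: float, lon: float, radius_m: float,
                 n_vertices: int = 32) -> List[Tuple[float, float]]:
    """Closed ring of (lon, lat) vertices approximating a circle of radius_m around a point."""
    if radius_m <= 0 or n_vertices < 3:
        raise ValueError("need radius_m > 0 and n_vertices >= 3")
    phi1, lam1 = math.radians(lat), math.radians(lon)
    delta = radius_m / EARTH_RADIUS_M  # angular distance
    ring = []
    for i in range(n_vertices):
        theta = 2.0 * math.pi * i / n_vertices  # bearing, clockwise from north
        phi2 = math.asin(math.sin(phi1) * math.cos(delta)
                         + math.cos(phi1) * math.sin(delta) * math.cos(theta))
        lam2 = lam1 + math.atan2(math.sin(theta) * math.sin(delta) * math.cos(phi1),
                                 math.cos(delta) - math.sin(phi1) * math.sin(phi2))
        lon2 = (math.degrees(lam2) + 540.0) % 360.0 - 180.0  # normalise to [-180, 180)
        ring.append((lon2, math.degrees(phi2)))
    ring.append(ring[0])  # GeoJSON rings repeat the first vertex at the end
    return ring
```

The centroid of this ring is essentially the original point again, which is exactly the shortcut the comment above warns against.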

Even the old and unloved 'geo' URI could return an uncertainty radius. But hey, move fast and break stuff always trumps taking responsibility when you're making it easy for others to make mistakes.
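For reference, the geo URI scheme (RFC 5870) carries the uncertainty as a `u` parameter in metres, e.g. `geo:43.65107,-79.347015;u=35`. As I read the RFC, an absent `u` means the uncertainty is unknown, not zero. Below is a minimal formatter and parser. It handles only the plain WGS-84 case with an optional altitude and ignores other parameters. The helper names are hypothetical, and this is not a complete RFC 5870 implementation.

```python
import re
from typing import Optional, Tuple

_NUM = r"-?\d+(?:\.\d+)?"
# Groups: 1 = lat, 2 = lon, 3 = ";param=value" tail; an optional altitude is skipped.
_GEO_RE = re.compile(rf"^geo:({_NUM}),({_NUM})(?:,{_NUM})?((?:;[^;]*)*)$", re.IGNORECASE)


def _fmt(x: float) -> str:
    """Plain decimal text (geo URIs don't allow exponents), trailing zeros stripped."""
    s = f"{x:.7f}".rstrip("0").rstrip(".")
    return "0" if s in ("", "-0") else s


def to_geo_uri(lat: float, lon: float, uncertainty_m: Optional[float] = None) -> str:
    uri = f"geo:{_fmt(lat)},{_fmt(lon)}"
    if uncertainty_m is not None:
        uri += f";u={_fmt(uncertainty_m)}"
    return uri


def parse_geo_uri(uri: str) -> Tuple[float, float, Optional[float]]:
    """Return (lat, lon, uncertainty in metres, or None if unknown); altitude is ignored."""
    m = _GEO_RE.match(uri)
    if m is None:
        raise ValueError(f"not a simple geo URI: {uri!r}")
    uncertainty = None
    for param in m.group(3).split(";")[1:]:
        key, _, value = param.partition("=")
        if key.lower() == "u":
            uncertainty = float(value)
    return float(m.group(1)), float(m.group(2)), uncertainty
```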
posted by scruss at 3:23 PM on January 11 [1 favorite]

