A Black Eye For Open Science
May 12, 2016 11:47 PM

Recently, a dataset of 70,000 scraped OkCupid profiles from November 2014 to March 2015 was released on the Open Science Framework. The set, which was acquired without the consent of either OKCupid or the profile owners, had no anonymization performed on it, meaning that the profiles could be easily correlated to the people behind them, effectively doxing these individuals, a gross violation of research ethics on a number of grounds. Social computing expert Oliver Keyes described the release as "without a doubt one of the most grossly unprofessional, unethical and reprehensible data releases I have ever seen."

The response of the Open Science Framework leadership to this has also been concerning:

The Open Science Framework was created, in part, in response to the traditional scientific gatekeeping of academic publishing. Anyone can publish data to it, with the hope that the freely accessible information will spur innovation and keep scientists accountable for their analyses. And as with YouTube or GitHub, it's up to the users to ensure the integrity of the information, and not the framework.

If Kirkegaard is found to have violated the site's terms of use — i.e., if OkCupid files a legal complaint — the data will be removed, says Brian Nosek, the executive director of the Open Science Foundation, which hosts the site.

This seems likely to happen. An OkCupid spokesperson tells me: "This is a clear violation of our terms of service — and the Computer Fraud and Abuse Act — and we’re exploring legal options."

Overall, Nosek says the quality of the data is the responsibility of the Open Science Framework users. He says that personally he'd never post data with potential identifiers.

(For what it's worth, Kirkegaard and his crew aren't the first to scrape OkCupid user data. One user scraped the site to match with more women, but it's a bit more controversial when data is posted on a site meant to help scientists find fodder for their projects.)

Nosek says the Open Science Foundation is having internal discussions of whether it should intervene in these cases. "This is a tricky question, because we are not the moral truth of what is appropriate to share or not," he says. "That's going to require some follow-up." Even transparent science may need some gatekeeping.
posted by NoxAeternum (66 comments total) 20 users marked this as a favorite
 
Yeah, this is terrible. I couldn't believe Kirkegaard's ¯\_(ツ)_/¯ responses.
posted by Johnny Wallflower at 11:51 PM on May 12, 2016 [7 favorites]


As far as I can tell Kirkegaard just removed the user identifiable data yesterday. From the OSF change log.

"2016-05-12 02:41 AM

Emil Ole William Kirkegaard removed file data/parsed_data.rds from OSF Storage in The OKCupid dataset: A very large public dataset of dating site users

"
posted by bswinburn at 12:10 AM on May 13, 2016


Holy shit, that guy's entire Internet presence is the definition of insufferable twit.
posted by kmz at 12:13 AM on May 13, 2016 [11 favorites]


Yeah, I'm definitely not seeing user_data.csv or parsed_data.rds in the files section. (Thank god - I was almost certainly included in this dataset.)
posted by naju at 12:15 AM on May 13, 2016 [1 favorite]


I am just shrieking inside at this. This is such a huge violation of research ethics, even after they remove the identifiers. It is not even Godwinning to say "You know who does research on non-consenting subjects without oversight" because the Nuremberg Code, which is the basis for the Declaration of Helsinki and pretty much all human research ethics, was born of the Nuremberg trials.
posted by gingerest at 12:18 AM on May 13, 2016 [38 favorites]


That said, OKCupid would probably be pretty sympathetic (previously) if not for the fact their data were being used without their permission, but I expect more of academic researchers than I do of corporations. This is probably really dumb of me.
posted by gingerest at 12:24 AM on May 13, 2016 [1 favorite]


Holy shit, that's my friend Oliver!
posted by feckless fecal fear mongering at 12:33 AM on May 13, 2016 [2 favorites]


that guy's entire Internet presence is the definition of insufferable twit.

From his personal site:
"I have been active in the Danish Pirate Party in relation to internet freedoms, civil liberties (real ones, not those pushed by social justice warriors) and harmful intellectual monopolies" (emphasis mine)

Yup.
posted by deadbilly at 12:34 AM on May 13, 2016 [52 favorites]


Bad as this sounded, it just kept getting worse and worse as I read the link. I particularly like this tweet: "I'd rather you just untag me from ethics/IRB stuff. :)" No you idiot; you can't simply "unsubscribe" from considering research ethics in your work.
posted by zachlipton at 12:43 AM on May 13, 2016 [57 favorites]


Oh man, and in addition to the many papers he put out in his vanity rag, the guy has published in Mankind Quarterly. Great.
posted by gingerest at 12:48 AM on May 13, 2016 [6 favorites]


That said, OKCupid would probably be pretty sympathetic (previously) if not for the fact their data were being used without their permission

I think OKCupid would always be unsympathetic to the release of non-anonymised data. Not for ethical reasons, but because it's bad for business in a way that their own experimentation isn't.
posted by howfar at 12:50 AM on May 13, 2016 [6 favorites]


From OKCupid's Privacy Policy:
We also may share aggregated, non-personal, or personal information in hashed, non-human readable form, information with third parties, including advisors, writers, researchers, partners, advertisers and investors for the purpose of conducting general business analysis, studies, articles, and essays media and entertainment or other business purposes.


This is probably why OKCupid is actually mad. I know that Twitter sells access to scrapes to academics. I mean, sure, it sucks that some jackass scraped non-anonymized data and put it out there, but it seems likely that, just like Twitter and other "you are the product" type services, OKCupid probably already sells data access to researchers. Having this data out there limits the value of OKCupid's product.

I'm just guessing, of course. The only inside-baseball knowledge I have is of Twitter.
posted by xyzzy at 12:55 AM on May 13, 2016 [1 favorite]


I mean, a key part of that is also "aggregated, non-personal, or personal information in hashed, non-human readable form" which IMO is much more tolerable than what is going on in this dataset.
posted by naju at 1:01 AM on May 13, 2016 [12 favorites]


Sure. But hardly anyone ever talks about "you are the product" in terms of research--most of the focus is on advertising. A fair number of people probably have no idea that their online data through these services is being used to conduct academic research in everything from AI to economics to human behavior.
posted by xyzzy at 1:07 AM on May 13, 2016 [2 favorites]


True, but leaping from that to "this is probably why OKCupid is actually mad" is bad faith to a really unsavory degree.
posted by No-sword at 2:11 AM on May 13, 2016 [8 favorites]


What I mean is, this shouldn't become a Both Sides Do It, Truth Lies Somewhere In The Middle story. It's the lack of consent combined with lack of anonymization that's the issue.
posted by No-sword at 2:14 AM on May 13, 2016 [10 favorites]


Have the Reddit/*chan MRAs started using this information to harass those they classify as “SJWs” yet?
posted by acb at 2:17 AM on May 13, 2016 [2 favorites]


When asked about anonymizing the data: "No. Data is already public." Why does this seem so familiar?
posted by Ivan Fyodorovich at 4:07 AM on May 13, 2016 [3 favorites]


Oh, man. I am so sad that I am working from home this morning so I can't see my boss' face when she reads the link I just sent her about this. (University research compliance office. We have..thoughts, and feelings, about this kind of thing.)
posted by Stacey at 4:36 AM on May 13, 2016 [13 favorites]


Additionally, part of the definition of human subject research (in the US, I don't know Danish law) is that it requires intent to create generalizable knowledge. Fucking around to learn interesting things about our userbase isn't usually trying to create generalizable knowledge, so the same ethical principles don't apply.

Which is not to say there aren't different ethical concerns related to the info dump, just that they aren't the specific set of ethical concerns that I have about this OkCupid scenario, in my professional capacity as someone who would be having a very, VERY bad day at the office if this had happened at my university.
posted by Stacey at 5:18 AM on May 13, 2016 [9 favorites]


Doing a Google search turned up a repo on GitHub (not going to link it, but it's a different person to this Danish guy) with an OKCupid scraper script and a CSV file of scraped user data, with a recent check-in (16 hours ago) labelled "add fully anonymized data". The previous check-in before that seems to be 2 years ago, so it seems that a large dump of presumably not-fully-anonymized OKCupid user data has also been sitting on GitHub for the last 2 years.
posted by L.P. Hatecraft at 5:23 AM on May 13, 2016 [3 favorites]


Metafilter - It's about context.
posted by DigDoug at 6:01 AM on May 13, 2016


The data isn't gone, because data is never gone from the internet. It's now just harder to find. Which means I can't tell if I or my friends are on this list, but others can. =/
posted by andreaazure at 6:13 AM on May 13, 2016 [4 favorites]


So it's basically a big DB for channer idiots to use for harassment purposes, and the site's entire stated purpose is a lie?
posted by Artw at 6:20 AM on May 13, 2016 [1 favorite]


In the interest of getting back to discussing the OP, this document is the Danish Code of Conduct for Research Integrity. I don't know how binding it is, or how seriously the university in question might take this, but at least the university is required to have some policy here:

Institutions are responsible for providing secure data storage facilities that are
consistent with confidentiality requirements and applicable regulations and guidelines,
e.g. on the processing of personal data.


That having been said, I do know that there has been a big push for open data access in Denmark, and the push has specifically ignored the needs of individual scientific fields. I'm a (theoretical) physicist (working in Denmark), so the research area I'm most familiar with that is affected is large experimental projects (think the detector collaborations at the LHC; several thousand scientists). All such experiments have governing documents about who owns the data and what they can do with it; some of the push for open data access has been in direct conflict with these requirements.

The problem is that the push comes from on high, and those people have absolutely no idea how research at the ground level is done but somehow think they can set policy. Or even worse, they know how it works in their own field and don't realize other fields are different.

I see a connection here - if the rules and policies that some authority says you should follow don't take into account the realities of your research, then you might start to think you can just set your own rules, or that there are no rules. And then, maybe if you're kind of a jerk to start with, you do something terrible like this dude.
posted by nat at 6:20 AM on May 13, 2016 [2 favorites]


Why hasn't OKC put measures in place to rate limit bots and scrapers?
posted by ursus_comiter at 6:22 AM on May 13, 2016 [2 favorites]


Ironically, the vox.com article in the OP contains an ad for OKCupid. Not the best move there, vox.
posted by tallmiddleagedgeek at 6:35 AM on May 13, 2016


Why hasn't OKC put measures in place to rate limit bots and scrapers?

Two points off the top of my head: 70,000 profiles isn't really that many. Even if you rate limit per-IP requests to 1000/day (or something arbitrary, as it often is), that's still only a bit over two months. And scrapers are fire-and-forget. The other thing is that writing scrapers to collect data from multiple source IPs is fairly easy these days, so you can reduce the time required by another multiple.
posted by iffthen at 6:55 AM on May 13, 2016 [1 favorite]
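The back-of-envelope arithmetic above can be checked with a quick sketch. All numbers here are illustrative assumptions (the 1000/day cap and the IP count are hypothetical, not OkCupid's actual limits):

```python
# Hypothetical scrape-time estimate; the per-IP cap and IP count are
# illustrative assumptions, not OkCupid's real rate limits.
PROFILES = 70_000
REQUESTS_PER_DAY = 1_000  # assumed per-IP rate limit
SOURCE_IPS = 5            # a scraper spread across a handful of IPs

days_single_ip = PROFILES / REQUESTS_PER_DAY
days_multi_ip = days_single_ip / SOURCE_IPS

print(f"one IP: {days_single_ip:.0f} days")           # 70 days: "a bit over two months"
print(f"{SOURCE_IPS} IPs: {days_multi_ip:.0f} days")  # 14 days
```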


Just want to add to the previous comment that in this scraping scenario, the more likely limiting factor for requests is the user ID/auth token, not the IP address. But circumventing per-user limits is arguably easier than per-IP limits, since they're not often tied to each other.
posted by iffthen at 7:20 AM on May 13, 2016


Just wait till "open science" users start releasing crappily-engineered organisms because who cares?
posted by sneebler at 7:23 AM on May 13, 2016 [2 favorites]


The ethics training that researchers are required to take is just ludicrously bad (it's all animal models and IRBs, and the last time I did the training there were literally zero people in the room who were doing research that required either of those), so I'm not really that surprised that this happened. It was only a matter of time, really.
posted by tobascodagama at 8:15 AM on May 13, 2016 [2 favorites]


Keyes actually points that out in his analysis - he argues that the AOL search results incident should be added to ethics courses.
posted by NoxAeternum at 8:37 AM on May 13, 2016 [1 favorite]


I will absolutely back that up, even as one of the people responsible for researcher ethics training (though in a different area than this debacle touches.) My office is historically understaffed (although that's improving under new leadership), with very few resources to develop our own training, so we're largely stuck with what's out there - e.g., CITI training. And what's out there doesn't even begin to touch the kinds of issues that really come up in the work our researchers do.

Plus, when we do hold in-person (and livestreamed) trainings so we can do stuff that's more personalized to this university's needs and the kind of work done here, it's like pulling teeth to get anyone to come unless they're required to as part of NSF funding. And when we ask departments to let us come do department-level trainings so we can come to the researchers instead of making them come to us, and offer to do any length of training on any subject the faculty want, the departments generally say "nope, our researchers are too busy and not interested in this kind of thing anyway, we can't make time for you." Our best bet is to do one-on-one work with individual researchers, learning about their work and walking them through what they need to know about doing that kind of work. That's really successful for us and for the researchers, but it's not feasible to do that with everyone, and the people who most need it wouldn't accept it anyway.

I'd like to think we're unusual and other universities are doing better on the research ethics training front, but I rather doubt it.

(Plus, ask me sometime about how hard my university has been trying to work on the government bodies responsible for research regulations to make them understand *how the internet works* and what kinds of research people want to do with it. They are years behind, the regulations they're trying to make utterly fail to take into account the kinds of research that is now possible, and it's just...really, really depressing. So it's not like we or our researchers can look at government guidance on this stuff because it's nonexistent or hideously out of date.)
posted by Stacey at 8:40 AM on May 13, 2016 [4 favorites]


And that's why the OSF's response is so troubling - instead of taking a "no non-anonymized data sets" policy, they're relying on the researchers to operate properly.
posted by NoxAeternum at 8:46 AM on May 13, 2016 [4 favorites]


Full disclosure: I've collaborated with the Center for Open Science, which maintains the OSF, on a separate project.

Although the actual data shared is an obviously egregious violation of any reasonable (research) ethical principles, it seems unfair to blame the OSF for not immediately accepting responsibility. OSF isn't some special club that you have to be an "open science researcher" to join--accounts are free and open to anyone. The platform is designed to encourage researchers to share their data, code, etc., and has special functionality for that, but you could be using it to back up your family photos for all anyone knows. It's like Dropbox.

Given that, a (somewhat) hands-off approach to what content is posted may be a necessity if they don't want to hire full-time staff to make judgment calls about what's acceptable and deal with the associated arguments (plus the much greater implied approval of anything that does get posted). Look how much work moderation is at Metafilter! My impression from their statement is just that they really don't want to set a precedent that it's their job to do that sort of moderation--or at least, that they don't want to set that precedent right now, under media pressure rather than more measured consideration.
posted by cogitron at 9:20 AM on May 13, 2016 [2 favorites]


This is just a neo-fascist using whatever platform he can to spread his hateful bile. I'm not sure it really says much of anything about open science as a whole.
posted by [expletive deleted] at 9:48 AM on May 13, 2016 [1 favorite]


Given that, a (somewhat) hands-off approach to what content is posted may be a necessity if they don't want to hire full-time staff to make judgment calls about what's acceptable and deal with the associated arguments (plus the much greater implied approval of anything that does get posted). Look how much work moderation is at Metafilter! My impression from their statement is just that they really don't want to set a precedent that it's their job to do that sort of moderation--or at least, that they don't want to set that precedent right now, under media pressure rather than more measured consideration.

If they want to be considered a respectable member of the scientific community, then yes, it very much is their job to moderate the content that they are hosting, making sure that it meets the rather low bar of not allowing data sets with glaringly massive ethical issues to be made available to the public through their system! It is absolutely fair to blame the OSF for not having any sort of game plan in place for dealing with abuse of their systems and with bad actors. And the reason that they are now facing pressure to do that sort of moderation is because they pushed the question off, and it blew up in their faces. If they want to be taken seriously in the scientific community, they need to actually uphold ethical research practices, and to actually deal with abuse.
posted by NoxAeternum at 9:57 AM on May 13, 2016 [7 favorites]


I liked Oliver Keyes's guideline: "data science is like vampirism: if you see a threshold, ask before crossing it."
posted by Lexica at 10:56 AM on May 13, 2016 [16 favorites]


Yup. I'm considering printing that line out and hanging it up in my office.
posted by Stacey at 11:00 AM on May 13, 2016


This is pretty skeevy on a number of levels.

I wonder if it is even possible to anonymize this data on a user-by-user basis. Replacing usernames with hashes is not a solution. An individual's responses are reasonably likely to be unique-ish enough to unmask them. One could substitute codes for questions/responses, but then why post the data at all? Aggregating it might work, but only if done carefully.

I think there is a fundamental problem. Open data needs to be raw enough that it can be used in explanatory models, yet not so raw that it provides enough information to identify a person. However, as models improve, it's likely that the amount of information needed to identify a person decreases. That means that data sets that previously had a low risk of reindentification might have much higher risk even a short time later.

In many cases, the solutions suggested are farcical. For example, in Oliver Keyes' article, he points out that usernames were not even replaced by hashes. But hashing is simply not a good fix either. As Vijay Pandurangan points out in his article about the NYC Taxi dataset:
A cryptographically secure hashing function, like MD5 is a one-way function: it always turns the same input to the same output, but given the output, it’s pretty hard to figure out what the input was as long as you don’t know anything about what the input might look like. This is mostly what you’d like out of an anonymization function. The problem, however, is that in this case we know a lot about what the inputs look like.
As of last year, OKCupid usernames averaged 10.5 characters in length. 42% of them included at least part of the user's real name. Plain hashing is no longer considered a safe way to store passwords because it is so easy to reverse by brute force. Why should we think it would protect a participant's identity?

Open Science that involves studying people is a thorny mess even without bigots trawling OKCupid. I think this is especially true regarding guarantees of anonymity to participants. At best they need to be heavily qualified. When deciding whether to make a dataset open, it seems reasonable to take a hard look at it with the expectation that at some point in the future it will be deanonymized. If the consequences for the participants seem acceptable, yay Open Science! Otherwise...
posted by ethansr at 11:03 AM on May 13, 2016 [4 favorites]
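The point about hashed usernames can be demonstrated in a few lines: because usernames (unlike random secrets) come from a small, guessable space, an attacker can simply hash every candidate and look for matches. The usernames below are invented for illustration:

```python
# Sketch of why plain hashing of usernames is weak anonymization: when the
# input space is small and guessable, hashing every candidate and matching
# against the released hashes re-identifies users. Usernames are invented.
import hashlib

def md5_hex(s: str) -> str:
    return hashlib.md5(s.encode("utf-8")).hexdigest()

# "Anonymized" release: usernames replaced by unsalted MD5 hashes.
released = {md5_hex(u) for u in ["kittenfan88", "jane_doe_nyc"]}

# Attacker's candidate list (e.g., usernames scraped from the live site).
candidates = ["kittenfan88", "someone_else", "jane_doe_nyc"]

recovered = [u for u in candidates if md5_hex(u) in released]
print(recovered)  # prints ['kittenfan88', 'jane_doe_nyc']
```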


The data isn't gone, because data is never gone from the internet. It's now just harder to find. Which means I can't tell if I or my friends are on this list, but others can.

There's a text file of just the usernames linked in one of the reddit threads, if you want to find out if your profile was affected. Mine was not, but a friend of mine's was. Her username is shared with her gmail account, which makes it trivial to tie her profile to her public identity. This is going to hurt a lot of people who have good reasons to keep their profiles pseudonymous. Absolutely infuriating how cavalier this guy is being.
posted by hyperbolic at 12:19 PM on May 13, 2016 [4 favorites]


On the subject of proper training, I'd like to say that the US government actually has pretty good training on the subject of handling personally identifiable information. As a contractor working with anonymized biometric data, I received training that made it clear that even anonymized data has to be secured and treated with care, because it can often be combined with other, publicly available data to deanonymize the subjects.
posted by hyperbolic at 12:25 PM on May 13, 2016 [1 favorite]
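The point about combining datasets is the classic linkage attack: strip the usernames, and the remaining quasi-identifiers can still be joined against public side data. A toy sketch, with entirely invented records:

```python
# Toy linkage attack: even with usernames removed, quasi-identifiers
# (age, city, height) can be joined against public data to re-identify
# rows. All records here are invented for illustration.
anonymized_rows = [
    {"age": 34, "city": "Aarhus", "height_cm": 181, "answers": "redacted"},
    {"age": 27, "city": "Odense", "height_cm": 165, "answers": "redacted"},
]

# Public side data an attacker might hold (e.g., a social-media profile).
public_profiles = [
    {"name": "A. Person", "age": 34, "city": "Aarhus", "height_cm": 181},
]

def link(rows, profiles, keys=("age", "city", "height_cm")):
    """Match anonymized rows to named profiles on shared quasi-identifiers."""
    matches = []
    for row in rows:
        for prof in profiles:
            if all(row[k] == prof[k] for k in keys):
                matches.append((prof["name"], row))
    return matches

print(link(anonymized_rows, public_profiles))  # re-identifies the first row
```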


From what I can tell the "researcher" is trying to hurt these people and just playing dumb
posted by Elementary Penguin at 12:26 PM on May 13, 2016 [7 favorites]


There's a text file of just the usernames linked in one of the reddit threads, if you want to find out if your profile was affected.

Could you link to that text file or its reddit thread?
posted by cosmic.osmo at 12:38 PM on May 13, 2016


here
posted by naju at 12:53 PM on May 13, 2016


They're scraping images? What the fuck?
posted by Artw at 1:04 PM on May 13, 2016


From what I can tell the "researcher" is trying to hurt these people and just playing dumb

Which is one good reason why OKCupid should have anonymised their data set before handing it over to researchers in the first place.
posted by tobascodagama at 1:07 PM on May 13, 2016


My understanding is that this guy just scraped the website without OKCupid's knowledge.
posted by Elementary Penguin at 1:09 PM on May 13, 2016 [1 favorite]


That's only in the first two sentences of the OP; obviously nobody could have been expected to read those.
posted by Shmuel510 at 1:15 PM on May 13, 2016 [5 favorites]


Jesus, that's even worse, that's like web security 101 shit.
posted by tobascodagama at 1:15 PM on May 13, 2016


It should go without saying that Kirkegaard is wrong about this data being public and freely available. It's proprietary data, and his use and dissemination of it violates multiple parts of OKCupid's TOS:
So long as you comply with these Terms of Use, you are authorized to access, use and make a limited number of copies of information and materials available on this Website only for purposes of your personal use in order to learn more about Humor Rainbow or its products and services, or to otherwise communicate with Humor Rainbow or utilize its services. Any copies made by you must retain without modification any and all copyright notices and other proprietary marks. The pages and content on this Website may not be copied, distributed, modified, published, or transmitted in any other manner, including use for creative work or to sell or promote other products. Violation of this restriction may result in infringement of intellectual property and contractual rights of Humor Rainbow or third parties which is prohibited by law and could result in substantial civil and criminal penalties.
...
Illegal and/or unauthorized uses of the Website, including collecting usernames and/or email addresses by electronic or other means for the purpose of sending unsolicited email or using personal identifying information for commercial purposes, linking to the Website, or unauthorized framing may be investigated and appropriate legal action will be taken, including without limitation, civil, criminal, and injunctive redress.
posted by naju at 1:22 PM on May 13, 2016 [4 favorites]


So - OKCupid's lawyers better be all over this. In addition to that, from a legal standpoint, Open Science Framework can't knowingly host infringing content. They don't necessarily have to actively moderate everything they host, but they do need to take steps to remove obvious infringement when they learn about it. "Turning a blind eye" isn't an option.
posted by naju at 1:35 PM on May 13, 2016 [4 favorites]


They're scraping images? What the fuck?

No, but only because he didn't have the storage to do so.
posted by NoxAeternum at 1:48 PM on May 13, 2016


Of course, there's not much stopping anyone with basic programming skills from running with the dataset and scraping images on all the usernames.
posted by naju at 1:51 PM on May 13, 2016


OKCupid's web admins sure as shit should be stopping them from doing that. Aggressive scraping like that is basically indistinguishable from a DoS attack.
posted by tobascodagama at 2:02 PM on May 13, 2016 [1 favorite]


I was pleased to find out that my profile was not scraped, but then I realized that this is yet another way that no one is interested in my OKC profile.
posted by zeusianfog at 2:13 PM on May 13, 2016 [14 favorites]


"I'd rather you just untag me from ethics/IRB stuff. :)"

It's like Donald Trump and Mark Zuckerberg had a Danish love child.
posted by wonton endangerment at 3:25 PM on May 13, 2016 [2 favorites]


tobascodagama: "Aggressive scraping like that is basically indistinguishable from a DoS attack."

As pointed out above, there were roughly 70,000 profiles in this data set which could easily be acquired at rates far, far below DDoS attack rates.
posted by mhum at 4:10 PM on May 13, 2016 [3 favorites]


via Oliver Keyes update, open letter (Google doc) aimed at Århus U. et al.
posted by wonton endangerment at 4:22 PM on May 13, 2016


It's like Martin Shkreli pulled some evil Frankenstein shit with the late Aaron Swartz. Man, this guy is an asshole.
posted by oceanjesse at 4:31 PM on May 13, 2016 [1 favorite]


I think the behaviour of the "researcher" here was pretty awful, but maybe not as bad as that of the "concerned bloggers."

I deliberately didn't post about this here or elsewhere a few days ago because the data set was still publicly available. They blew up a paper that would otherwise have passed unnoticed as an uncited article in a vanity journal and instead made it "the" story in the data-nerd twittosphere for several days - all while the data set was still publicly available, and all with helpful information included very close by about exactly how to access it. If I were in the data dump I would be much more scared now about the volume and motivations of the people with access to my data than before, when it was just another dying file. There must be better ways to deal with breaches of sensitive information, even across countries, than viral blog/twitter posts.
posted by Another Fine Product From The Nonsense Factory at 4:47 PM on May 13, 2016


That's a fair point, but it also sounds like the Open Science Framework folks weren't at all interested in talking about it quietly, which could have gotten the data dump taken down before there was this much attention on the subject. I'm not sure there's really a great option; you can't expect people to silently ignore the situation, but drawing attention to it, well, draws attention to it.
posted by zachlipton at 5:53 PM on May 13, 2016 [3 favorites]


That's a fair point, but it also sounds like the Open Science Framework folks weren't at all interested in talking about it quietly, which could have had gotten the data dump taken down before there was this much attention on the subject.

It is hard to tell from the information presented.

I freely admit to wanting Nosek & co. not to turn out to be amoral jerks, but granting that bias, the quote that sounds like "Oh, we'll take it down if the owner asks" could also mean "Oh shit, how did this happen, what's the quickest way to remove it? Got it: OkCupid should tell us to remove it and it will be gone instantly."

This is such a bad scenario that it seems easy not to have thought it through in advance. I could easily imagine that professional scientists used to a world of IRBs, professionals, and 6+ month publication timelines might not realize that a student research project would put them in a position where they needed procedures that could let them remove stuff instantly, as opposed to after a complaint and investigation.
posted by mark k at 7:06 PM on May 13, 2016


heh wired did an article about this
posted by talaitha at 3:00 PM on May 15, 2016 [1 favorite]


Your privacy on OkCupid: the unromantic truth:
Summary of our findings on OkCupid privacy:

– OkCupid.com uses multiple web trackers, ad networks, and tracking cookies

– It shares your information with a large network of advertisers and partners

– Anything you post may be stored forever

– Match.com, IAC, and all of IAC’s properties may now access all of OkCupid’s user information
posted by cosmic.osmo at 11:42 AM on May 16, 2016


There Is No Such Thing as “Public” Data:

I hear arguments like this all the time. Websites that post mug shot photos to shame people say they’re just using public records. Harassers who take “upskirt” photos of women say they are blameless because their activities occurred “in public.” Police say they are free to use powerful technologies to surveil anyone for as long as they like as long as they are “in public.”

This justification is fundamentally wrong. Not just because we should be able to expect a certain amount of privacy in public, but because, despite frequency of use and seeming self-evidence, we actually don’t even know what the term public even means. It has no set definition in privacy law or policy. I often ask people to define the term for me. Common responses include “where anyone can see you” or “government records.” But by far the most common response I get is “not private.” Fair enough. But thinking of publicness this way only leads us to the equally difficult question of defining privacy.

Frankly, this argument is dangerous. People are wielding the notion of publicness as a sort of trump-all-rebuttals talisman to justify privacy invasions. By itself, this concept of publicness has no exculpatory power. How could it? We can’t even define it. We should be more critical of appeals to the publicness of data to justify its collection, use, and disclosure.

posted by NoxAeternum at 7:51 AM on May 20, 2016 [1 favorite]




This thread has been archived and is closed to new comments