Internet journalism and invasive surveillance
May 29, 2015 9:30 AM

The real question is, what can advertisers tell from our metafilter history?
posted by The Devil Tesla at 9:31 AM on May 29, 2015

We have cameras?
posted by maryr at 9:44 AM on May 29, 2015 [9 favorites]

But this isn’t the Holy Grail of my surveillance capability. What I’d do next is: create a world for you to inhabit that doesn’t reflect your taste, but over time, creates it. I could slowly massage the ad messages you see, and in many cases, even the content, and predictably and reliably remake your worldview. I could nudge you, by the thousands or the millions, into being just a little bit different, again and again and again. I could automate testing systems of tastemaking against each other, A/B test tastemaking over time, and iterate, building an ever-more perfect machine of opinion shaping.

Hyperbole aside, this is the real issue, and not because I'm worried about how my behavior is modified (I consider myself highly inner-directed), but because of how our culture is modified.
posted by Ickster at 9:46 AM on May 29, 2015 [3 favorites]

maryr: "We have cameras?"

Fiiiiiinnnnnnnnnee. I'll put my pants back on....
posted by Samizdata at 9:49 AM on May 29, 2015 [2 favorites]

We have cameras?

Of course not.

Hey! Is that a new shirt?
posted by Thorzdad at 9:55 AM on May 29, 2015

Adblock and Privacy Badger, the latter by the Electronic Frontier Foundation, help one opt out of this whole mess.

Privacy Badger, btw, tells me that this Mefi page has a (blocked) quantserve tracking cookie somewhere on it that tracks my browsing history between different websites.
posted by AGameOfMoans at 9:58 AM on May 29, 2015 [4 favorites]

The pisser is, of course, with websites which refuse to work unless you un-block tracking and ad scripts. I've landed on sites that wouldn't deliver content until things like Google Analytics scripts are re-enabled.
posted by Thorzdad at 10:06 AM on May 29, 2015 [3 favorites]

Maybe once a week... maybe I run across a site that I actually need or want to look at that is refusing to run. For that I have a second unblocked browser in which I open a private window - see what I need to see and then quit the browser which deletes all cookies. It is a very minor inconvenience when balanced against me being spied upon.
posted by AGameOfMoans at 10:16 AM on May 29, 2015 [1 favorite]

The thing about this piece is that it doesn't just apply to journalists. It basically applies to just about anybody who works on making any piece of the web tick. The cross-site analytics, the metrics, the data-hoarding are everywhere in e-commerce, any kind of content-oriented undertaking, social anything - it's essentially universal, and the pressure on technical / marketing / sales / customer service / management people to enable it is both overwhelming and the easiest thing in the world to capitulate to.

Last year I quit a job that I deeply loved, at a place I'd poured years of my life into, partially in explicit protest of this kind of behavior. Lately I work for people who, though conscientious and considerably more self-aware about their own marketing practices and so forth, rely on just as much of the enabling infrastructure. You can refuse to support the machinery or you can participate in this economy, but it's brutally difficult to do both.
posted by brennen at 10:30 AM on May 29, 2015 [8 favorites]

This article is compelling, but it's still not spelling out step-by-step _why_ regular people should worry that this information is being stored and used. Yes, no real security for political activists (her newest article on that site), yes, maybe "predictably and reliably remake your worldview", but that's not really spelling it out. How do these become _real_ for people?

I guess what is needed is a compelling and imaginative story, possibly non-fiction, possibly fiction, that makes it clear how step 1 leads, very probably, to step 4 (unhappiness and/or worse life for you and your children). Is anybody writing _that_?
posted by amtho at 10:41 AM on May 29, 2015

I ended up killing an internal project that I was really excited about in part because I had no idea how to keep it from becoming this kind of mess once it had shipped and was in the hands of random, unprincipled strangers. I feel really fortunate that it was peripheral to my primary work, and I had the luxury of axing it without having to quit my profession and wait tables. It's a really tangly, frustrating system.
posted by verb at 10:41 AM on May 29, 2015 [12 favorites]

The idea of shaping a person by shaping their browsing was a minor plot point in Stross' book Rule 34. In his case an algorithm was looking for potential terrorists and trying to reshape them into nonterrorists.
posted by sotonohito at 11:14 AM on May 29, 2015 [2 favorites]

Perhaps this is false nostalgia, but it still bugs me that we seem to have all the building blocks for making computing safe for average people but we seem to lack a critical mass to push them together into formal projects.

I am not sure that we have the building blocks. Or I guess maybe I'm not sure that the blocks we do have can presently be arranged in a way that overcomes the vast array of perverse incentives and pathologies (both accidental and borne of deliberate malice) driving most of the people capable of arranging them.

We have a lot of what it'd take to build a safe(r) network, but the architecture of just about every existing system is pretty bad on security and pretty susceptible to the cumulative elimination of privacy. I mean, what do we do about the fact that the tech driving a lot of our problems here is something as fundamental as the relational database?
posted by brennen at 11:18 AM on May 29, 2015

Why would we want to do anything about that, brennen?
posted by LogicalDash at 11:21 AM on May 29, 2015

Why would we want to do anything about that, brennen?

I broadly agree with Norton's conclusions about privacy and surveillance, and I think that a lot of the problem falls out of fundamental tech that's become really basic to our economy (like the RDBMS) over the last 20-30 years, and that this makes the problem harder to solve than it would be if we were just talking about the economic incentives around a narrow slice of web tech in isolation.
posted by brennen at 11:29 AM on May 29, 2015 [1 favorite]

A Big Data Breakup Album
posted by almostmanda at 11:41 AM on May 29, 2015

I guess I don't see why the RDBMS per se is part of the problem. I guess it does make it easier to process surveillance data in the same sense it makes it easy to process large amounts of data, generally.
posted by LogicalDash at 11:44 AM on May 29, 2015

RDBMS's generally apply security at the entity level rather than row level. To access any SSN, you must be able to access all SSNs.
posted by blue_beetle at 11:58 AM on May 29, 2015 [1 favorite]

Lately I work for people who, though conscientious and considerably more self-aware about their own marketing practices and so forth, rely on just as much of the enabling infrastructure. You can refuse to support the machinery or you can participate in this economy, but it's brutally difficult to do both.

@pmarca retweets...
-"98% of people who have ever paid for @SlackHQ are still paying for it." ~ @stewart
-"We're doing well. We have $300m in the bank & we've been growing 5%/week for 70 straight weeks." Stewart Butterfield, CEO of Slack [mefi's own #1 employer]

In the startup world, you work hard and you move fast in order to make other people rich.

Other people. Not you.

You're a small elite of very smart young people who are working very hard for an even smaller elite of mostly Baby Boomer financiers so they can buy national governments, shut the governments down, destroy the middle class and the nation-state.

That's been going on a long time. It's not something you invented; that's a historical development. There's a lot of reasons that the nation-state's got to go. There's a lot of reasons why a middle class is in the way.

But that's what you do. That will be the judgment of history for your startup culture. They're going to say that the twenty-teens were all about that:

“It was a tacit allegiance between the hackerspace favelas of the startups and off-shored capital & tax avoidance money laundries. And what were they doing? They were building a globalized network society.”

And that's what's coming next: an actual globalized network society. You're routing around it from the bottom while they climb over it from the top, but you're both aimed in the same direction. That's why you're in tacit allegiance, whether you know it or not.

And right now everybody lives the way that people used to live under empires in colonial states. We're all auto-colonialized by the austerity. That's your big dragon. That's your actual dragon. Not, like, the little tactical dragon. That's the BIG DRAGON. And you know it's the big dragon because you're part of it. You're actually its brain and its nervous system.


And as long as you are making rich guys richer, you are not disrupting the austerity. You are one of its top facilitators.

I guess what is needed is a compelling and imaginative story, possibly non-fiction, possibly fiction, that makes it clear how step 1 leads, very probably, to step 4 (unhappiness and/or worse life for you and your children). Is anybody writing _that_?

this is water by ramez naam (via)
posted by kliuless at 12:00 PM on May 29, 2015 [27 favorites]

RDBMS's generally apply security at the entity level rather than row level. To access any SSN, you must be able to access all SSNs.

In all the databases I've used you can at least apply different permissions to one table. The table in question might actually be a view. That might be the only way that some particular third party can access your SSN data.
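The view-based restriction described above can be sketched roughly as follows. This is only an illustration: the table, view, and column names are hypothetical, and it uses SQLite (via Python's standard library) because it runs anywhere. SQLite has no GRANT statement, so the permission step is only noted in a comment; in PostgreSQL or MySQL one would grant SELECT on the view and not on the underlying table.

```python
import sqlite3

# Hypothetical schema for illustration; none of these names come from the thread.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, ssn TEXT)")
conn.executemany(
    "INSERT INTO customers (name, ssn) VALUES (?, ?)",
    [("Alice", "123-45-6789"), ("Bob", "987-65-4321")],
)

# The view exposes only the last four digits, never the full SSN.
# In a server RDBMS, a third party would be granted access to this view
# only (not to the customers table); SQLite cannot express that grant.
conn.execute(
    "CREATE VIEW customers_masked AS "
    "SELECT id, name, '***-**-' || substr(ssn, -4) AS ssn_last4 FROM customers"
)

masked_rows = conn.execute(
    "SELECT name, ssn_last4 FROM customers_masked ORDER BY id"
).fetchall()
print(masked_rows)
```

Anyone querying only the view sees masked values, which is the sense in which access to "any SSN" need not mean access to "all SSNs".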

What I'm saying is I don't see the relevance.
posted by LogicalDash at 12:25 PM on May 29, 2015

this is water by ramez naam (via)

Wow. Thanks for that, kliuless.
posted by straight at 12:28 PM on May 29, 2015 [2 favorites]

What I'm saying is I don't see the relevance.

Take a hypothetical but fairly typical mid-sized web retailer, customer base in the hundreds of thousands, annual revenue somewhere $10-50 million, in business for somewhere around a decade, running a hacked-up derivative of some common shopping cart stack. Assuming a not-too-unusual amount of continuity in the way this business operates on the web, figure 8-10 years worth of customer account and order data, stashed in MySQL or PostgreSQL: A bunch of logins, a bunch of e-mail addresses, a bunch of mailing addresses and line items and payment records.

My point is basically: Look at all that data and its unintended consequences. Databases (forget the specifics of the relational bit, I'm talking about big collections of data you can query for purposes unanticipated by the people originally storing the data) usually don't exist in the first place to facilitate surveillance as such. The people working for our hypothetical web store aren't malicious or creepy. They just want to ship packages full of things to people who paid for them, handle customer service calls, know what's in their inventory, pay taxes, pass audits, etc. etc.

And yet: All of that data can be correlated to other data, and piped to / correlated by third party services (analytics, e-mail marketing, retargeting of ads on other sites, fraud prevention, identifiers of users on common social networks, whatever deeply disturbing ways your cell phone provider is selling you out) where it can leak into other databases. And then before you know it x-random vast corporate entity has a pretty good working model of your interactions with all sorts of little web stores and journalistic outlets and not-so-little social networks and governments and so on down the line.
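To make the correlation step concrete, here is a minimal sketch of how two datasets collected for unrelated reasons combine once they share an identifier. Everything in it is invented for illustration: the records, the field names, and the choice of e-mail address as the join key are all assumptions, not a description of any real service.

```python
# Hypothetical records; every name, address, and field here is invented.
store_orders = [
    {"email": "pat@example.com", "item": "hiking boots", "zip": "97201"},
    {"email": "sam@example.com", "item": "cookbook", "zip": "10001"},
]
newsletter_clicks = [
    {"email": "pat@example.com", "article": "knee surgery recovery"},
    {"email": "pat@example.com", "article": "mortgage refinancing"},
]


def build_profiles(*datasets):
    """Merge records from independent datasets that share an e-mail address."""
    profiles = {}
    for dataset in datasets:
        for record in dataset:
            profile = profiles.setdefault(record["email"], {})
            for key, value in record.items():
                if key != "email":
                    profile.setdefault(key, []).append(value)
    return profiles


profiles = build_profiles(store_orders, newsletter_clicks)
print(profiles["pat@example.com"])
```

Neither dataset says much on its own; the merged profile is what starts to look like a model of a person.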

Friendly little web store's database is sort of like the basic problem with dragnet surveillance, writ small: Whatever the rationale for its creation, the possibilities for its eventual use are combinatory and nearly impossible to constrain as long as the data exists and is relatively well-formed.

Databases aren't inherently bad. They're these amazingly useful tools, and probably an inevitable use of computers with long-term storage, just like diaries and ledgerbooks and file cabinets are inevitable uses of paper and ink. If SQL and friends didn't exist, it would be necessary to invent something like them. And yet they're part and parcel of all the ways that we're kind of fucked right now. Which is a big part of why this is a hard problem.

(See also: HTTP cookies, web request logging, credit-card transactions, client-side scripting, location services on mobile devices...)
posted by brennen at 12:56 PM on May 29, 2015 [7 favorites]

I have a confession to make: part of my job, over the last few years, has been working with some of the companies doing exactly this kind of thing. If you hear about all this stuff and think to yourself, wow, this is an industry full of creepy soulless marketing drones, you'd be partially right. Those folks are definitely here, and a lot of other folks who came in with the best of intentions are slowly morphing into that. It's one of the reasons why I'm desperately seeking another line of work.

I feel like I should make one distinction that isn't necessarily clear here, though, and that's the difference between web analytics itself and the growing marketing panopticon that aggregates as much personal data as possible in the efforts to predict and ultimately shape consumer behavior. The one is a tool in service of the other, but there are plenty of good reasons for web analytics that don't necessarily involve Big Brotherish impositions on personal privacy. Metafilter (for example) is using Google Analytics, as are many thousands of other sites from personal blogs to multinational corporation homepages, and while you can do creepy things with GA, all it is at heart -- all most pure-play web analytics tools are, at heart -- is a way to understand how anonymous visitors, in aggregate, are interacting with your site.

Here's an example. I'm right now in the middle of implementing a web analytics solution for a San Francisco-based company who wants to understand how effective their website and their advertising efforts are. So we're tagging their site and their inbound links, so when visitors click a link in a marketing email, or click on a paid search result at the top of the Google search results, we can make a note of that and compare the behavior of that group of visitors to others who might have browsed to the website directly, or got there through Bing, or whatever. (I'm kidding, nobody got there through Bing.) The idea is to help the marketing folks figure out where to spend their time and budget. Why spend $10k on a big email campaign if only 10% of recipients click through to the website, and a tiny percentage of those folks make a purchase or sign up for something, when you could spend that $10k on your Google AdWords budget and maybe get more return on your investment?
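The kind of aggregate comparison described above can be sketched in a few lines. This assumes the inbound links are tagged with `utm_source` query parameters, a common convention that the comment does not actually name, and the URLs and hit data are made up. Note that nothing here identifies an individual visitor.

```python
from collections import defaultdict
from urllib.parse import urlparse, parse_qs

# Hypothetical landing-page hits: (URL the visitor arrived on, whether they converted).
hits = [
    ("https://shop.example.com/?utm_source=email&utm_campaign=spring", True),
    ("https://shop.example.com/?utm_source=email&utm_campaign=spring", False),
    ("https://shop.example.com/?utm_source=adwords&utm_campaign=spring", True),
    ("https://shop.example.com/", False),
]


def conversion_by_source(hits):
    """Return {source: (visits, conversions, rate)} with no per-visitor identifiers."""
    visits = defaultdict(int)
    conversions = defaultdict(int)
    for url, converted in hits:
        params = parse_qs(urlparse(url).query)
        # Untagged arrivals are counted as direct traffic.
        source = params.get("utm_source", ["direct"])[0]
        visits[source] += 1
        conversions[source] += int(converted)
    return {s: (visits[s], conversions[s], conversions[s] / visits[s]) for s in visits}


report = conversion_by_source(hits)
for source, (n_visits, n_conversions, rate) in report.items():
    print(source, n_visits, n_conversions, rate)
```

Comparing the rates per source is what lets the marketing folks decide whether the e-mail budget or the paid-search budget is earning its keep.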

Those are the kinds of questions most of my current clients are looking to answer, and I have no problem helping them with that. At its core, web analytics is about identifying why your website or app exists, what it is you want your visitors to do (buy a book, download a whitepaper, sign a petition, watch a video), and identifying the most interesting segments among your visitors (where they were before they came to your site, what page they landed on, what marketing campaigns if any they were exposed to, what part of the world they live in) and putting those two bits of data together to make interesting reports that can help you build a better site or app. If people from Asia tend to hit one or two pages on your site and never go further, maybe your code that detects visitor country and switches to Chinese or Korean is broken. If people who downloaded a whitepaper in the last month are more likely to come back and sign up for your service, maybe you should promote those whitepapers more prominently on your home page.

And then once in a while I work with folks who smile and nod and say "right, I want all of that, but we need more -- I don't want to understand visitor behavior in aggregate, I want names and email addresses." And then things start to get creepy. Sometimes it's "I want to know who abandoned a shopping cart so I can email them directly and ask them why, or offer them a coupon for that item," which isn't so bad in theory even though it relies on building a huge database of specifically-identified individuals and associating their web activity with it. Other times it's more like "We work with a number of partners to aggregate visitor data across multiple channels -- mobile, social, whatever offline data we can scrape together -- and we need as much detail on visitors to this site as we can get so we can tie it together with the rest of our data," and surprise, now you're working for the panopticon.

When my clients start talking about personally identifiable information, I tell them it's not what I do. Anonymous visitor behavior in aggregate, sure, I can do that all day, but I won't help you collect personally identifiable details so you can tie it into your all-encompassing marketing database and behavior analysis model. But that doesn't mean companies don't get the data; they just don't get it from me. Thousands of other consultants and marketing agencies and tech companies of all stripes are standing right behind me ready to deliver all of that and more for the right price. Hell, the companies who build the tools I use are right there with them, and they'll sometimes give me dirty looks for not promoting the rest of their service portfolio.

It's obviously not as spectacularly destructive, but in its own way, the broad and insidious aggregation and integration of consumer marketing data is a little like splitting the atom. Once it became clear that we could collect this data, it was only a matter of time before somebody started doing it, and once that somebody started making money, it didn't take long for us to get from one or two companies to a digital marketing landscape that (as of January 2015) looks like this. Now we have thousands of companies, more every day, competing with each other for who can provide marketers the clearest and most comprehensive data on how to sell more shit to their customers, from the highest levels of abstraction all the way down to uncomfortably narrow segments like 'email addresses of men in their 40s who live in Portland Oregon, have visited their doctor in the last year, and have one or more prescriptions for erectile dysfunction'.

It's too much for me; I'm wrapping up my current projects and checking out. If you hear of anyone looking for a reformed web analytics guru, let me know. But with or without me this huge katamari is going to rumble onward, picking up more and more personal data as it goes, and I don't think there's really any way to stop it at this point.
posted by Two unicycles and some duct tape at 1:40 PM on May 29, 2015 [17 favorites]

forgot to mention that Cathy O'Neil is writing a book -- partially as a result of mefi's own Jordan Ellenberg (escabeche) -- called Weapons of Math Destruction about: "how big data is being used as a tool against the poor, against minorities, against the mentally ill, or against public school teachers... on the ways big data increase inequality and further alienate already alienated populations."*

also btw...
The Guy Who Worked For Money by Benjamin Rosenbaum
posted by kliuless at 2:56 PM on May 29, 2015 [5 favorites]

The thought occurred to me just now that what if you made a plug in that did not just block tracking info ... but instead scrambled it in such a way that the tracking agencies would end up with this huge db of junk data .... hmmmm.....
posted by AGameOfMoans at 3:17 PM on May 29, 2015 [1 favorite]

The thought occurred to me just now that what if you made a plug in that did not just block tracking info ... but instead scrambled it in such a way that the tracking agencies would end up with this huge db of junk data .... hmmmm.....

Not a terrible idea. The issue is that the data already isn't clean, and they still find ways to track people. There's only so much you can do on an Internet you don't own.
posted by The Devil Tesla at 3:58 PM on May 29, 2015

You know what threw me for a loop was seeing ads related to [traditionally very private mental health topic that affects me] browsing the web at work. There's no mystery as to how they could do this - I was actually logged into the same Google account in both locations - but what the fuck?

On the other hand the fact that the vast majority of ads that, say, Youtube wants to show me are for audio gear and software - a correct identification of my hobbies - is something I'm perfectly happy with, in itself. That's the kind of mutually satisfactory example ad people use to defend their trade. Problem is the underlying techniques are way too powerful and pervasive to trust that they will only be used in benign ways.
posted by atoxyl at 5:53 PM on May 29, 2015 [1 favorite]

Another problem is that the people who are doing a lot of it are, because they are human, also at times extraordinarily incompetent and blinkered. The tools are powerful but they're being used on the wrong things and provide unintended consequences.

I take heart because the ads I'm shown so consistently don't fit me despite my generally cavalier attitude toward money, catholic tastes, and willingness to buy online. Either I already bought it or there is no way in hell I'd buy it.
posted by Peach at 7:21 PM on May 29, 2015 [1 favorite]

Related: Twitter is collecting a list of the apps you have installed on your phone (someone caught them last year doing this, but now they have a FAQ about it).
posted by RobotVoodooPower at 7:45 PM on May 29, 2015 [2 favorites]

Yeah, as far as my experience goes with this stuff server side, most people want aggregate optimize-our-service performance and utility data, and treat PII (personally identifying information) as a kind of toxic waste they desperately want to avoid dealing with.

Sites usually don't want PII any more than they want your passwords or credit card on file: it makes them liable, very liable, for misbehaving employees and security breaches. They need to build and pay for a huge army of support staff for dealing with individuals, rather than computer sanitized aggregate demand, aggregate interest, aggregate money.

Everyone wants the aggregate info, and to target aggregate demographics with ads, and to receive aggregate money from aggregate buyers. Virtually nobody wants personal details of users; only a handful of properties do. You can tell which ones because they use terms like "identity" and provide login/authentication services.
posted by ead at 11:42 AM on May 30, 2015

(And, of course, the handful of sneaky ad companies that turn identity trackers into aggregate demographics. But that's the thing: they're selling a service that analyzes, categorizes, aggregates and blinds the customer to PII, because that's what the customer actually wants.)
posted by ead at 11:44 AM on May 30, 2015

I really disagree that "nobody wants personal details of users". In my experience, plenty of organizations do, and are gathering it already. They just haven't really figured out how to use it yet. It's being worked on though. Nieman lab published a pretty compelling/revolting vision of what this might look like just yesterday.
posted by spudsilo at 12:23 PM on May 30, 2015 [1 favorite]

That article is a great example of what I'm talking about. The hypothetical NPR program or NYT article the device is delivering to the user is being selected by an identity agnostic, dimension reduced model of user interest. NYT isn't going to nefariously catalogue the fact that John Smith is browsing want ads again and he's the sort of person who likes stories about British sitcoms while eating toast but not in the shower. Because they don't want to know, it's too much detail and not safe for them to handle. Some information about identity may leak through to NYT but primarily they're going to negotiate with an aggregate traffic source / user identity modeling and clustering 3rd party (Apple, ad networks, etc.) for higher quality targeting of their outgoing articles, higher quality matching of articles to ads, ads to abstract/latent user dimensions, or articles to paying subscribers, some metrics like a certain number of seconds of confirmed attention or interactions per billion licensed snippet deliveries, etc.
posted by ead at 2:21 PM on May 30, 2015

(Of course it depends on the site; if they're in the business of cultivating a strong sense of individual user identity -- Twitter is a fine example -- they will do more of this. But in general cultivating personal relationships with users is super expensive and dangerous, and well outside most sites' reach. Hence all the login with Facebook, pay with Stripe, monitor traffic with Google stuff. People outsource contact with user identities to protect themselves, limit costs.)
posted by ead at 2:27 PM on May 30, 2015

I guess what is needed is a compelling and imaginative story, possibly non-fiction, possibly fiction, that makes it clear how step 1 leads, very probably, to step 4 (unhappiness and/or worse life for you and your children). Is anybody writing _that_?

I don't know if this meets your threshold, but from this report the soon-to-be-grandfather seemed pretty upset.

No word on how the mother-to-be reacted to all this, but I'm having a hard time imagining it was a purely pleasant experience to have the surveillance-industrial complex breaking this news to your family.

Also, mind, this was before the massive data breach at the retailer in question, which brought about the resignation of its CEO.

So, the situation for private industry is very much like that of the military-police-surveillance complex: Not only are they collecting and using this data in squicky ways, they can't seem to keep it from malefactors (or whistleblowers they prefer to cast as malefactors).
posted by one weird trick at 7:32 PM on June 2, 2015
