This Person Exists
December 9, 2020 1:37 AM   Subscribe

A website showing the real faces of real people used to train This Person Does Not Exist, without their knowledge or consent.

As large tech companies are increasingly deploying machine learning in consumer-facing systems, the real labour of building datasets is often invisible — consumers and internet users have their data added to large datasets, which are then cleaned and tagged by low-paid workers who are typically not talked about or acknowledged. As Anatomy of an AI System puts it:
[T]he user has purchased a consumer device for which they receive a set of convenient affordances. But they are also a resource, as their voice commands are collected, analyzed and retained for the purposes of building an ever-larger corpus of human voices and instructions. And they provide labor, as they continually perform the valuable service of contributing feedback mechanisms regarding the accuracy, usefulness, and overall quality of Alexa’s replies. They are, in essence, helping to train the neural networks within Amazon’s infrastructural stack.
[…]
This kind of invisible, hidden labor, outsourced or crowdsourced, hidden behind interfaces and camouflaged within algorithmic processes is now commonplace, particularly in the process of tagging and labeling thousands of hours of digital archives for the sake of feeding the neural networks. Sometimes this labor is entirely unpaid, as in the case of Google's reCAPTCHA. In a paradox that many of us have experienced, in order to prove that you are not an artificial agent, you are forced to train Google's image recognition AI system for free, by selecting multiple boxes that contain street numbers, or cars, or houses.
Perhaps one of the first instances of someone unknowingly making their way into a consumer-facing machine learning system was Susan Bennett, the voice actor who discovered that she was the voice of Siri after the iPhone 4S was released, when a colleague of hers reached out to ask if she was Siri.

previously on MeFi:

This Person Does Not Exist
if there’s a picture, the person obviously exists.... right?
The Amazon Echo as a map of human labor, data and planetary resources
Finding Siri
posted by wesleyac (32 comments total) 24 users marked this as a favorite
 
The tech industry has a very large free labor problem, which they don't want to face because doing so would mean actually paying for that labor. Hence you see, over and over, systems designed to extract labor for free without the public noticing.
posted by NoxAeternum at 1:47 AM on December 9, 2020 [20 favorites]


And also a tendency to use buzzword techniques like InstaHide to extract money from businesses that want to make the practice look a little less unethical.

The idea behind this technology is to scramble the original images so that the originals supposedly cannot be recovered from the published data. It failed in several embarrassing ways.

"This implementation uses a standard pseudorandom number generator (PRNG) to create the random sign flipping mask. It turns out that PRNGs have the property that, given a few outputs, it's possible to learn their state and, as a result, recover every subsequent output. Learning the state requires a little bit of work, but it's not all that hard (it takes roughly 100 CPU hours, at a cost of about $4 USD). So we did it."
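The attack in that quote can be illustrated with a toy sketch: a simple linear congruential generator with hypothetical parameters, where a single output reveals the entire state. Recovering a real PRNG's state takes more outputs and the roughly 100 CPU hours mentioned above, but the principle is identical: observed outputs pin down the internal state, so every later "random" value follows.

```python
# Toy sketch of PRNG state recovery (hypothetical LCG parameters, not
# InstaHide's actual generator). For an LCG, each output IS the next state,
# so one observed output lets an attacker replay everything that follows.
def lcg(state, a=1103515245, c=12345, m=2**31):
    while True:
        state = (a * state + c) % m
        yield state

gen = lcg(seed := 42)
observed = [next(gen) for _ in range(3)]  # attacker sees a few outputs

# Reconstruct the generator from the last observed output alone.
recovered = lcg(observed[-1])
predicted = [next(recovered) for _ in range(5)]
actual = [next(gen) for _ in range(5)]
assert predicted == actual  # every subsequent output is now known
```

Real-world PRNGs like the Mersenne Twister need more observed outputs (624 words of state) and some algebra, but they are just as deterministic, which is why they are never safe for cryptographic masking.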
posted by flamewise at 2:06 AM on December 9, 2020 [4 favorites]


It's a little weird to say that Susan Bennett became a consumer-facing machine voice "unbeknownst" to her.
[I]n June 2005 Bennett signed a contract offering her voice for recordings that would be used in a database to construct speech....

Bennett never knew exactly how her voice would be used. She assumed it would be employed in company phone systems, but beyond that didn't think much about it. She was paid by the hour -- she won't say how much -- and moved on to the next gig.
She got paid specifically to provide the raw voice data for a computer to remix into a consumer product. The specific end product was not disclosed because Apple and subcontractors treated it as a trade secret. I hope she negotiated good pay for the gig but I'm not seeing much of an ethical issue here.
posted by daveliepmann at 2:28 AM on December 9, 2020 [15 favorites]


I did transcription for a dataset for training speech recognition for Microsoft for a bit a year or so ago. I wasn't actually told this directly, but the format they wanted the output in was clearly not designed for human use in any fashion, and there were Cortana and MS bits on some of the filenames. By the end of my tenure on the project, we were editing terrible, terrible machine transcriptions of the audio - since these were linked to time periods in the audio, and that wasn't exposed to us, just the raw text, we had to edit the machine transcriptions as best we could, not do an actual good transcription, which would've been a whole hell of a lot easier.

Oh, and all this in Danish, where the amount of slurring and contraction means there is even less direct relation between spoken and written language than in, e.g., English. And that's before we get on to dialects, which all use the same standard written Danish.

All the audio was stuff tagged with a relevant Creative Commons license from YouTube. Transcribing interviews posted by podcasts and community radio stations was fine, if tedious, but the Minecraft YouTubers were awful.

Assuming a 6:1 transcription ratio (not meaningfully possible with the source material, especially not when editing awful machine transcripts rather than writing raw - maybe it was better for the people doing other languages?) it would've paid a little bit over UK minimum wage (but I was classed as a contractor, of course, so it was irrelevant). All the audio was sourced without compensation to the speakers. I don't know what the agency charged Microsoft, but I can't imagine they were cheap.
posted by Dysk at 2:37 AM on December 9, 2020 [3 favorites]


She got paid specifically to provide the raw voice data for a computer to remix into a consumer product. The specific end product was not disclosed because Apple and subcontractors treated it as a trade secret. I hope she negotiated good pay for the gig but I'm not seeing much of an ethical issue here.
I don't think that "unbeknownst" automatically implies an ethical issue — Apple clearly did have the legal right to do what they did. Similarly, StyleGAN2 (the model behind This Person Does Not Exist) had the legal right to use the photos that they used in their training set, since they were licensed as Creative Commons and put on Flickr (note that this only means that the photographer gave Nvidia the right to do this, not that the subject of the photo consented).

There's definitely a blurry and contentious line in the ethics of this, though. I think it would be reasonable to expect Apple to tell the person whose voice they put inside a phone that sold more than 60 million copies what they were going to do before they released it, and to credit her (I don't think that Apple has ever confirmed that Susan Bennett is the voice of Siri, despite it being plainly obvious). Clearly there is no legal need for Apple to do this (assuming they wrote a contract that allowed for it), but to me it seems ethically dubious that they chose not to.

I think Bennett's case is an interesting one to look at because it's so cut-and-dried compared to what came later: Bennett is clearly the voice of Siri (unlike other text-to-speech systems that use an amalgamation of many voices), and thus the labour and contracts involved are very clear, whatever you may think of them. With things like StyleGAN2, the provenance of the network's output is much less clear: people's faces may be captured in public places by photographers who then put the images up on Flickr under a CC license, and that image will then have a minuscule, but very real, influence on every face the resulting model generates.
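That "minuscule but very real influence" can be made concrete with a toy sketch of how GANs like StyleGAN2 blend what they learned: every generated face corresponds to a latent vector, and moving between latents smoothly morphs between faces. The "generator" here is just a hypothetical random linear map standing in for the real network, not anything from NVIDIA's code.

```python
import numpy as np

rng = np.random.default_rng(0)
G = rng.standard_normal((512, 64))  # hypothetical stand-in for the generator

def generate(z):
    # A real generator is a deep network; a linear map shows the same idea:
    # the output is a deterministic function of the latent vector z.
    return G.T @ z

z_a = rng.standard_normal(512)  # latent for "face A"
z_b = rng.standard_normal(512)  # latent for "face B"

# Walking from z_a to z_b morphs one "face" into the other; every point
# along the way is shaped by everything the model absorbed in training.
blend = [generate((1 - t) * z_a + t * z_b) for t in np.linspace(0, 1, 5)]
```

The endpoints reproduce the two faces exactly, and the midpoints are genuine mixtures, which is why no single training photo is recoverable from a typical output yet every training photo leaves a trace in the learned weights.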
posted by wesleyac at 2:52 AM on December 9, 2020


The tech industry has a very large free labor problem

Since the driving purpose behind all of technology is saving labour, in some cases to such an extreme extent that whole new classes of human activity become practicable that never were before, that's only to be expected.
posted by flabdablet at 3:48 AM on December 9, 2020


Definitely mixed feelings on this. On the one hand I understand that your image is personal and seeing it used without your knowledge can feel bad. On the other hand, enabling cool new uses that would be impossible if you had to track down every individual rights-holder is a big part of the point of the Creative Commons license. It turns out that if you post something online and say "Anyone can use this for any purpose", sometimes people use it for things you didn't anticipate.

Also, I wish I had a little more context for the thispersonexists site. Is it from the same people who created thispersondoesnotexist? Is it a protest against them? Is reposting those images as much (or more of) a violation of privacy as using them in the training data for an AI project?
posted by firechicago at 5:06 AM on December 9, 2020 [2 favorites]


A website showing the real faces of real people used to train This Person Does Not Exist, without their knowledge or consent.

Now being splashed publicly across the internet without their knowledge or consent.
posted by Thorzdad at 5:06 AM on December 9, 2020 [12 favorites]


Susan, come over here. You don't have an accent. Go ahead and read this.
Hard eyeroll …
posted by scruss at 5:19 AM on December 9, 2020 [1 favorite]


I think it would be reasonable to expect Apple to tell the person whose voice they put inside a phone that sold more than 60 million copies what they were going to do before they released it, and to credit her (I don't think that Apple has ever confirmed that Susan Bennett is the voice of Siri, despite it being plainly obvious).

There's contracts, and then there are ethics. It's highly unethical to me to not even MENTION this was going to happen with her voice. I'll add it to the list of Fuck Apple, I hate them. Assholes.
posted by tiny frying pan at 5:40 AM on December 9, 2020 [4 favorites]


It's highly unethical to me to not even MENTION this was going to happen with her voice. I'll add it to the list of Fuck Apple

While trillion-dollar corporations don't need any help "defending" themselves, I feel compelled to point out that Siri was initially developed by a startup that was purchased by Apple in 2010 -- five years after Susan Bennett made the recordings in question.
posted by Slothrup at 5:54 AM on December 9, 2020 [15 favorites]


Do you know who provided the voice of your preferred non-Apple assistive technology? Were they well compensated?
posted by ardgedee at 5:57 AM on December 9, 2020 [1 favorite]


Uh, ok? Makes no difference to me at all in how they treated this woman. It's a ridiculous and unnecessary betrayal.

I don't use any of that voice activated crap so I'm not worried about it. If you had told me even 20 years ago that a huge number of people would happily and voluntarily put a listening device in their home, I wouldn't have believed you.
posted by tiny frying pan at 6:06 AM on December 9, 2020 [1 favorite]


Uh, ok? Makes no difference to me at all in how they treated this woman. It's a ridiculous and unnecessary betrayal.

Reality doesn't make any difference to you? And calling it a ridiculous and unnecessary betrayal is a bit hyperbolic, isn't it? What is this betrayal exactly?
posted by Pendragon at 6:16 AM on December 9, 2020 [6 favorites]


If the training photos are under Creative Commons 'By' license, then doesn't use of a model trained on them legally require you to attribute every single training photo?
posted by Pyry at 7:44 AM on December 9, 2020 [2 favorites]


My legal headcanon is that many machine learning products may be huge copyright infringements. Like that GPT-3 thing: from what I understand, they just went around the Internet and sucked in as much text as their spiders were allowed to grab, without caring too much about the copyright status. Then they fed all that into their model. So all the Stack Overflow posts and reddit comments and maybe in the future this comment (HELP I'M TRAPPED IN A NEURAL NET) and whatever other bits of text from you, me, and everybody we know are now infinitesimal parts of it. Do we all have infinitesimal copyright claims? What if the model regurgitates a snippet of something I wrote verbatim, as it often does? That wouldn't fly in a high school term paper.

Same with these pictures. Every 'person that does not exist' is just an interpolation of a whole bunch of people that do exist. The first 'person that exists' I checked had released their photo under a license that requires attribution and forbids commercial use. Kosher? Not sure.
posted by steveminutillo at 7:52 AM on December 9, 2020 [1 favorite]


She got paid specifically to provide the raw voice data for a computer to remix into a consumer product. The specific end product was not disclosed because Apple and subcontractors treated it as a trade secret. I hope she negotiated good pay for the gig but I'm not seeing much of an ethical issue here.

Not telling voice actors about the content of a job, denying them an informed choice about taking it, is a massive ethical issue for the industry - as you may or may not recall, this was a major sticking point the last time SAG-AFTRA was negotiating with the video games industry.

Definitely mixed feelings on this. On the one hand I understand that your image is personal and seeing it used without your knowledge can feel bad. On the other hand enabling cool new uses that would be impossible if you had to track down every individual rights-holder is a big part of the point of the Creative Commons license. It turns out, that if you post something online and say "Anyone can use this for any purpose" sometimes people use it for things you didn't anticipate.

This is the attitude that has enabled the abuses of the gig economy. People aren't complaining because the unauthorized use of their image makes them "feel bad", but because "name, image, and likeness" rights are valuable and allow people to say how their own identity gets used - for example, the exploitation of NIL rights has driven a good portion of the actions against the NCAA, such as the O'Bannon lawsuit.

If the cost of having "cool new uses" is the tech industry running roughshod over people, then that's a price that's too high. Fortunately, I don't think that it's a price we have to pay to have nice things - but the tech industry needs to learn how to actually ask permission and accept when people say no.
posted by NoxAeternum at 8:10 AM on December 9, 2020 [8 favorites]


It's worth saying that it's not clear that they needed the Creative Commons license to do this - just like Google doesn't need a license to make and store copies of the internet for search. Fair use probably enables these types of projects in the United States regardless. (In other countries, it's less clear.) If fair use applies, then they don't need to meet the attribution requirements of the CC license, either.

Also, as an aside, when you include a quote in a paper, you need to attribute it to meet academic norms around plagiarism, not legal requirements of copyright law. Quotation is not copyright infringement, it is fair use, and does not generally require attribution as a legal requirement.
posted by mercredi at 8:12 AM on December 9, 2020 [4 favorites]


Interestingly, the third face I saw on This Person Exists was Maryland governor Larry Hogan, which makes me curious how many arguably "public" faces are included in the dataset.
posted by Ben Trismegistus at 9:51 AM on December 9, 2020


On the plus side this post keyed me in to thiscatdoesnotexist.com which was fun to look through and did not likely violate any cat confidentiality issues. What was weird was that the first nonexistent cat to appear looked remarkably like my cat (my cat has a white chin, otherwise it was a damn near perfect match).

Does this mean that we can now generate headshots for advertisements without having to pay an actual human? We can have a spokesperson for a product without any royalties? In the future, no one is paid for their work, right?
posted by caution live frogs at 10:04 AM on December 9, 2020 [1 favorite]


I think Bennett's case is an interesting one to look at for the fact that it's so cut-and-dried, compared to what came later: Bennett is clearly the voice of Siri [...]

Was, I think. Apple's iOS 11 document on their updated deep learning model ends with "For iOS 11, we chose a new female voice talent with the goal of improving the naturalness, personality, and expressivity of Siri’s voice."

I assume the 12 year old recordings (iOS 11 in 2017, SRI's recordings in 2005) ultimately ran out of phones to hybridize.
posted by Kyol at 10:24 AM on December 9, 2020 [1 favorite]


Fair use probably enables these types of projects

My understanding is that 'fair use' is fairly narrowly defined-- it's not clear to me that "I'm going to use your copyrighted image to train a for-profit machine learning model" would qualify under the 'four factors' test (it's for-profit, it uses the entirety of the work rather than a subset, there's no compelling public interest in corporations building privately held ML models, if corporations can hoover up images to train machine learning models without having to acquire licenses it substantially impacts the images' market value).
posted by Pyry at 10:50 AM on December 9, 2020 [3 favorites]


I don't want to turn this into a copyright law derail, but the statement in the parenthetical does not line up well with the way the law thinks about these factors - for example, the first factor, the purpose of the new work, is about whether it serves a new (transformative) purpose, not about whether it's a for-profit use. Fair use enables Google web and image search, for example, which both involve making for-profit, unlicensed copies of millions of whole works.

So fair use probably covers this under current US law. But should it, from a progressive standpoint? I argue yes - more expansive IP protections and a reduction of fair use generally works against the individuals and for big companies. Right now, the market harm factor looks at the license market for the original (in the original context) not a hypothetical licensing market for new uses. If, instead, fair use law said that you have to pay a license fee anytime a rightsholder can create a mechanism to charge you one, you greatly increase the power of companies to control all sorts of new activities, and reduce the power for individual creators and users.

If this were not fair use, it doesn't mean that big tech companies would not do this, or would pay artists licensing fees, they would get data other ways - buy employee badge photos from large employers, pay other large media companies for back catalog, etc. One levelling factor of fair use is that it's also available to the little guy and the independent researcher, not just large tech companies capable of making deals.

Copyright law is already too expansive (covers too many things, is automatic), too long in duration, and penalties are too high. Tech companies should be regulated, privacy laws should exist, there should be regulation of the way AI detection and prediction tools are used to substitute for human judgement in potentially harmful or discriminatory ways, and societies should confront the changing nature of work that eliminates many historical jobs. Minimum wage laws should apply to "consultants" who are fundamentally doing employee-type work. All this direct regulation is necessary and important.

But increasing the power of copyright law is directly counter to that. In fact, fair use and other limits to copyright law are powerful tools for examining, critiquing, and avoiding corporate power. Looking at the recent Linkletter/Proctorio case, copyright infringement suits are a way that companies avoid being talked about, and fair use is our most powerful defense.
posted by mercredi at 12:19 PM on December 9, 2020 [6 favorites]


If they really believe they have fair-use rights to use any and all images to train their models, why have they restricted themselves to creative-commons licensed images? I am no lover of copyright, but the position that if I want, for example, a stock image of an orange, I can freely download any images of oranges I want, launder them through a GAN, and use the resulting image unencumbered of any licenses of the training samples, seems highly dubious.
posted by Pyry at 12:54 PM on December 9, 2020 [1 favorite]


I don't know their theory, but my guess is that they wanted something that wasn't limited to complying with US law, and where there was already some expectation that the images were going to be reused - for PR purposes, if not legal ones.

WRT your hypothetical - I mean, if you open three different images of an orange and draw an orange, that wouldn't be infringement, right? To the extent that those photographs are providing you with information about what oranges look like out in the real world, the owners of the images don't own the data about the underlying individual oranges.

The parts of this that feel invasive, to me, are about power, transparency, and work/economics, not about copyright.
- it's really cognitively disruptive on a number of levels to have to re-understand and re-evaluate the relationship between photographs and reality. Should there be laws or conventions about displaying images that appear to be photographs but are in fact new images that never represented reality? I don't know, but it's an interesting question.

- should we know when AI is being used for monitoring and decision making and have some transparency about the training input/images and process?

- how in our lives can we continue to seek out and support individual creation, not just corporate content? (how do we avoid licensing everything and owning nothing, how do we limit the ability of companies to write lopsided contracts of adhesion?)
posted by mercredi at 1:19 PM on December 9, 2020 [2 favorites]


The tech industry has a very large free labor problem

This is a problem as old as Mark Twain and whitewashed fences.
posted by pwnguin at 2:20 PM on December 9, 2020 [1 favorite]


"Does a populace get a chance to donate biometric samples to a personal surveillance terminal every day?"

That put the thing in a new light. The world stopped fiddling with its Apple. Google swept its bots daintily back and forth—stepped back to A/B test the effect—added a touch here and there—criticised the effect again—the world watching every move and getting more and more interested, more and more absorbed.
posted by flabdablet at 5:41 PM on December 9, 2020


Slightly tangentially apropos: I've just got off the phone with my hitherto excellent bank, which yesterday sent out an email blast explaining that it was on the point of rolling out VoiceID for securing telephone banking, replacing (yes, replacing! Not augmenting!) the old password-based system.

As somebody who prefers to keep my biometrics off devices I don't control to the greatest extent possible, I find this disturbing. And having previously been unaware that loads of banks have been doing the same thing for some years now, I'm extra disturbed.

How is it possible for executives making bank-executive salaries not to understand that easily available deepfake audio software employs the very same techniques by which VoiceID software models a vocal tract in order to generate a biometric fingerprint?

With Alexa-class devices with always-on microphones rapidly becoming acceptable in more and more households, and the S in IoT standing for Security, and the processing required to generate audio deepfakes requiring less and less energy every year, and banks falling over themselves to jump on this bandwagon, how are we not brewing up a perfect storm of automated bank account fraud right now?
posted by flabdablet at 6:13 PM on December 9, 2020 [3 favorites]


flabdablet, my credit union sent me my login password in cleartext (or something equally ridiculous; it's been a couple years, might have been something else).
posted by aniola at 7:51 AM on December 10, 2020


How is it possible for executives making bank-executive salaries not to understand that easily available deepfake audio software employs the very same techniques by which VoiceID software models a vocal tract in order to generate a biometric fingerprint?

Let's set aside the 'how can someone not understand something that will cost them millions' question for a moment -- how does this voiceID work for the deaf?
posted by pwnguin at 12:30 PM on December 10, 2020


On the other hand, apparently you don't need voiceID for Alexa class devices to steal your PIN: https://www.lightbluetouchpaper.org/2020/12/02/pushing-the-limits-acoustic-side-channels/
posted by pwnguin at 12:33 PM on December 10, 2020 [2 favorites]


how does this voiceID work for the deaf?

No worse than any other aspect of telephone banking.

The way it's supposed to work is that they generate a voice "fingerprint" from the first call I make to them after it's rolled out, then compare that on subsequent calls instead of asking me for my phone banking password. This strikes me as a decision made by somebody with way way way too much confidence in this class of technology.
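The enroll-then-compare flow described above can be sketched in a few lines. Everything here is a hypothetical stand-in: real systems run audio through a trained speaker encoder (x-vectors and the like) to get the "fingerprint", which is exactly the representation that deepfake audio tools learn to imitate.

```python
import numpy as np

def embed(audio: np.ndarray) -> np.ndarray:
    # Placeholder "speaker encoder": real systems use a neural network.
    # Output is a unit-length embedding vector (the voiceprint).
    v = audio[:128]
    return v / np.linalg.norm(v)

def verify(enrolled: np.ndarray, candidate: np.ndarray, threshold=0.8) -> bool:
    # Cosine similarity between unit vectors; accept above a threshold.
    return float(enrolled @ candidate) >= threshold

rng = np.random.default_rng(1)
call_one = rng.standard_normal(16000)   # toy "audio" from the first call
voiceprint = embed(call_one)            # enrollment happens silently here

# The same speaker (same audio plus small channel noise) scores near 1...
same_speaker = embed(call_one + 0.01 * rng.standard_normal(16000))
# ...while an unrelated speaker scores near 0.
other_speaker = embed(rng.standard_normal(16000))
```

The threshold is the whole security model: anything that reproduces the embedding closely enough passes, whether it came from a throat or from a generator trained on recordings of that throat.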
posted by flabdablet at 8:56 PM on December 11, 2020




This thread has been archived and is closed to new comments