Hats are cakes, and handbags are toasters
October 23, 2023 1:58 PM

"A new tool lets artists add invisible changes to the pixels in their art before they upload it online so that if it’s scraped into an AI training set, it can cause the resulting model to break in chaotic and unpredictable ways. The tool, called Nightshade, is intended as a way to fight back against AI companies that use artists’ work to train their models without the creator’s permission." (Melissa Heikkilä, MIT Technology Review)

"Poisoned data samples can manipulate models into learning, for example, that images of hats are cakes, and images of handbags are toasters. The poisoned data is very difficult to remove, as it requires tech companies to painstakingly find and delete each corrupted sample."
posted by MonkeyToes (16 comments total) 31 users marked this as a favorite
 
Zhao admits there is a risk that people might abuse the data poisoning technique for malicious uses.

Reading this, the question that springs to mind is "What could possibly be malicious?" Or, rather, from what point of view are we rating maliciousness? (And this is addressed later in the article.) The AI companies are scraping data without offering any compensation whatsoever to the humans who generated the art - an arguably malicious act - so how can defending against that be "malicious"?

Just an odd moment: do you think tech 'progress' of any sort is positive? Or necessary?

As an intuitive Luddite, I'm for poisoning the AIs.
posted by From Bklyn at 2:54 PM on October 23, 2023 [8 favorites]


What I don't quite get about these types of initiatives (including MeFi opting out of being in training data, for that matter) is that these AI entities are:

1. Representative of a broad consensus/synthesis of Internet-published human thought.

2. Only going to become more and more central to how we do all kinds of things, going forward -- no matter how one might feel about this, it's hard to deny the inevitability.

Given those two things, removing one's own contributions from the training data is sort of like doing your best to make sure your own views and aesthetics and preferences are not represented in the ongoing and future ordering of society. That's what made my heart hurt a little about MeFi's withdrawal -- I want us to be represented in the groupmind, I think we make good thoughts here.

On a more pragmatic level, I also have a hard time seeing the harm it does to me personally to have my own personal contributions making up a rather infinitesimal fraction of an AI's training data.

Also, pragmatically, the number of people poisoning their data is going to be pretty tiny, and as such I would guess it won't present much of a challenge for an AI to discover that they're statistical outliers and ignore them when producing output.

I dunno. If you're afraid of AIs being bad, maybe try helping them be good?
posted by slappy_pinchbottom at 3:14 PM on October 23, 2023 [8 favorites]


I’m not being paid to do that. Quite the opposite really.
posted by seanmpuckett at 3:25 PM on October 23, 2023 [20 favorites]


slappy_pinchbottom... The problem that many artists have is that the AIs are training on their work without the rights to do so, and then the resulting AI models are being used to put the artists out of work.

I work in print advertising, and I use the AI tools in the new Adobe releases a lot. A difference with those is that Adobe has trained their models specifically on the stock photo library that Adobe already owns, and I believe individual contributors can opt out if they so choose.

I know that graphic design jobs are going to take a massive hit from AIs (despite the joke that "for an AI to give a client what they want, the client needs to give it clear instructions, so I think we're safe"). We're already in an age where tons of design jobs are gone - why pay for design when you can just log in to Canva, use their templates, and put something together quickly that way? AI is going to accelerate that. I figure if they haven't already, Canva and others will incorporate an AI model that will output whatever the user wants. A lot of this will replace work that someone used to need to spend hours to get right - and made a living doing so. Musicians are going to take a huge hit as well, since AI can output a lot of background-type music, small-budget movie music, things like that.

The rate at which AIs are improving is striking. When you see the improvements that ChatGPT has made over the last two years, or image generators like Midjourney, the quality of the output is far better than before. Perfect, ready for primetime? Probably not yet. But we aren't far from it.

So yeah, artists have a perfectly valid reason to use their work to poison the AIs. If AI makers train their models on work they're not supposed to use, that's gonna be on them.
posted by azpenguin at 4:01 PM on October 23, 2023 [16 favorites]


Your second point is doing a lot of heavy lifting, though? There's no guarantee that it's true. In part, fighting against it helps prevent it from becoming inevitable.

And sure, if you don't mind, you get to choose to add your art or words to these AI models. Great! But choice is really important here.

Plus, so far AI has been pretty laughably not great? It's maybe not quite crypto-level uselessness, but there hasn't exactly been a killer app. It seems a loooot like a labor-killing device as opposed to something moving human art forward. Why would I want to help make that good?
posted by Carillon at 4:01 PM on October 23, 2023 [8 favorites]


This won't work. Broadly, what this tool is trying to do is skew the image statistics in such a way as to "fool" a trained network, similar to adversarial attacks (a visual example). However, if the image statistics are changed by Nightshade, they can be changed back by transcoding the image through other formats, by down-and-up sampling, or maybe just by analyzing and inverting the "poison" noise process.

Data cleaning is the first step in training any machine learning system; all this does is add another step to the process.
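
To make that concrete, here's a rough sketch of what a "transcode and resample" cleaning pass might look like. This is my own illustration, not Nightshade's code or a proven countermeasure - the function name and defaults are made up, and it assumes Pillow is installed:

# Down-and-up sample and re-encode an image, which tends to smear out
# small, carefully placed pixel-level perturbations. Hypothetical sketch only.
from io import BytesIO
from PIL import Image

def scrub(path, downscale=0.5, jpeg_quality=85):
    img = Image.open(path).convert("RGB")
    w, h = img.size

    # Down-and-up sampling: shrink the image, then restore the original size.
    small = img.resize((max(1, int(w * downscale)), max(1, int(h * downscale))))
    resampled = small.resize((w, h))

    # Transcoding: round-trip through lossy JPEG to discard high-frequency noise.
    buf = BytesIO()
    resampled.save(buf, format="JPEG", quality=jpeg_quality)
    buf.seek(0)
    return Image.open(buf).convert("RGB")

Whether something this crude actually neutralizes a given poisoning scheme is an open question, but it shows how cheap the extra cleaning step could be for a scraper to bolt onto its pipeline.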
posted by riotnrrd at 4:04 PM on October 23, 2023 [4 favorites]


doing your best to make sure your own views and aesthetics and preferences are not represented in the ongoing and future ordering of society.
And naturally the next step would be to have the hypothetical future AI groupmind encourage contribution to the groupmind. Perhaps by unfavorably freezing its conceptual mapping of those who defected. Like some kind of... basilisk, if you will.

More seriously, many small-e evangelical totalizing worldviews want to pull this trick. "Hey, my vision of the future is inevitable! Even those who reject it will be subsumed! So wouldn't the better option be to help it get better in your eyes? By getting in sooner, you'll have more influence than those who come after!" Cryptocurrency, Urbit, MLM schemes, GameStop meme traders, etc.

And it's difficult to engage with, because so much relies on that initial claim that this bold AI future is inevitable (and a good thing), such that there's no use fighting it. You want to smooth over that to get to what you see as the meat of your point; I think that's asking for a whole lot of ground to be ceded unfavorably.

There's also a conflict between "not represented in the ongoing and future ordering of society" and "Representative of a broad consensus/synthesis of Internet-published human thought." + "central to how we do all kinds of things". It being made central to how some people do all kinds of things is dependent on the perception that it's representative of a broad consensus. If we opt out, that tarnishes that perception a bit. If enough people/sites/datasets/resources opt out, then its usefulness drops. Not the effectiveness of the LLM-as-AI itself necessarily, but it's all only as useful as it's perceived to be. So the perception is what's useful.

So given that, the value of pushing this message of inevitable surrender/cooption becomes clear. The perception of its all-encompassing & inevitable nature must be preserved.
posted by CrystalDave at 4:40 PM on October 23, 2023 [8 favorites]


I really wish they had explained how this works. So I paint a picture of a dog, and then I change the pixels, and then Dall-E starts painting cats when you ask for a dog? Like, can I get a close-up of these poisoned pixels? How does that work?
posted by If only I had a penguin... at 5:03 PM on October 23, 2023 [2 favorites]


If AI is being trained on everything that has been made, can it ever come up with something original?

Is it limited to the best remix ever?
posted by B3taCatScan at 5:05 PM on October 23, 2023


Is it limited to the best remix ever?

I am kind of neutral on AI, so don't take this as a defence, but isn't everything, including that which is human-created, a remix?
posted by If only I had a penguin... at 5:07 PM on October 23, 2023 [2 favorites]


Honestly, I think the generative AIs are going to do a pretty good job of poisoning themselves. They had to digest the recorded human output of millennia to be as shit as they are now, and I haven't heard any compelling explanation of how they won't just ouroboros themselves into nonsense within a few years. It feels like fossil fuels all over again, except I haven't yet seen any evidence of these models doing significant meaningful work.
posted by McBearclaw at 5:46 PM on October 23, 2023 [1 favorite]


There's also the environmental cost to consider.
posted by aniola at 10:05 PM on October 23, 2023 [3 favorites]


> I am kind of neutral on AI, so don't take this as a defence, but isn't everything, including that which is human created, a remix?

remix to ignition is not a remix
posted by bombastic lowercase pronouncements at 11:58 PM on October 23, 2023 [2 favorites]


If such a thing is possible, it will be used more for spam and malicious hacking than anything else.
posted by SemiSalt at 5:02 AM on October 24, 2023 [1 favorite]


The idea that this is "inevitable" is just... lol. A contraption which requires scraping millions of other people's work without their consent, and also requires paying millions of people pennies to tag and sort everything? That does not seem very good or efficient. The whole process reeks of imperialist shit on top of it all.

Reading this, the question that springs to mind is "What could possibly be malicious?"

Presumably someone could use this on an AI system other than MJ/SD/etc. For example, what if I made face recognition software think I'm the rightful owner of an account I'm hacking into?

Except personally I don't really care about this because every image-based AI thing I've seen has either been creepy, pointless, or malicious in itself.
posted by picklenickle at 6:40 AM on October 24, 2023


Generative AI models are excellent at making connections between words, which helps the poison spread. Nightshade infects not only the word "dog" but all similar concepts, such as "puppy," "husky," and "wolf." The poison attack also works on tangentially related images. For example, if the model scraped a poisoned image for the prompt "fantasy art," the prompts "dragon" and "a castle in The Lord of the Rings" would similarly be manipulated into something else.

This is how everything becomes porn.
posted by vitia at 1:18 AM on October 25, 2023 [1 favorite]




This thread has been archived and is closed to new comments