Defeating AI With A Coat Of Glaze
March 18, 2023 10:28 AM

University of Chicago researchers working with several artists have created Glaze - a cloaking filter layer that is barely perceptible to humans but interferes with machine learning models, preventing them from capturing an artist's style.

The tool works by injecting minor changes into the artwork, turning it into an "adversarial example" that poisons a training dataset by introducing a decoy style. Even artists who already have images in the wild can benefit from putting out new cloaked images.
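For the technically curious, the core trick looks roughly like this. A minimal sketch (not the authors' code), with a generic pretrained VGG slice standing in for the style feature extractor the paper describes:

```python
# Minimal sketch of the cloaking idea, NOT the Glaze codebase: nudge an
# image's *features* toward a decoy style while keeping the *pixels*
# nearly unchanged. Requires torch and torchvision.
import torch
import torchvision.models as models

# A generic VGG slice stands in for the style feature extractor.
extractor = models.vgg16(weights=models.VGG16_Weights.DEFAULT).features[:16].eval()
for p in extractor.parameters():
    p.requires_grad_(False)

def cloak(image, decoy, steps=200, eps=0.03, lr=0.01):
    """image, decoy: float tensors in [0, 1], shape (1, 3, H, W), same size.
    Returns image plus a small perturbation whose features resemble decoy's."""
    with torch.no_grad():
        target = extractor(decoy)              # features of the decoy style
    delta = torch.zeros_like(image, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        feats = extractor((image + delta).clamp(0, 1))
        loss = torch.nn.functional.mse_loss(feats, target)
        opt.zero_grad()
        loss.backward()
        opt.step()
        with torch.no_grad():                  # enforce imperceptibility budget
            delta.clamp_(-eps, eps)
    return (image + delta).clamp(0, 1).detach()
```

As the paper describes it, the real tool optimizes toward a style-transferred version of the artwork under a perceptual rather than per-pixel budget, but the shape of the optimization is the same.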
posted by NoxAeternum (24 comments total) 33 users marked this as a favorite
 
This is cute but doomed. The math doesn't care. It's just another filter.
posted by metametamind at 10:31 AM on March 18, 2023 [10 favorites]


Oh neat. I might start integrating this into my art upload process.
posted by egypturnash at 10:45 AM on March 18, 2023


To be fair, metametamind, they do acknowledge that this is only a temporary tactic and a first step:

Unfortunately, Glaze is not a permanent solution against AI mimicry. AI evolves quickly, and systems like Glaze face an inherent challenge of being future-proof (Radiya et al). Techniques we use to cloak artworks today might be overcome by a future countermeasure, possibly rendering previously protected art vulnerable. It is important to note that Glaze is not a panacea, but a necessary first step towards artist-centric protection tools to resist AI mimicry. We hope that Glaze and followup projects will provide some protection to artists while longer term (legal, regulatory) efforts take hold.
posted by Saxon Kane at 11:23 AM on March 18, 2023 [9 favorites]


Sabotaging the models that would one day detect my melanoma by making sure the robots don't steal my furry porn art

But seriously, the base training models are used for all kinds of things, not just the blatant art theft that anti-AI people focus on. And they require trillions of images to function, so there's not really a viable way to ensure that all of them are fully copyright free... Nor is it clear that doing so is legally necessary, given the history of web scraping as an acceptable practice. But it's interesting to see artists fighting back in what I'm sure will be an escalating series of Spy Vs. Spy antics.
posted by ThisIsAThrowaway at 12:01 PM on March 18, 2023 [2 favorites]


And they require trillions of images to function, so there's not really a viable way to ensure that all of them are fully copyright free
You mean there's not a *profitable* way to ensure that training sets are curated/sourced acceptably. It could be done, it's just cheaper to feed everything in & ask for forgiveness later by promising there'll totally be great uses of it later that you'll benefit from.

Will it be temporary? Certainly. But this is all built off adversarial training, so it requires adversaries. And ingesting countermeasures to countermeasures has also been shown to open a path for further manipulation of the weights. There's a crack in the model; that's how the light gets in.
posted by CrystalDave at 12:28 PM on March 18, 2023 [22 favorites]


One thing that's escaped me is why so many people are more impressed/concerned by AI image generation and manipulation, and less by what it can do with ideas via text.

AI image manipulation is certainly a very immediate and obvious achievement (and similarly, AI-assisted audio processing is quite impressive - the most famous example being Get Back). And deepfakes, etc.

To me it seems almost tangential. Anyway, just like police employing forgers, the best tool for detecting AI image fuckaroundery is maybe... another AI? Hmmm.
posted by Artful Codger at 12:29 PM on March 18, 2023 [1 favorite]


"Glaze beta 2 (March 18) Update includes a complete rewrite from scratch of Glaze frontend and updates to the backend. The updated version has no reliance on GPL code"

*shocked face*
posted by you at 1:54 PM on March 18, 2023 [4 favorites]


I read about this previously and I just don't understand. Can't the AIs just throw out the least significant bits? Don't AIs already compensate for JPEG artefacts? Can't the AI compare its output to what was asked and learn how to defeat the glaze? The paper purports to prove otherwise, but I still have serious doubts. I did only skim it. Still, at best this all just feels like a reverse engineering overfitted to one particular AI called Stable Diffusion.
posted by hypnogogue at 3:07 PM on March 18, 2023


I could see this working where you prompt with “make something in a style of Artist X” where Artist X uses this technology. The inputs could be pastoral landscapes but the output is Loab fucking Crunkus.
posted by slogger at 4:12 PM on March 18, 2023 [2 favorites]


As I understand it, hypnogogue, it has to do with the way that computers "see" images. The machine learning algorithm doesn't interpret an image as an image; it takes a bunch of data from groups of pixels it's seen before (the training data set), compares it to another bunch of pixels, and gives an output based on the aggregate likelihood that that particular arrangement of pixels is a face or a flower or whatever. Which leads to things like mistaking deserts for nudes or falsely matching people to mugshots. It doesn't really know what's significant or insignificant; it only knows what it's been told is significant or insignificant in a bunch of data it's "seen" before. I guess Glaze distorts the input, giving the algorithm data that it can't accurately categorize and replicate.
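For a concrete sense of how little it takes to fool that kind of pixel-matching, the classic one-step "fast gradient sign" attack fits in a few lines. This is a generic classifier demo of the same genre of trick, not what Glaze actually does:

```python
# Toy adversarial example: a single FGSM step (Goodfellow et al., 2014)
# against an off-the-shelf classifier. A ~2/255-per-pixel change can flip
# the predicted label even though a human sees the same picture.
# (ImageNet input normalization omitted for brevity.)
import torch
import torchvision.models as models

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT).eval()

def fgsm(image, true_label, eps=2 / 255):
    """image: (1, 3, 224, 224) float in [0, 1]; true_label: class index."""
    image = image.clone().requires_grad_(True)
    loss = torch.nn.functional.cross_entropy(
        model(image), torch.tensor([true_label]))
    loss.backward()
    # step each pixel slightly in the direction that most increases the loss
    return (image + eps * image.grad.sign()).clamp(0, 1).detach()
```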

Stable Diffusion et al. will eventually learn the patterns of the Glaze filter, of course; no doubt the engineers working on image generating AIs will add it to their training data, especially if it sees widespread use. But that doesn't mean it's useless. It's a little like locking your front door; no ordinary lock will totally prevent you from being robbed, but you only need to make breaking it just a little bit more difficult to deter the majority of thieves.
posted by radiogreentea at 4:51 PM on March 18, 2023 [2 favorites]


One key detail is that academics play a major role in Glaze.
There's a lot of anti-AI sentiment in higher ed right now. We should expect more such projects, along with other acts of resistance.
posted by doctornemo at 5:04 PM on March 18, 2023 [3 favorites]


It's the start of a copy protection arms race. These never make anyone happy, to be honest, but the people who pay their bills making stuff that they sell to people are usually pretty damn unhappy about people copying it without permission.

Right now this is taking about 20min to process an image on my M2 Air and it's definitely got human-visible results on the stuff I checked on - which is admittedly what the docs say it's worst at, flat-color cartoon work. Hopefully there will be improvements on both of those fronts as quickly as there have been improvements in copyright-washing text and images scraped off the internet without anyone's permission.

Yes, it will only work for so long. Yes, I am sure the AI researchers can find a workaround. Some of them may stop and say "hey wait this is shaky ethical ground" but there will surely be people who just do not give a fuck, or who find the existence of any kind of copy-protection offensive, just as there are people in the cracking scene whose (mostly-unpaid, from all accounts?) hobby is removing the protection on anything they can figure out how to crack.

If someone wants to make an AI model to detect cancer, or whatever, and find that more and more of the new images they're scraping off the web are protected and fucking up their results, well, I guess they have to figure out a way to ask people for images, and give them whatever they want in exchange now, or do without. Or source them themselves. Oh well! They might have to pay for their source material, what a terrible thing! It's not like that's how things have worked for the entire history of mechanical reproduction or anything.
posted by egypturnash at 6:55 PM on March 18, 2023 [5 favorites]


If someone wants to make an AI model to detect cancer, or whatever, and find that more and more of the new images they're scraping off the web are protected and fucking up their results, well, I guess they have to figure out a way to ask people for images, and give them whatever they want in exchange now, or do without.

Who could disagree that cancer detection is a great application for AI? But if it's fed from scraping the web, ur doin it rong. Like with most such research, I expect that the subject matter experts would happily pool their collections of high-quality, verified images, to develop the best possible models.
posted by Artful Codger at 7:51 PM on March 18, 2023 [2 favorites]


"And they require trillions of images to function, so there's not really a viable way to ensure that all of them are fully copyright free"

You mean there's not a *profitable* way to ensure that training sets are curated/sourced acceptably. It could be done, it's just cheaper to feed everything in & ask for forgiveness later by promising there'll totally be great uses of it later that you'll benefit from.


I'm curious what sort of licensing figure would be viable, acceptable to the artists who made the images, and able to scale to trillions of images. Like, I'm not sure every artist in the world would be happy with a mere dollar per work, and even that would be roughly in the same order of magnitude as global GDP.

It's not just unprofitable to license images at that scale, it's unviable.
posted by Dysk at 8:06 PM on March 18, 2023 [2 favorites]


If you think that these image-scraping AIs are going to be used for anything more useful than art generation then I've got a bridge in Brooklyn to sell you. AI can do interesting things, but it is ultimately synthesizing from its input. It has no capability for thinking outside the box provided by its own training data.
posted by Aleyn at 9:31 PM on March 18, 2023 [1 favorite]


I know this is just the first salvo in a war of technology that's erupting, and I fully and completely side with the folks that want to indiscriminately fuck up machine learning models with bad data. It should be everywhere. On every webpage, in every image. There should be nginx and apache modules that automatically add the latest and greatest Glaze to everything they serve. Make the development of ML software that's even remotely reliable incredibly expensive and labor intensive to produce.
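No such module exists as far as I know, but the plumbing wouldn't be complicated. A purely hypothetical sketch, with the actual cloaking function left as a stub:

```python
# Hypothetical sketch only: a WSGI middleware that runs every outgoing
# PNG/JPEG through a cloaking pass. apply_cloak is a stub standing in
# for a Glaze-like filter; no such server module actually exists.
import io
from PIL import Image

IMAGE_TYPES = {"image/png", "image/jpeg"}

def apply_cloak(img):
    """Stand-in for a Glaze-like perturbation; a no-op here."""
    return img

class CloakImages:
    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        captured = {}

        def capture(status, headers, exc_info=None):
            captured["status"] = status
            captured["headers"] = headers

        body = b"".join(self.app(environ, capture))
        headers = dict(captured["headers"])
        if headers.get("Content-Type", "") in IMAGE_TYPES:
            img = Image.open(io.BytesIO(body))
            fmt = img.format                   # remember PNG vs JPEG
            out = io.BytesIO()
            apply_cloak(img).save(out, format=fmt)
            body = out.getvalue()
            headers["Content-Length"] = str(len(body))
        start_response(captured["status"], list(headers.items()))
        return [body]
```

(Given the minutes-per-image processing cost mentioned above, you'd realistically cloak once at upload time and cache the result rather than filtering per request.)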

I'm not usually one to advocate for trying to put genies back in bottles but "AI" can get Jafared. Most of the applications I can come up with for it are either just outright evil or will absolutely end up causing deaths when they fuck up in some edge case-y way. As a species, we are too immature to be developing this tech. We are going to misuse it in all the worst ways. We've already begun. I'm going to research how I might best contribute to poisoning data sets.
posted by signsofrain at 12:25 AM on March 19, 2023 [8 favorites]


I'm curious what sort of licensing figure would be viable, acceptable to the artists who made the images, and able to scale to trillions of images. Like, I'm not sure every artist in the world would be happy with a mere dollar per work, and even that would be roughly in the same order of magnitude as global GDP.

None of the models were trained on trillions of images; I think most were trained in the range of hundreds of millions to single-digit billions, and the useful image set was probably much smaller (e.g., non-near-duplicates, well labeled, etc.).

Shutterstock has around 400 million images. Adobe Stock has around 200 million. Getty I think is in the same ballpark. And their images are probably better labeled. So if one or all of them decided to make something like this, they probably could. They might need to acquire some additional images to improve some areas, but I'd be very surprised if none of these companies were working on this now.
posted by justkevin at 6:39 AM on March 19, 2023 [1 favorite]


Machine learning for cancer detection typically uses clinical data (academic hospitals already have tons of these images - not millions but thousands - plus there's the National Cancer Database). Almost all patients will sign yes to the waiver that asks if we can use their images for research. There's no reason to go to the internet to use unverified images.

One problem with this apparently is training the AI not to think that the presence of a ruler in an image is a diagnostic indication for cancer. (I don't have a source for this, it's just from a conversation I overheard with the doctors I work for, but of course most of our training images of cancers have rulers in them.)
posted by joannemerriam at 8:09 AM on March 19, 2023 [6 favorites]


On the note of patient confidentiality and such, the LAION-5B database that all the art generators are trained on contains restricted medical photos included without any knowledge or consent from the patients, along with other great things like video depictions of rape and snuff, and it has been used to generate revenge porn. yay! Looking forward to the deepfake propaganda too.

yeah fuck everything about it. And artists taking measures to protect their work and keep a fair wage from clients isn't going to have any effect on cancer detection, since AI development for medicine doesn't use Artstation/Pixiv/DeviantArt images of people's half-demon half-angel fursonas, so nothing is lost if those artworks are digitally cloaked as blotchy messes.
posted by picklenickle at 9:33 AM on March 19, 2023 [3 favorites]


the models that would one day detect my melanoma

I hereby christen this "Roko's Oncologist".
posted by Not A Thing at 3:07 PM on March 19, 2023 [7 favorites]


Art is dead
posted by L.P. Hatecraft at 7:10 PM on March 19, 2023


Art is dead

scrawled with chalk on one of the many monstrous metal sculptures that appeared around campus

quickly edited to read This Fart is Deadly

amen
posted by elkevelvet at 2:55 PM on March 20, 2023 [1 favorite]


I think they'd be better off just putting some sort of flag into the JPEG/PNG metadata (since those are the two dominant image formats used on the web) that says "please do not include in ML models" and calling it a day.

Everything else is a waste of time, a sort of zero-sum game that doesn't really get anyone anywhere.

And before someone calls this out as hopelessly naive, bear in mind that traditional web scrapers work exactly this way. You create a robots.txt file in your webroot if you don't want your site scraped and indexed by Google, etc., and they respect it as a matter of convention. There's no real technical means that stops Google from scraping your page if they want to (and there are 100% people out there who are going to scrape content regardless). But it works well enough that it's not worth it for most people to burn a bunch of time and energy doing anything else.
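Stamping such a flag is trivial, for what it's worth. A sketch with Pillow, using a made-up "no-ml" text key since (as far as I know) no standard one exists:

```python
# Sketch of the metadata-flag idea with Pillow. The "no-ml" key is
# made up for illustration; like robots.txt, nothing enforces it.
from PIL import Image
from PIL.PngImagePlugin import PngInfo

def tag_no_ml(src_path, dst_path):
    """Copy a PNG, adding a 'no-ml' opt-out text chunk."""
    img = Image.open(src_path)
    meta = PngInfo()
    for key, value in getattr(img, "text", {}).items():
        meta.add_text(key, value)          # preserve existing text chunks
    meta.add_text("no-ml", "true")         # hypothetical opt-out flag
    img.save(dst_path, pnginfo=meta)

def is_opted_out(path):
    """What a well-behaved scraper would check before ingesting."""
    return getattr(Image.open(path), "text", {}).get("no-ml") == "true"
```

A compliant scraper checks the flag and skips the file; a non-compliant one ignores it, exactly like robots.txt.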

Beyond that, it's not at all clear to me that using copyrighted works to train an ML model is even close to copyright infringement, particularly when the end product—a synthetic work "in the style of" some artist—is probably not copyright infringement anyway. You can't copyright a style. There's a pretty hard line as to what constitutes forgery, and making something that looks vaguely like a Monet but that isn't a Monet, and isn't being passed off later on as a Monet (which would be fraud rather than forgery; forgery isn't retroactive), doesn't get particularly close.

And as I've said in other threads, I think this whole AI-generated "art" business is a lot of smoke and very little fire. It might negatively affect some commercial artists in the same way that portrait painters were negatively affected by photography—I'm not sure what the exact parallel is, but maybe people churning out stock art or something—but the majority of the "art" being made by AI models is pretty unimpressive except for having been produced by an AI model. As artworks they tend to be pretty uninspiring. They're demonstrations of what the technology can do, but as a means of conveying human emotion, of communicating something from the artist to the viewer, they don't really say much. (I have said elsewhere and think that in time this will change; there will eventually be people who are going to use AI models the way Stieglitz used a camera, and will probably blow our collective socks off by showing us what can really be done with the tools in the hands of someone with real skill and something to say. But right now? Meh. It's like the early photos of horses running and stuff. Damned impressive at the time, but today any kid with an iPhone can do the same thing and it wouldn't be worth the ink to print.)

The only artists who should feel remotely threatened by AI are the ones who aren't any better than AI. Which is not really most serious artists, at least that I know. Maybe the dude doing 2-minute caricatures for $10 at the park is in trouble, but he was in trouble a long time ago (including from a lot of people on Fiverr who would do the same thing, just cheaper); and really, he's probably no worse off, because the real service he's offering is the weird opportunity to pay someone to draw your caricature (or your kid's) with a fat Sharpie on some blotter paper in the park. Same thing with the people doing street art with cans of spray paint or airbrushes or whatever; they're offering a service, not a product, and in doing so they've beaten the mechanical-reproduction game by not playing it.

But... there are people who do benefit from the AI arms race, of ML models used to find weaknesses in other models and ways to fuck them up with minimal inputs: governments and militaries. That's the real market. Build an adversarial model that can jam up Stable Diffusion and you've probably got a pretty good idea of how to jam up a model that picks camouflaged soldiers out of elephant grass. And that is where the big money is, and probably why they just hastily expunged all that icky GPLed code.
posted by Kadin2048 at 9:09 AM on March 21, 2023 [2 favorites]


Art is dead.

Just resting actually. Glazing all the Codger family pictures is hard, sticky work.
posted by Artful Codger at 4:18 PM on March 21, 2023



