CLIP Art
August 5, 2021 12:54 PM

Alien Dreams: An Emerging Art Scene. In recent months there has been a bit of an explosion in the AI generated art scene. Ever since OpenAI released the weights and code for their CLIP model, various hackers, artists, researchers, and deep learning enthusiasts have figured out how to utilize CLIP as an effective “natural language steering wheel” for various generative models, allowing artists to create all sorts of interesting visual art merely by inputting some text – a caption, a poem, a lyric, a word – to one of these models.
posted by Cash4Lead (47 comments total) 35 users marked this as a favorite
 
I would rather the screaming came from inside the human head, via the arm muscles, rather than out of a machine. It is bad enough how the art world runs, money laundering, class divisions, hyper monetizing, then the ubiquitous telephone camera with HDR comes along and everyone is a photographer, now everyone who can type with two index fingers and can afford a program, of some sort, afford an internet connection, even not, is an artist. I have a friend who is a frustrated artist, who went to school to be a frustrated artist, she calls monetized, small form art, N'ART. To me a lot of it looks like melting nightmare dream scenarios, inside of decaying infrastructure, or a jaundiced eye examines nature, with no knowledge of it. I see a lot of this on the web, and go right by it, garbage in, garbage out.
posted by Oyéah at 1:05 PM on August 5 [5 favorites]


I have a friend who's using this to generate art they post to Facebook. It's both... gorgeous and super creepy.
posted by hanov3r at 1:21 PM on August 5


I have only skimmed this so far, but the examples in the link are, to my eye, pretty cool and pretty delightful, and I am really happy to see the extensive explanations of the various systems (which I hope to understand at least a LITTLE more once I have a chance to read more closely).

"matte painting of a house on a hilltop at midnight with small fireflies flying around in the style of studio ghibli | artstation | unreal engine", in particular, is pretty great.

I'd actually really like to try this out myself, and I'm so happy to see links to Colab Notebooks where I might be able to do that, but I'm going to have to spend a little time figuring out how that works.

But I'm really happy to know about this, and looking forward to learning more.

Thanks so much for sharing this with us, Cash4Lead!
posted by kristi at 1:22 PM on August 5 [2 favorites]


we really are going to absolutely eradicate everything aren't we

Someone should hook up an infinite loop of these things. A thing that can look at a ML-generated image like in this post and generate a description, then send that description back into an ML thing that generates images based on textual input, and just let that thing loop until... the Demiurge manifests or whatever.
posted by glonous keming at 1:39 PM on August 5 [7 favorites]


now everyone who can type with two index fingers and can afford a program, of some sort, afford an internet connection, even not, is an artist.

There's always going to be a tension between the ready availability of tools and the excitement that availability breeds, and what those tools can do in the hand of someone skillful, thoughtful and experienced. We don't really complain about how anybody with a couple of bucks can buy a set of watercolors and cheap brushes, for instance. We want people to be able to do that! Art should be for everyone.

This kind of art is just like that. Because you can do some fun things with relatively little effort--I think there was a post about https://thisartworkdoesnotexist.com/ a little while back--where you're happily clicking, and having the joy of novel images appearing to you. Which is great! But then you also have what this article is describing, and it's so much work...every few months I tell myself, now's the moment I'll learn what it's all about, I get like two paragraphs into an explanation and there's all this math and I have to go scuttling back under my rock for a while.

I love it. I love the way you see a sort of personality with these different engines--the article compares how some are sculptural, others more painterly--and often it is a little nightmarish, because it's got that uncanny valley look to it, it's trying to be realistic about a world it will never be able to see or experience. And so there are these...I don't know, it's like when there's a painter you know so well, and you look at individual brush-strokes and say, aha, I know what you were up to there. And similarly you'll look at one of these pictures, see the jagged bright artifacts around a dark line, and think, aha, I know what you were up to, and I know you were trained on a thousand years of compressed jpegs, and that's how you see the world. A painter looks at a scene and wonders, how do I show someone what it's like to look at this scene, where do I include details to mimic the experience of using human eyes with their focal points and blurry spots, even though I know someone will be looking all over the canvas? And I wonder--not what these engines think, because I know they don't really think, but given what they're creating, what would they be thinking about, if they were thinking? What would it understand by the term "unreal engine"? A particular world of shiny, crisp surfaces and light leaks and blurs? Like if we could go back and ask one of those Byzantine painters, do you really think heaven looks like that, or is it just because you had a ready supply of gold leaf?
posted by mittens at 1:39 PM on August 5 [8 favorites]


I would rather the screaming came from inside the human head, via the arm muscles, rather than out of a machine.

What's your opinion on photography?
posted by telophase at 1:43 PM on August 5 [5 favorites]


Don't worry about VQGAN+CLIP doing the jobs of artists, illustrators and designers. DALL-E will be doing the jobs of artists, illustrators and designers.
posted by The Half Language Plant at 1:51 PM on August 5 [1 favorite]


You can play with a lot of DALL-E images at https://openai.com/blog/dall-e/. If you open up any of the black example boxes, you can change the text and have it generate appropriate images.

There's a lot of art-generating fun to be had at Artbreeder, also.
posted by pipeski at 2:36 PM on August 5 [2 favorites]


I've seen this blow up with artists and programmers and artist-programmers, and I think it's partially because VQGAN+CLIP really invites users to push it to its limits, to take it apart and experiment and be an active participant in creation, in a way that a closed black-box service like DALL-E doesn't.
posted by Pyry at 3:04 PM on August 5 [2 favorites]


So you can play with some of this stuff yourself, using free cloud computing. Here's a Colab notebook with easy to follow instructions. Press play to load the code. Type a phrase, wait 10 minutes for some mysterious cloud computer to do its thing, and voila, an image. You can use the notebook like a black box (as in DALL-E) or you can look at the code and understand it and start modifying it yourself.

This notebook is courtesy of Tom White who has been posting some samples. Hannah J is also doing some neat things with CLIP+VQGAN. I goofed around with the notebook yesterday and have a bunch of samples. The best one, IMHO, is Known Unknowns.

There's a lot of people experimenting with these techniques right now, I can't wait to see some really interesting art come out of it.
posted by Nelson at 3:05 PM on August 5 [13 favorites]


*takes a huge snort of cocaine*

I'm going to set up an automated production line which uses random input scraped from Wikipedia, fed into CLIP, and automatically pushed to the nearest NFT warehouse. All I need to do is sit back and let the crypto roll in, until the heat-death of the universe, which, thanks to me, should happen some time next week.
posted by JohnFromGR at 4:43 PM on August 5 [7 favorites]


I'm looking forward to feeding the text of procedurally generated games like Dwarf Fortress or Caves of Qud into one of these to get uncanny valley illustrations as a window into that particular world.
posted by Anonymous Function at 5:06 PM on August 5 [1 favorite]


This stuff is crazy, it's like someone found an alien spaceship in the desert and hooked the warp engine to its own tailpipe to answer questions about the universe. I haven't yet stumped it when asking it to ape an artist's style.

The AI Jukebox researchers also suffixed their conclusions with an existential crisis, that is, the more you generate and consume this stuff the more you find yourself asking -- to what end? I'm guessing similar questions were asked moving from religious to secular art. The art is fine, sometimes great, but the knowledge that it wasn't made by human hands overshadows the output. That's our problem -- the AI has no questions about what it's trying to do, it's trying to make a visual perceptual model sync up with a verbal one.

But you haven't seen anything yet. Wait until this is applied to video and 3D geometry.
posted by RobotVoodooPower at 5:13 PM on August 5 [3 favorites]


Oh now run it through a distortion pedal. Oh yeah.
posted by marcpski at 6:25 PM on August 5 [2 favorites]


now everyone who can type with two index fingers and can afford a program, of some sort, afford an internet connection, even not, is an artist.

You say it as if it's a bad thing. This is a new space for artistic exploration, and yes, anyone with two index fingers can dive into it.

One of my forays.
And another.
And another.
Evidence that this stuff is approaching self-awareness.
And more.
And more

And a portfolio of a pen and ink sketch I made 20 years ago passed through 250 different style transfers.
posted by ocschwar at 6:44 PM on August 5 [4 favorites]


I think there is more taste and skill involved than naysayers might think. Anyone can make a collage or put paint splatters on paper. But doing so in a pleasing way requires many choices that lead to different outcomes. Then once the piece is finished whether or not to display it, and how to, are further choices that contribute to the viewer experience.
posted by tofu_crouton at 7:55 PM on August 5 [1 favorite]


I tried the Colab notebook linked above by Nelson but kept getting runtime errors like it hadn't properly installed the software on the VM or something. I'm good for tonight but I might have another whack at it tomorrow.

If anyone wants to have a go while I'm asleep, I was trying to do "demiurge obliterates reality to begin anew in the style of max ernst"
posted by glonous keming at 8:11 PM on August 5


I've been playing with the Colab app, and have been surprised how well the "in the style of ..." feature functions beyond the obvious "famous artists" whose names I remember from Art History class. One of the linked articles mentioned using "in the style of studio ghibli", which works well. Some other styles that I have found to work impressively are "Pixar", "Disney", "Animaniacs", "Tex Avery", and my favorites so far: "Charles Addams", and "Gahan Wilson".

glonous keming -- did you click the play button next to the "Setup" section on the demo page before entering the settings for your run?

A hint about saving your generated images -- some of the menu options on the app that look like they would be good for this are actually for some other thing (which I'd probably have to know more about the app itself to understand). The simplest thing is to right-click on the image and save it from the pop-up menu.

App hint #2: Scroll down below the area where you enter your image text to see the generated images. It took me an embarrassingly long time to figure this out.

App hint #3: It will continue to generate more images from your text after the first one appears. Later images will have been generated from more iterations of the algorithm, so they tend to get more detailed (although some of my favorites have come from the first or second iteration). It takes a little time, so be patient.
posted by TwoToneRow at 9:27 PM on August 5


Interesting for sure. Weirdly despite the remarkable variety of styles I perceive the outputs as sort of same-y, not unlike colorcycling fractals or having a conversation with Eliza. Pleasant obsequious gibberish and soothing mimicry. It's weird because I studied AI in Uni and I would have absolutely loved this kind of stuff back in the day. Now it seems aimless and perhaps a bit gauche, like a lava-lamp.
posted by dmh at 3:54 AM on August 6 [3 favorites]


I got it working today. I'm peeking through the code to the best of my ability but I'm not sure where I should be looking. Does anyone know how you can make the output a larger dimension, like 1080 or even 4K? I'm sure it will take a long-ass time but it might be worth it.
posted by glonous keming at 8:48 AM on August 6




I've been playing around with this last night and today, and how "good" it turns out depends on the input. I got a lot of boring or downright creepy stuff, and adding "artstation" or "unreal engine" usually improves it in a way I like, but here's some of my favorites:

"grimdark fairy unreal engine"

"eldritch sword unreal engine"

"book cover of ninefox gambit by yoon ha lee | artstation | unreal engine" first image subsequent image

"Round the decay Of that colossal Wreck, boundless and bare The lone and level sands stretch far away. Trending on Artstation"

And swiping a comment from my husband on another image that didn't turn out quite so well:

HR Geiger drawing someone in Dragon Age video game who drinks too much lyrium | artstation | unreal engine

Caravaggio painting someone in Dragon Age video game who drinks too much lyrium | artstation | unreal engine

Those final two have similar elements because I'm using the same seed number to generate them and relying on the same default base image. Changing the Caravaggio prompt's seed number to "420" because I have the heart and soul of a 12-year-old boy gives me this result, which I like a bit better because it has a more dynamic tilt to it.
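Why a shared seed number leads to related results can be sketched in a few lines. This is only an illustration using Python's standard `random` module, not the notebook's actual code; `initial_noise` is a hypothetical helper standing in for whatever random starting state the notebook draws before it begins optimizing.

```python
import random

def initial_noise(seed, size=8):
    """Return `size` pseudo-random values from a generator seeded with `seed`."""
    rng = random.Random(seed)  # private generator, so global random state is untouched
    return [rng.random() for _ in range(size)]

same_a = initial_noise(420)
same_b = initial_noise(420)
other = initial_noise(421)

print(same_a == same_b)  # True: identical seed, identical starting values
print(same_a == other)   # False: a different seed gives a different start
```

With the starting point pinned down by the seed, two prompts begin from the same place, which is one plausible reason the two images share elements.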

Results also depend on the seed image. If you don't use one, it resorts to a default which for the Colab notebooks I was using looks like computer-generated alligator skin.

I gave it the Caravaggio prompt again, leaving the seed number at 420 because I forgot to change it, and gave it this render/artwork I did last week as the seed image to start from, which resulted in WAY more Caravaggio-like chiaroscuro than the previous results. I think the result is pulling a lot from Caravaggio's Bacchus.

So people fretting that AI-generated art means that it's taking the artist out of the equation: no. And I'm not talking about prompting it with artist names, I'm referring to it taking a lot of tries and experimentation and changing of seed numbers and images before it produces something that, in my artist's eye, I think good enough to present as art. Of all the ones I posted above, the Yoon Ha Lee prompt is the one that produced the piece I liked best (I slightly prefer the first, flatter, image, but the other one is good also). There's several that I'm itching to take and use as an underpainting of sorts, to work into a painting, but which aren't there yet, like the Ozymandias and Caravaggio 420 prompts.

It's not replacing the artist at all: it's just added another tool to the toolbox.
posted by telophase at 9:15 AM on August 6 [3 favorites]


So nice to see people sharing their images! It's a remarkable piece of software.

I agree there's artistry in using this tool. There'd be a lot more if the tool provided more knobs to turn or if you were personally involved in the design and training of the network. But what I think is most interesting is VQGAN seems to have an identifiable style of its own. What dmh calls "same-y" I call "the robot artist's style". It's changeable to some extent by prompting with "unreal engine" or whatever. But VQGAN has its own way of seeing and representing the world and it's quite visible in what it generates. That's really provocative.

Does anyone know how you can make the output a larger dimension

Not a lot of options with the Colab notebook I shared. If you set the output quality to "better" you get a slightly bigger picture and it takes twice as long. "Best" won't work on the free compute resources.

Wacky thing about the GAN approach; it's all generated on a per-pixel basis. Generating twice as many pixels is twice as much work. (Worse, twice as many pixels requires twice as much video RAM, a scarce resource.) What you and I perceive are large scale features: faces, objects, etc. But the algorithm is generating pixel by pixel. The larger features are latent in the neural network.
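The pixel arithmetic is easy to check for the sizes asked about. This is a rough sketch that takes the "work and memory grow with pixel count" rule of thumb above at face value; real memory use depends on the model and notebook, and `pixel_count` is just an illustrative helper.

```python
def pixel_count(width, height):
    """Number of pixels in a width x height image."""
    return width * height

full_hd = pixel_count(1920, 1080)   # the "1080" size
ultra_hd = pixel_count(3840, 2160)  # the "4K" size

print(full_hd)              # 2073600
print(ultra_hd)             # 8294400
print(ultra_hd // full_hd)  # 4
```

Doubling both the width and the height quadruples the pixel count, so under that rule of thumb a 4K image would need roughly four times the work and video RAM of a 1080p one.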
posted by Nelson at 9:40 AM on August 6 [3 favorites]


I tried out several of the notebooks linked here and in the original article, all with various prompts related to "a hedgehog playing a toy piano". My favorite results are here (instagram, sorry - but you don't need to log in to see them)
posted by moonmilk at 11:37 AM on August 6


Has anyone been able to get zooming, panning and rotating effects? I've been trying this notebook, but it crashes every time I run it.
posted by The Half Language Plant at 1:09 PM on August 6


There'd be a lot more if the tool provided more knobs to turn or if you were personally involved in the design and training of the network.

This depends on which notebook you use; some have been stripped of all their knobs to be as easy to use as possible, while others expose a fair number of settings.

But what makes this so inviting for hacking is that it isn't a big network trained end-to-end, rather, it's frankensteining together two pre-trained networks, and once you've seen how the stitching is done, you can try your hand at grafting on other functionality, like getting it to produce tiling backgrounds.
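To make the stitching idea concrete, here is a toy sketch of the loop, with loud caveats: every function below is a hypothetical stand-in written for illustration (no real CLIP or VQGAN code is used), and plain hill climbing replaces the gradient-based optimization the real notebooks perform. The shape is the point: a generator turns a latent into an "image", a scorer rates how well that image matches the text, and the loop keeps nudging the latent toward a better score.

```python
import math
import random

def toy_text_embedding(prompt, dims=8):
    # Stand-in for a text encoder: a fixed pseudo-random vector per prompt.
    rng = random.Random(prompt)
    return [rng.uniform(-1.0, 1.0) for _ in range(dims)]

def toy_generator(latent):
    # Stand-in for the image generator: squash latent values into (-1, 1).
    return [math.tanh(x) for x in latent]

def toy_score(image, text_vec):
    # Stand-in for image/text similarity: negative squared distance (higher is better).
    return -sum((a - b) ** 2 for a, b in zip(image, text_vec))

def steer(prompt, steps=200, seed=0, step_size=0.1):
    """Hill-climb a latent so the generated 'image' scores better against the prompt."""
    rng = random.Random(seed)
    text_vec = toy_text_embedding(prompt)
    latent = [rng.gauss(0.0, 1.0) for _ in text_vec]
    best = toy_score(toy_generator(latent), text_vec)
    history = [best]
    for _ in range(steps):
        candidate = [x + rng.gauss(0.0, step_size) for x in latent]
        score = toy_score(toy_generator(candidate), text_vec)
        if score > best:  # keep only changes that improve the match
            latent, best = candidate, score
        history.append(best)
    return latent, history

_, history = steer("a hedgehog playing a toy piano")
print(history[-1] >= history[0])  # True: the score never gets worse
```

Because the two parts only meet at the scoring step, swapping in a different generator or adding an extra term to the score is a local change, which fits the point about grafting on other functionality.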
posted by Pyry at 3:31 PM on August 6 [1 favorite]


moonmilk: instagram, sorry - but you don't need to log in to see them

Sadly that does not seem to be the case :(
posted by fader at 4:45 PM on August 6


Humor me here. I'm finding this all disturbingly enrapturing and I am curious what I need to do to run this kind of stuff on my own hardware, but I'm so far out of the loop I don't really understand what's going on or what I need. I have tons of hardware, I can spin up a Linux VM easily. Is this software I can then install and run for free? I have to subscribe to the models or something? I am not a software developer and I know basically nothing outside a lay-familiarity with ML. Where do I start?
posted by glonous keming at 7:51 PM on August 6


I should add, I'm not completely lost, I have a smidgen of knowledge about python but I'm in the early learner stages, nothing approaching an actual coder. I guess I'm just missing a lot of pieces here on how I can tell my own local hardware to generate me a 4K nightmare hellscape of anime characters having a picnic on the inside of a demon's lung in the style of a random doujinshi artist and letting it run for a week to see what happens.
posted by glonous keming at 8:04 PM on August 6


Glonous, you don't need to mess with most of that. All you need is a computer with a browser, and a Google account. You go to a web page that has a "Google Colab notebook," which is a thing that runs code.
This is a good one. There's an introduction here. Google's computer is doing all the heavy lifting.
(I realize now that this is an answer to a somewhat different question.)
posted by The Half Language Plant at 8:18 PM on August 6


I tried that just now but ran out of memory which is why I was wanting to run it locally but thank you. I'll investigate more.
posted by glonous keming at 9:14 PM on August 6


You can run all this yourself but it's fairly complicated. The Colab notebook I linked has source code, including the code to install all the packages needed to run it. In theory you should be able to do that on your own computer and it'll work more or less the same. In practice getting some of that stuff to install is tricky, particularly the CUDA part that lets you use your GPU for computing.

I was going to run this on my own machine but then I realized even my fairly powerful 1080 Ti graphics card "only" has 11GB of VRAM; I believe the cloud machine you get for free has 16GB. And that's the limiting resource when you "run out of memory" trying to run at Best. A friend of mine had more luck by paying Google $10 for compute resources (for a month? Not sure.) I still think you run into VRAM limits pretty fast.
posted by Nelson at 6:13 AM on August 7


Glonous, this page is about getting VQGAN+CLIP running locally.
posted by The Half Language Plant at 7:13 AM on August 7 [1 favorite]


oh this is a GPU+VRAM thing not a CPU+RAM thing. i guess i'll forget about this then. thank you all.
posted by glonous keming at 9:26 AM on August 7 [1 favorite]




I found a work-alike that doesn't require CUDA, but I tried running it on the phrase "ultimate duck" and it was only 2/3rds finished after an entire day. I had to abort it at that point because a driver installation meant having to reboot.
posted by JHarris at 3:03 PM on August 7


In terms of prompts, I've found an interesting technique: Many museum websites add alt tags to the images in their collection describing it for the visually impaired. The Cooper Hewitt site is one of the best at doing this (just hover over the image and wait for the text hint to appear). The basic idea is to see how the description of an existing image is converted by the AI visually.

So take the alt tag description: "Fragmented sculpture of idealized female nudes missing their heads and facing in different directions" and adding "in the style of Studio Ghibli" at the end results in this.
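The prompt-building step can be written as a tiny helper. This is a minimal sketch: `build_prompt` is a hypothetical function, and the `" | "` separator simply mirrors the prompts quoted elsewhere in this thread; whether a given notebook treats `|` specially is an assumption to check against that notebook.

```python
def build_prompt(description, style=None, modifiers=()):
    """Append an optional style phrase and optional ' | '-separated modifiers."""
    prompt = description.strip()
    if style:
        prompt = f"{prompt} in the style of {style}"
    return " | ".join([prompt, *modifiers])

alt_text = ("Fragmented sculpture of idealized female nudes missing their heads "
            "and facing in different directions")

print(build_prompt(alt_text, style="Studio Ghibli"))
print(build_prompt("grimdark fairy", modifiers=("artstation", "unreal engine")))
# grimdark fairy | artstation | unreal engine
```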
posted by jeremias at 5:36 PM on August 7 [1 favorite]


i'm in no position to buy this on a whim but fwiw apparently you can get refurb 24GB GDDR5 Tesla M40 accelerators for $300 - 400 off eBay or newegg etc. they have no output for a display, strictly compute cards. no idea how easy or hard that might be to get started with but if my finances today were sent forward in time from two years ago i might just say fuck it.
posted by glonous keming at 8:27 PM on August 7


I put a few lines of Coleridge into CLIPIT.
posted by jabah at 7:41 AM on August 8 [5 favorites]


The Illustrated VQGAN, a nice writeup of how the software works.

Meanwhile, on aesthetic reflection, I'm already bored with VQGAN. I mean it's amazing what it's doing but it's not like it's an endless well-spring of interesting and novel images. Even though it literally is that. I think I got tired of the robot artist's style.
posted by Nelson at 6:57 AM on August 11 [2 favorites]


I think I got tired of the robot artist's style.

What's strange to me is running into walls where clearly the AI has one idea about a phrase and is sticking to it. Like, I did a few phrases that included Rembrandt, and kept finding its idea was that Rembrandt always has a glowing book dead center of the picture, which makes a sort of sense but is a really limited view of what's going on in Rembrandt's lighting?
posted by mittens at 8:00 AM on August 11


as of yet i have not once got VQGAN-CLIP to work. CLIPIT I can get results from but VQGAN-CLIP always throws me an error either about missing libraries, undefined functions, or if i get past that i just am out of memory with the default settings.
posted by glonous keming at 10:12 AM on August 11


i did finally get VQGAN+CLIP working but i can't say why
posted by glonous keming at 7:16 PM on August 11


There's a new robot artist and it went to pixel art school. New notebook of a system that generates stuff that looks like pixel art. Here's a bunch of examples.
posted by Nelson at 7:15 AM on August 17 [1 favorite]


I tried that one, but after setting it up and trying to run it it died with an Out Of Memory error.
posted by JHarris at 10:10 PM on August 17


AI movie posters: Each of these images was generated by AI based on a brief text description of a movie. Can you guess the movie from the image?
posted by Nelson at 8:33 AM on August 31 [1 favorite]


It's hard to pick which is the best, but--no, I'll say it, the Wizard of Oz one was the best.
posted by mittens at 1:15 PM on August 31



