Some textual tips on text-to-image generation
September 4, 2022 12:21 PM   Subscribe

 
These are so great. I want one of these generators to let me create drop down selection windows at will, to keep track of and apply all the cool, useful descriptors.
posted by rebent at 12:47 PM on September 4, 2022


I played with MidJourney until I ran out of free renders. I'm not up to paying $12/month for that kind of service yet, but it's a fascinating thing to play with.
posted by hippybear at 12:48 PM on September 4, 2022


Thanks for this - I'm playing with a colab following the Stable Diffusion write up and seeing very clearly how we got here.

I'm working out how to use other images as a source for next generation - sure I could go look at other projects but this is forcing me to think through the mechanics of it all, which is where I get value out of it.

That and confirming that what it produces when you bypass the nsfw filter is exceptionally unsettling. But it's also quite capricious about what it considers nsfw.

This really will be remembered as the beginning of something huge, though. Even as a tool it's close to something like the first FM synth.
posted by abulafa at 12:50 PM on September 4, 2022 [5 favorites]


Even as a tool it's close to something like the first FM synth.

That is an interesting comparison. I can think about, say, the early Larry Fast/Synergy albums which were full of sounds that weren't even trying to replicate actual instruments but instead were full of foreignness and celebrating that as something to be heard and acknowledged and seen as valid. But as time progressed synths became less SF/imaginary sounds and ever more and more like real instruments.

That could be the trajectory we see these tools take. I have no idea.
posted by hippybear at 1:02 PM on September 4, 2022 [5 favorites]


I just hate having to use discord as an interface
posted by Going To Maine at 1:44 PM on September 4, 2022 [7 favorites]


MidJourney has become my tool of choice over DALL-E and Dreamstudio, largely because a monthly fee allows for unlimited image generation (at a certain point the render time is slowed down, though). "Default" MidJourney tends to be more "painterly" than DALL-E, which tries to be more photo-realistic without any style prompts. And these tools are still in their infancy-- it takes a lot of experimentation as soon as you leave the realm of simple portraiture or other image types that don't require a lot of composition.

I've been playing with these tools for about two months (self-link: Instagram / Twitter) and am trying hard not to become completely addicted.
posted by gwint at 1:47 PM on September 4, 2022 [2 favorites]


Oh, and you'll also notice, when you view a highly composed image at actual size, where the AI is kind of fudging it-- like with the first scenic image in the article. If you don't focus in on the details it looks amazing, but if you look at, say, the buildings, it is clear that the AI doesn't have the fidelity down at that level-- they almost look like over-compressed JPEGs. But this is likely a temporary situation until larger models natively produce higher-res images.
posted by gwint at 1:53 PM on September 4, 2022 [1 favorite]


Thanks for this! I was just looking for something exactly like this for DALL-E this morning and found the DALL-E 2 Prompt Book, PDF linked here, really helpful.

What I am realizing I need now is probably some sort of visual reference for art styles. I know what oil paint looks like on canvas, or ink on paper, or bronzed sculpture, but there are so many things you can do with that.

I've been burning through DALL-E credits like crazy just getting a feel for how to work with something like this. Definitely not like collaborating with a traditional artist, who can converse with you to help narrow down possibilities, whether that's composition, art style, medium or content.

DALL-E says "try abstract ideas" but it turns out, mine are too abstract. For now, it's up to me to come up with the imagery I want to represent the story I'm trying to tell. Which, I guess, is good for creators? Right now we are all just making it do interesting things. We're just barely getting started on actually telling stories with it.

I've had to disabuse myself of the notion of the uniqueness of anything I can come up with, given the sheer onslaught of what you can generate in collaboration with an AI. It's unique, whether I make it, or the AI makes it. Anything I make takes time, time to gain experience, and time to craft the work. The AI can make four unique things, basically instantly, all day long.

That's intimidating, and liberating. As someone who cannot escape having a relentless torrent of ideas—far, far more ideas than I have ever had the time, talent or resources to implement—I am having a blast.

I cannot wait to see what everyone else will do with these things, because I know I'm not the only one.
posted by bigbigdog at 2:57 PM on September 4, 2022 [2 favorites]


I've found that the most interesting MJ images I get come from lyrics or nonspecific prompts plus time: multiple generations, upscales, then generations of those, and so on.

I rarely put in anything other than these vague phrases and some aspect ratios unless I'm working on one of MJ's themed threads or have a sudden desire to generate a vintage travel poster or jewelry design. Or, you know, just need more cat pictures in my life.

Speaking of cat pictures, thanks again to oulipian for introducing me to Midjourney via your project.
posted by pernoctalian at 3:20 PM on September 4, 2022 [3 favorites]


>> Even as a tool it's close to something like the first FM synth.

Yes, in fact the term "visual image synthesis" has been bandied about by a few people as a more appropriate term.

>> I've been burning through DALL-E credits like crazy just getting a feel for how to work with something like this.

I second @gwint's statement above: MidJourney has a much more lenient policy for paid accounts and there is a freedom of exploration there that I feel is missing in Dall-e because you're always keeping an eye on the credits being burned.

Dall-e still seems to be better at coherence; it gives you a satisfying result more often than MJ, and seems better at understanding the context on the first go.

For example, I submitted the prompt "A film still of a 1970s housewife being surprised on the tv show Candid Camera"

Dall-e's result was pretty good, resulting in a wide range of styles that all nonetheless hit the mark. (Here's the version with men, in case you're curious)

MidJourney got the assignment, but a slightly lower grade in my book.

However, MidJourney has a new "photographic" style that has perhaps caught up with Dall-e in terms of pure technical marks.

>>I want one of these generators to let me create drop down selection windows at will, to keep track of and apply all the cool, useful descriptors.

This is *not* that, but it is an "Artist Influence" spreadsheet that some obsessive MidJourney user compiled. It shows you the results of applying a specific artist's style to a specific subject matter, e.g. "Cartoon character by Ashley Wood".

I have to say that many of the results are visually stunning, and that's the dilemma. While I'm generally open-minded and even optimistic about the uses of this technology, seeing this laundry list of contemporary artists whose styles can be appropriated with a few words gives me substantial pause.

It definitely puts the old phrase "Imitation is the sincerest form of flattery" to the test.
posted by jeremias at 4:03 PM on September 4, 2022 [2 favorites]


yeah, honestly as an artist I feel like these things are treading hard on "fair use", to such a degree that I think we really need to redefine what that is. I am also pretty sure the corporations building these have much larger legal teams than a bunch of artists could get together, so I doubt that's going to happen.
posted by egypturnash at 4:47 PM on September 4, 2022 [3 favorites]


An AI-Generated Artwork Won First Place at a State Fair Fine Arts Competition, and Artists Are Pissed

I wonder what's gonna happen with these kinds of tools and copyright... because the seed images can be from any image on the web, right? Including copyrighted images? And you can tell midjourney to emulate the style of artists whose works are under copyright, too. It makes sense that anything midjourney makes has a noncommercial license and can be remixed by other midjourney users, but then they also charge a subscription fee...

Really wonder what will happen when some artist sees echoes of their specific, copyrighted work in a collage created by midjourney and decides to try to raise hell about it.
posted by subdee at 4:51 PM on September 4, 2022


Well I spent most all day using up what constitutes a month's worth of basic tier usage. I now have quite a lot of interesting imagery to play with, thanks!
posted by otsebyatina at 4:52 PM on September 4, 2022




The art that is generated can be so striking and beautiful, in color choice, lighting, composition. It draws me in. It makes me want to look closer...

And then the part of my mind that wants to construct an impression of the world, take in the logic of the details, understand how it's put together, find the delightful little things I missed on broad overview gets to work and... it just slides off. Here and there I look, the details seem to make sense, but they never come together quite right; the vague ambiguities, the lost edges that should be resolved by the surrounding context, just flow continuously into new ambiguities, and slowly viewing starts to become uncomfortable. Nothing quite makes sense. It's not even the mistakes, the extra lapel, the thumb that clearly cannot be attached where a thumb should be. Okay, the silhouette that you slowly realize has three legs instead of two is a little worrying in and of itself.

The slowly dawning realization that all this is chaos and nonsense and madness starts to creep over me. That what I am looking at is something far more sinister and inhuman than it appeared at first glance. A great, unknown entity disguising itself, camouflaging itself as something delightful. And it has drawn me in...

And then I pull my eyes away and escape and go back to reading the comments.

But a feeling of unease settles over me. All this praise. This sharing. And no mention of this... feeling. Does no one else see the same? Surely taking a handful of art classes and practicing seeing things and making sense of them has not altered my perceptions so much. Do others really not notice? Am I going mad?

And the further thought occurs. Some day this thing will be good enough to fool even me. To satisfy my senses that what I am looking at is moving and delightful. But just beyond what I can perceive, what I can recognize, I know that visual gibbering madness will be lurking, waiting. But I will not recognize it anymore. Some stalking beast will have found its prey.
posted by Zalzidrax at 5:05 PM on September 4, 2022 [21 favorites]


Zalzidrax, I find the hauntingness diminishes drastically the more experience I get with the operating parameters and begin noticing the gaps. The slightly dreamlike collage qualities are frustrating more than disturbing because they represent a miss on my specific prompt or use case.

But consider this: mp3 files were able to achieve significant compression advantages through psycho-acoustic masking. That is, stripping out the sound information that, thanks to how human bodies and brains perceive sounds, could never be perceived. The latent diffusion approach used in this visual synthesis (I much prefer that term) is a flavor of "compressing" (strictly, encoding, but the difference doesn't matter here) the huge random image space into dimensions that can realistically be manipulated and tested against the specified prompt. So in a way, it's subtracting information that might be unnecessary for the perception of "that is a tree" to still succeed, and then "decoding" that representation into a more photographic, higher-resolution image... without replacing any of the missing information. So patterns like buildings or structures aren't continuous the way they would be in a real three-dimensional environment. Or there are too many eyes and mouths (visually high-frequency shapes tend to get over-sampled in training and thus over-represented in synthesis).
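A toy sketch of the "subtracting information" idea (this is deliberately not the real Stable Diffusion VAE, just an illustrative stand-in): "encode" an image by average-pooling it into a small latent, then "decode" by upsampling. The broad pattern survives, but the discarded high-frequency detail never comes back, because it was subtracted rather than hidden.

```python
import numpy as np

def encode(img, factor=8):
    # "Compress" a (h, w) image into a latent with 64x fewer numbers
    # by averaging each factor x factor block.
    h, w = img.shape
    return img.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

def decode(latent, factor=8):
    # "Decode" back to full resolution by naive upsampling; the fine
    # detail that was averaged away is gone for good.
    return np.repeat(np.repeat(latent, factor, axis=0), factor, axis=1)

rng = np.random.default_rng(1)
img = rng.random((64, 64))      # stand-in for a 64x64 grayscale image
latent = encode(img)            # shape (8, 8): the low-dimensional space
recon = decode(latent)          # shape (64, 64), but blocky and lossy

assert latent.size == img.size // 64   # the latent really is 64x smaller
assert not np.allclose(recon, img)     # the high-frequency detail is lost
```

Real latent diffusion uses a learned autoencoder instead of pooling, so the decoder hallucinates plausible detail rather than leaving blocks, which is exactly why buildings come back looking almost-but-not-quite right.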

The absolutely mind blowing part of this is your eyes and brain are doing a very similar information filtering scheme - the things you might visually perceive as continuous or solid or anchored relative to your visual senses are patchworks of the last many seconds and minutes of perceptions all stitched together in a form that feels continuous and self-consistent but isn't so much sampling like a movie projector as desperately patching together visual cues in just enough detail that we don't die.

The representation of a moment in a picture, or worse yet a photograph, is perversely wasteful from a neuroperceptual stance: it flattens all that cloud of depth and proprioception into shallow light bounces.

So take heart. It's not that visual synthesis misses detail, it's that your brain misses other detail and it probably feels nauseating for the filtered detail to not align.
posted by abulafa at 5:58 PM on September 4, 2022 [3 favorites]


I managed to get stable diffusion running on a box in the basement (with 7-year old hardware! It's slow, but it works!), and I've been enjoying playing with the weird edges of the model. For example: if you have common spelling mistakes in your prompts, you'll start getting more amateurish results. You'll also start seeing asemic signatures in the corners, because it's now calling on a lot more fan art, which almost always has a visible signature (to combat uncredited use, which feels very on the nose here). It's also neat to put in a very common, visually specific prompt and see the mutants appear. "Mona Lisa" will get you, well, the Mona Lisa, but it also lets you know how much the model knows about the Mona Lisa, and how much more or less it knows about the Mona Lisa than, say, Warhol's soup cans. Basically, you get to wander around inside a computer's brain and I am here for it. (And of course running a model locally gets you the Forbidden Prompts.)
posted by phooky at 6:33 PM on September 4, 2022 [7 favorites]


The Forbidden Prompts is now up for sockpuppet claiming.
posted by hippybear at 6:42 PM on September 4, 2022 [1 favorite]


With Stable Diffusion, you will get rickrolls frequently because of the "nsfw" filter. You can bypass this.

If, like me, you are doing this on a mobile GPU with limited memory, you can run an optimised version that is a little slower, but generates larger resolutions.
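The memory trick behind such optimised forks can be sketched in a few lines (illustrative NumPy, not the actual fork's code): instead of materialising the full attention score matrix at once, compute it a slice of query rows at a time. Peak memory drops, the result is identical, and it's a little slower because of the loop.

```python
import numpy as np

def attention(q, k, v):
    # Full attention: builds the entire (n x n) score matrix in one go.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

def sliced_attention(q, k, v, slice_size=32):
    # Same math, but only slice_size rows of the score matrix exist
    # at any moment -- lower peak memory, slightly slower.
    out = np.empty_like(q)
    for start in range(0, q.shape[0], slice_size):
        out[start:start + slice_size] = attention(q[start:start + slice_size], k, v)
    return out

rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((128, 64)) for _ in range(3))
assert np.allclose(attention(q, k, v), sliced_attention(q, k, v))
```

Since each softmax row is independent, slicing over queries changes nothing about the output, only about how much scratch memory is alive at once.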

My daughter and I spent today playing with this, and she finally understands my old stories about how long it could take to compile things in the olden times. She's using it to generate avatars for her D&D character and to prototype characters for a web comic she's been doing for a while. It's exciting. I recall doing generative music and visuals (Cthugha!) in the 1990s, and this has the same feel.
posted by meehawl at 6:51 PM on September 4, 2022 [6 favorites]


I started into Midjourney for table top RPGs and it's superb for that relaxed and low stakes high imagination application.

Disabling the nsfw filter and really digging in to how unsettlingly bad the nsfw image generation looks actually gave me some optimism. Extracting the notion of multiple human bodies interacting is exceptionally tough, and the filthy images you can produce are prone to way too many limbs, genderless hyper-genitalia, and abbreviated facial and somatic postures that feel more Aphex Twin than Hegre. Eventually it will overcome those limitations, but we aren't quite at the point where visual synthesis of lizard-brain porno stunts a few generations of humans yet.

I'm guessing we hit that around 2026.
posted by abulafa at 7:53 PM on September 4, 2022 [2 favorites]


>An AI-Generated Artwork Won First Place at a State Fair Fine Arts Competition, and Artists Are Pissed

If you're curious, the high-definition image is gorgeous; here's a direct link to the JPG. I'd easily use it as a wallpaper.

Is it copyright infringement?

U.S. Copyright Office Rules A.I. Art Can’t Be Copyrighted
posted by xdvesper at 8:31 PM on September 4, 2022


Xdvesper, as I understand it... what that means in practice is that the copyright of an AI-created work belongs to the human who "creates" the art by using a prompt, not to the AI itself or the AI's programmer.
posted by Green Winnebago at 8:50 PM on September 4, 2022


> This really will be remembered as the beginning of something huge

haha that's one way to put it yes
posted by glonous keming at 9:30 PM on September 4, 2022


Thanks, pernoctalian! I post something on autoexec.cat every day, if anyone wants to follow along.
posted by oulipian at 8:29 AM on September 5, 2022 [2 favorites]


we aren't quite at the point where visual synthesis of lizard brain porno stunts a few generations of humans yet

I think you vastly underestimate how generously humans will interpret images as pornography. Our children will grow up with kinks that are not just biologically implausible, but geometrically impossible.
posted by phooky at 8:39 AM on September 5, 2022 [4 favorites]


Really wonder what will happen when some artist sees echoes of their specific, copyrighted work in a collage created by midjourney and decides to try to raise hell about it.


There are already tons of artists who have been complaining about their art being obviously lifted without permission or pay in these AI art generators. And people becoming severely discouraged from making art because of it--not dissimilar to those who have quit due to other forms of piracy over the years (e.g. reposting on social media without credit, or putting stuff on merch without paying royalties). And the reason why so much of MJ's default art looks like video game concept art is because a large quantity of it is taken from exactly that. Unfortunately people don't really think of commission artists as actual people with livelihoods. Though ironically, if artists were to quit and/or lose their clientele, there won't be any more art for AI art generators to steal from. 🙃
posted by picklenickle at 3:11 AM on September 7, 2022


Eventually we'll have some legal requirements for identifying art produced by AI. I'm not talking about a simple disclosure such as, "This was created by an AI," though I think that will probably be part of it.

Disclosing the prompt would enable artists to determine if an illustration was deliberately designed to ape their style.

I do like the idea of a Harlan Ellison-type artist going all, "fuck you, pay me." Suing individual artists for their collaboration with an AI will be time-consuming. But I think you can make a definite argument in front of a judge that if someone prompted the AI to duplicate your style, that counts as a derivative work.

Perhaps we'll end up with a statutory license, something like ASCAP. But it'll more likely be corporations like Disney negotiating one-on-one with companies like OpenAI. It'll be a tax on creativity, extracted directly from the credits paid for image generation, the same way you get a bunch of channels you don't want with your cable bill.
posted by bigbigdog at 1:05 PM on September 7, 2022


It isn't just the "style" being duplicated, though, it's the fact that artists' actual images that they did manual labor to produce were used to feed the AI, and the AI is a paid product utilizing those original images. And this is without the original artists' consent or knowledge, much less royalties and such. Most artists I know are not concerned with targeting individual people generating images but rather at the creators of the AI software.
posted by picklenickle at 5:44 PM on September 7, 2022




This thread has been archived and is closed to new comments