Using song lyrics as AI image prompts
August 26, 2022 1:18 PM

SolarProphet is a YouTube creator who's recently been on a kick of "[song] but every lyric is an AI generated image". A fun example of human creativity with AI images. ("Ultimate Showdown of Ultimate Destiny" is possibly my favourite.) SLYTChannel.
posted by Shark Hat (25 comments total) 11 users marked this as a favorite
 
Paranoid Android please.
posted by Abehammerb Lincoln at 1:30 PM on August 26, 2022 [2 favorites]


I was really, really depressed about this AI art stuff, it seemed like it was gonna be the end of art made by human beings, but recently I did some experimenting with Midjourney and it's a lot more janky than I expected. I kept telling it to create a picture of a hand petting a cat and no matter how I phrased it the results were kind of monstrous, with cats that had fingers for legs, or hands with cat faces embedded in their palms. I got a lot of freaky spider-cats. These were decent illustrations in their own right and they could have worked as the covers of horror novels or something, but they were miles from what I was asking for. It was simultaneously astonishing what the software could do and hilarious how badly it could fuck up the simplest things.

I do think this software is really ominous and kids in art school now are probably shit out of luck. But all these jaw-dropping examples of AI-generated art we're seeing now represent the very best the software can do, and we're not seeing the millions of nightmare blob monsters being generated every day. Either that, or there's just something about a hand petting a cat that makes Midjourney go absolutely nuts.
posted by Ursula Hitler at 2:58 PM on August 26, 2022 [4 favorites]


My request is Two Ton Boa's "Comin' Up From Behind", which I think has good lyrical imagery and would fit in with the songs the channel has done so far.
posted by Catblack at 3:05 PM on August 26, 2022


I kept telling it to create a picture of a hand petting a cat and no matter how I phrased it the results were kind of monstrous

I just tried it with Dall-E and it produced four basically photorealistic results (prompt was "a hand petting a cat, closeup color photograph, shallow depth of field, 50mm").

Dall-E generally has a much better grasp of language than Midjourney and can handle combining several things or ideas. It still often messes hands up though!

I must say I also find that this whole thing, and the potential for these things, makes me somewhat queasy. In a way that, for whatever reason, porn deepfakes didn't, probably because those got thoroughly kicked off the reputable parts of the web. But this stuff is going to be big business and feels important in a way that deepfakes didn't, even if deepfakes are more in-your-face immoral.
posted by BungaDunga at 3:20 PM on August 26, 2022 [2 favorites]


Yeah, those results are much, much better than I got from Midjourney. So, I guess art made by humans is over.
posted by Ursula Hitler at 3:28 PM on August 26, 2022


I want them to stop asking the AI to smile at me, because that is going nowhere fast.
posted by GenjiandProust at 4:17 PM on August 26, 2022 [1 favorite]


Ursula Hitler: "I do think this software is really ominous and kids in art school now are probably shit out of luck. "

Art school art hasn't been about actual skill at realistic representation for a while now.
posted by signal at 5:44 PM on August 26, 2022 [3 favorites]


I must be in the First Class section of this particular handbasket to Hell, because I’m really savoring the creepy Uncanny Valley-ness of it all. Thanks for sharing this!

I kind of wish the AI were clumsier, like Dall E Mini when I ask it to draw Rudy Giuliani stuck in a ballpit, but I guess the eerie sophistication is sort of the point.
posted by armeowda at 6:04 PM on August 26, 2022 [1 favorite]


Art school art hasn't been about actual skill at realistic representation for a while now.

I was thinking of the kids who are going to school for illustration, graphic design, etc. Most of the kids going to school with hopes of becoming gallery artists have pretty much always been shit out of luck.

I see drawing tutorials on Youtube now, stuff created by talented artists, and it all just seems so sad and futile to me. There's truly no point in anybody spending years learning to draw now. Our planet is dying, but just in time our finest minds have figured out how to make human art obsolete!

Why does everything in the 21st Century have to feel like it was dreamed up by Philip K. Dick?
posted by Ursula Hitler at 9:46 PM on August 26, 2022 [4 favorites]


I see drawing tutorials on Youtube now, stuff created by talented artists, and it all just seems so sad and futile to me. There's truly no point in anybody spending years learning to draw now. Our planet is dying, but just in time our finest minds have figured out how to make human art obsolete!

I made the mistake of clicking on this post, and ended up doom-scrolling for the next hour and a half. The odd thing is that apart from some pretty big names on Twitter absolutely blasting AI art (Karla Ortiz has been very outspoken on copyright infringement issues), the illustration circles I hang out with on Discord/IRL have been pretty blasé about the advent of AI art. I don't know if it is a collective attempt at self-preservation through denial, or if they know something that I don't.

For my part I've been alternating between despair, trying to think of what skills I can pick up next to earn a living that wouldn't be automated away by the time I become proficient in them, and just chugging along trying not to think about it. At this moment I still have illustration work coming in, but it's hard not to feel hopeless.

What has been particularly grating has been how gleeful some of the techbros hawking AI art are about cutting artists out of the process of image-making - artists have already routinely been undervalued and underpaid even before AI art was conceived of, so why is this seen as a great boon? Why couldn't AI have been put to automating Wall St bankers out of the equation instead? More money will be saved in the latter case, at the very least.
posted by seapig at 3:47 AM on August 27, 2022 [1 favorite]


I was thinking of the kids who are going to school for illustration, graphic design, etc.

Maybe, though my sister-in-law is a graphic designer and has been spending the last month and a half tweaking a design for a big client that isn't quite satisfied with the way a stylized M divides their brand name (i.e. they want the stylized M, but they don't want it to break the name in half...). My guess is that AI is going to take some time before it can do what fussy clients pay the big bucks to graphic designers to accomplish.
posted by nangua at 4:21 AM on August 27, 2022 [4 favorites]


What nangua said. AI art is good (and improving) for the kind of starting-point conceptual art that takes one opinion of a description and renders it. It's a machine opinion and the human involved still needs to tweak language and parameters and weights to get something that captures a concept - but just barely.

To develop that concept into exactly what a client needs, or to imagine new representations for the concepts that don't recapitulate everyone else's understood representations, that is still very much a crucial skill.

Photoshop and image brushes and tree filters are tools where the final product is mediated and enabled by the technical tool. ML based art is another tool.

It could be seen to eat into the commodity spec conceptual space, but those were often clients who chose not to pay for expensive artists anyway.

In future, what will probably quickly become tiresome norms (the slightly amateurish oversaturated color palettes and one-size-fits-all representations of concepts like "hand" or "woman" yielded by Midjourney, for instance) will proliferate, and anyone who wants novelty or a departure from them will need a counterbalance of human creativity.

I say this as someone who works in the field. The AI will get better and better at that nuance and spark, but right now it still requires enormous human curation and many tries to get something you want. The good part is it also helps guide the human's creativity by enabling quick iterations of concepts still being developed.

Art careers are no more dead now than they were at the rise of Photoshop, because taste and aesthetic sense are unevenly distributed even if ability to render and skills to do so quickly and competently become commodities.

Don't give up, it's a tool you can use. You probably weren't going to be paid enough to tolerate the kind of client who would be satisfied with a first-pass generated concept art piece while keeping your sanity anyway. Let them quickly reach the limits of their commodity recapitulation of training data and then pay what your time is worth to get them what they actually want.

On the other hand, porn is about to get very complicated.
posted by abulafa at 6:13 AM on August 27, 2022 [3 favorites]


Yeah, someone's made an effort toward generating porn. The results are, well, unsettling (NSFW. Obviously). I think it's using Stable Diffusion but with the porn filter that Midjourney uses turned off.
posted by BungaDunga at 8:34 AM on August 27, 2022


Hippopotsunami... First they took back the Nile, then they flooded the Mediterranean and now they're being launched towards every coastal skyscraper by chance Deep Sea eruptions with a vengeance!
posted by flamewise at 1:03 PM on August 27, 2022


abulafa: "Art careers are no more dead now than they were at the rise of Photoshop, because taste and aesthetic sense are unevenly distributed "

This. It's like saying back when desktop publishing started to become a thing that designers were at risk because everybody had access to all these different fonts, and then look at your standard, non-designer made PowerPoint presentation or newsletter.
posted by signal at 3:25 PM on August 27, 2022 [3 favorites]


I was thinking of the kids who are going to school for illustration, graphic design, etc.

My brother went to Ringling School of Art and Design in the 90s and got a degree in illustration. He works at Trader Joe's, along with his wife who has a PhD in history.

Just saying.
posted by Foosnark at 3:47 PM on August 27, 2022 [1 favorite]


A couple things: AI art is already good enough to replace a lot of artists. AI art is not currently copyrightable (at least in the US), which means you need a human if you want to own the copyright or have exclusive rights to art, so that's good for humans working in the commercial art industry, and it's also good for people who want to make memes on the internet who maybe don't care about copyright so much. I think that may become a real issue really fast, though, because there will be people who want to change the current law.

For artists working in traditional media, and selling basically unique works to other humans, AI art may be depressing, but I think the model is closer to chess. Computers have been able to beat humans for years, but people still care about the best human, and they still play chess same as ever. People like art made by other people. They like the perspective of another human. I don't think AI art will ever be as valuable, at least until AIs exist that are able to have a real perspective.

If you look at everyday illustration from the days before everything was digital, you might be impressed by how good everyone was, or if you look at enough illustration, you might be impressed by how bad it could be. The rise of digital art raised the bar in a way that has the potential to be depressing for art students these days, though I doubt many of them would be depressed by that. It removed the need to wrestle with the media, wrestle with pre-digital reproduction technology (Printing today is crazy good. I don't even remember the last time I saw out-of-register color separations, and I sometimes see young people on the internet talking about halftone screens like they are buggy-whips or rotary telephones).

AI art is scary, but maybe it's not the art Armageddon. It seems to me like it's headed for replacing royalty-free art, and maybe stock art. My happy scenario is that it kills Getty Images dead and replaces it with a sort of free-for-all. My nightmare scenario is that AI art companies become Getty Images 2.0, laying waste to all creativity, and striding like monsters through the ruins of civilization.
posted by surlyben at 8:53 PM on August 27, 2022 [3 favorites]


Oh, also, to comment on the original topic, that's a good example of using AI as a tool for making art, and the kind of thing that an AI art free-for-all should lead to more of.
posted by surlyben at 8:58 PM on August 27, 2022 [2 favorites]


The examples in this post prove how inadequate it is to just throw AI-generated images onto the screen. I watched these frustrated with how much wasn't done, thinking that we'll surely see some people realizing this idea/method much more fully.

What I have in mind are things like refining prompts to get transitional images, and cinematic framing and editing techniques to make the visuals be more narrative. A filmmaker doing this work would create much more compelling music videos — the AI-generated images are only a part of the whole.

I don't doubt that we'll soon see these AIs generating images with subject and thematic continuity, as well as actual moving images.

And to the wider issue, I'm sure that we'll quickly see the natural language prompts augmented by a variety of more specific tools. Right now, as I myself know from experience, there's already a kind of creative technique just in crafting productive prompts — people are assuming that these techniques will be explored and elaborated but, really, there's nothing preventing the interface from being more specific and granular. Researchers have already figured out how to identify generated entities and re-use them in further prompts — this is outsiders reverse-engineering stuff that could be explicitly exposed by design. (I don't mean to imply that the model designers wouldn't have to explore implicit stuff themselves to make it explicit and available.)

I remain an AGI curmudgeon. That said, I am extremely impressed and excited at what these neural nets are able to do within their specific domains. It makes me keenly wish I were thirty years younger. As a layperson, I'm following the developments in this field closely.
posted by Ivan Fyodorovich at 10:40 PM on August 27, 2022


things like refining prompts to get transitional images

I’m hoping the current vogue for prompts is just an artifact of the way the training data is set up, and an attempt at accessibility, and not a semi unintentional way of disguising that, actually, we don’t have as much control of these generation models as we might like, because text seems great for broad concepts, but quickly becomes terrible for more detailed control. I hope a more conventional visual interface is possible for when you do want to give a lot more detailed directions to the AI.
posted by Jon Mitchell at 11:33 PM on August 27, 2022


Well either that or it's an artifact of it seeming sexier to try to make an AI to "replace humans" than it is to use AI to make a tool to empower humans.

Which is all kinds of messed up.
posted by Zalzidrax at 1:03 AM on August 28, 2022 [1 favorite]


The natural language prompts are (at this point) because these are multimodal, using large language models coupled to generative models trained on caption/image pairs. This isn't just an unnecessarily ambiguous but flashy interface (even though it may seem so to the user) but rather it's where a huge amount of the semantic content resides.

Put differently, the caption/image model learns to generalize about the images it's trained on, also using the captioning as hints as to what's "there" as expressed via language tokens, and then the captions become the interface with the language model. The large language model can be seen as a repository of "knowledge" about the world as it's expressed in language; that "knowledge" is leveraged via the image captions which gives access to the "knowledge" implicit in the training images (and vice-versa).

The whole is in a sense greater than the sum of the parts: there's the "knowledge" of the "real world" implicit in how language is used in the vast written text of the training data used to train the LLM, and there's the "knowledge" of the "real world" implicit in the vast artwork and photography used to train the image model, and then there's a limited amount of implicit knowledge in the image captions which acts as a narrow bridge between these two domains. Combined into a multimodal AI that synergistically uses them, you end up with generated images that both obey the implicit rules of the images it was trained with and which have a more global semantic content derived from the large language model. In short, the images make more "sense" and the user interface explicitly presents this language-centric sensibility.

A large language model is the most easily available and powerful way to make a large image model semantically "smarter". But it's not the only way; and doing so with an LLM has vices that come with its virtues. I think multimodal models are going to be the inflection point of this stage of AI development; there are other modalities that could be integrated into something like this to make it more powerful or more fit-to-purpose.

That said, natural language is most likely not the best primary interface to these tools — certainly being the only interface is restrictive and unfriendly.

The researchers developing these tools can provide other interfaces by discovering how to tokenize image properties and making such tokens directly available for manipulation from the user interface. Right now, we're using these tools in their native interface, not ours — which sounds really strange given that natural language is generally our own native interface. But it's not really the visual artist's native interface: theirs are a suite of discrete tools with well-understood functions. At present, an example of this in DALL-E 2 is editing (erasing) and in-painting. That's pretty rudimentary compared to what is possible. But these are research tools that are barely even in the alpha development stage of a possible future shipping product.
posted by Ivan Fyodorovich at 4:02 AM on August 28, 2022 [1 favorite]


AI art is not currently copyrightable (at least in the US)

OpenAI claims copyright on the images it creates, fwiw. Obviously this hasn't been tested directly in court yet.
posted by BungaDunga at 10:39 AM on August 28, 2022 [1 favorite]


It is no surprise that they would claim copyright, though that's pretty evil. I believe that the Copyright Office disagrees, based on a 2019 case. (This Smithsonian article always shows up when I Google it.)

It is also totally unclear whether moral rights apply. Do I have to give attribution? To whom? Who knows?

Certainly the company that owns an AI can bind users to the terms for using the AI, so OpenAI can stop people from using it to directly make porn or violent images or whatever, but those terms don't apply to third parties.
posted by surlyben at 12:18 PM on August 28, 2022


That case stands for the proposition that an AI system can't hold copyright. But if I design a generative art program and fiddle with the parameters until it produces something I think looks good and I print it out, that fixed image is definitely copyright me.

These networks trained on bajillions of obviously copyrighted works are a different thing, since the inputs are things that are copyright someone else. So that's a different wrinkle. But generally if I have some sort of generative art program and use it to produce a movie, or an image, or whatever, it's copyright me. The human effort in producing Dall-E pictures is in the prompt engineering and picking the variation you like the best, not that different from taking a much simpler generative algorithm and playing with the parameters until it looks good.
posted by BungaDunga at 6:07 PM on August 28, 2022 [2 favorites]



