Any sufficiently advanced technology is indistinguishable from magic
April 15, 2022 3:01 PM

Raccoon astronaut with the cosmos reflected in his helmet dreams of the stars / Jedi sloth / Lego Mona Lisa / Cute animals on rainbow grass / Bulldog in coat and hat drives an old car / Victorian rabbit reads the paper on a bench / Macro shot of a kitten in glasses / Cool panda skateboards in Santa Monica / Propaganda poster of a Napoleon cat with cheese / Plants in a lightbulb / Proud raccoons pose with their art / Studio Ghibli train stations / Kid and dog stare at the stars / Ukiyo-e teddy bears shop for groceries / Soup bowl monster knit out of wool / Astronaut on a horse, pencil sketch / American Gothic, but it's dogs with pizzas / Badass sheep in a science lab / Mona Lisa in Twin Peaks / Kitty donut shop / Colorful gamerooms, Memphis Design / HD photo of Pikachu in a cape / Wooden art deco cat / Fruit golem / Codex Seraphinianus / Voynich Manuscript / Variations on Vermeer, Klimt, Seurat, Ohara / "Good morning" Post-It on the ISS cupola / Cats in blue hats / Writer ponders her next story, oil painting / Timepieces, De Chirico style / Cheshire Cat and Tinkerbell play poker / Pieter Bruegel's Incredible Hulk / A plum and perfume served in a hat / Earth as chocolate cake / The orange cat Otto von Garfield in a Prussian Pickelhaube eats lasagna / A robot paints while playing piano, draws itself, paints itself, shows another robot its art / Meet OpenAI's DALL·E 2, the extraordinary new AI that creates anything you can imagine in a matter of seconds.

OpenAI's introductory video (2:47)

How does DALL·E 2 actually work? - 10-minute video explainer

How OpenAI's DALL-E 2 works explained at the level an average 15-year-old might understand

DALL-E 2, the future of AI research, and OpenAI’s business model

More generally: CGP Grey: How Machines Learn
A look at the actual interface: "This is frickin' incredible... the stuff of dreams"

43-minute real-time DALL·E 2 demo from Karen X. Cheng (starts after 27 seconds)

Playing with DALL·E 2 - an article with copious images

OpenAI's discussion of risk assessments, limitations, and strategies

Join the waitlist (100,000+ and growing); wider release is tentatively expected sometime this summer

Subreddits for browsing: /r/DALLE2 and /r/MediaSynthesis, or follow @OpenAIDALLE on Instagram

Don't miss Tyna Eloundou's delightful Twitter thread (unrolled)
Other projects from OpenAI's incredible portfolio:
GPT-3 generates humanlike text in response to prompts, powering book summarization, NPC dialogue, immersive games like AI Dungeon (previously), and countless other projects; perhaps best explored by Gwern Branwen's longform essay "GPT-3 Creative Fiction" (previously). The API is now publicly available.

AI Jukebox can make surreal, authentic-sounding music and vocals (7000+ samples; previously)

Codex and Copilot help anyone write working programming code from natural language descriptions, such as this space game created in under 10 minutes

From last January: DALL·E version 1 (previously on MeFi), whose output looks more like low-res Shrinky Dinks in retrospect; the related CLIP model has powered an explosion in generated art (previously)
A round-up of other AI breakthroughs:
GAN Theft Auto: Playing a Neural Network's version of GTA V

Some cutting-edge demos from NVIDIA: Create Beautiful Images From Sketches - GANCraft transfers photorealistic styles to simplified landscapes - Face Generator AI uses sliders to change facial features on the fly

An AI-powered spoken language translator

A Universal Music Translation Network: transfer the melody of any music to any other instrument or style

DeepMind develops a superhuman-level Quake 3 AI team

Facebook AI generates novel, coherent speech trained only on audiobook samples

Artistic style transfer for videos

Alias-free GANs enable ultra-smooth motion within generated imagery
...and creative fun:
"Can you name one object in this photo?"

Artbreeder lets you harness AI to iterate and cross-pollinate visual art

Exploring an AI's latent space in morphing, dreamlike videos: Album Art - Four Rooms - Cars - Architecture - Cats - Flowers - Landscapes - Anime - Futurama - 1000 other categories

Unbelievable deepfakes by Ctrl Shift Face:
The Shining starring Jim Carrey - Bruce Lee's Matrix - meanwhile, Neo Takes the Blue Pill - Terminator 2 starring Sylvester Stallone (or maybe you'd prefer Home Stallone?) - meanwhile, Arnie stars in No Country for Old Men - Willem Dafoe is Ace Ventura - Tom Cruise is an American Psycho - Tom Selleck is Indiana Jones - Better Call Trump: Money Laundering 101 - The Dark Knight's Tale - Bill Hader channels Schwarzenegger, Pacino, and Cruise
This [X] Does Not Exist:
Person - Cat - Horse - Fursona - Anime - Word - Song Lyric - MP - Baseball Player - Automobile - City - Map - Flag - Rental - Food Blog - Night Sky - Sneaker - Music Video - Beach - Job Interview - Iris and bring vocal synthesis to the masses, enabling sublime silliness like Spongebob covering "Where is the Love" (among myriad other songs)

Dreamland, a cute and somewhat unsettling example of "infinite zoom" generated images

Story2Hallucination AI visualizes Radiohead's "Glass Eyes" in the style of Monet

"Steamed Hams", but it's Trump and Obama

A city from 1800 to 2100 - VQGAN+CLIP timelapse

An exercise in guided creativity: Tour of the Sacred Library
From 2019: The Coming Age of Imaginative Machines: If you aren't following the rise of synthetic media, the 2020s will hit you like a digital blitzkrieg
posted by Rhaomi (87 comments total) 128 users marked this as a favorite
posted by BlunderingArtist at 3:07 PM on April 15, 2022 [2 favorites]

*fetches Jawa dropcloth*
posted by clavdivs at 3:13 PM on April 15, 2022

I, for one, welcome the new AI-powered weaponized memes of mass destruction.
posted by wierdo at 3:24 PM on April 15, 2022 [2 favorites]

I want to make Dall-E 2 suggestions:

Stephen Urkel Master of the Universe
Sinking of the Medusa but all the people are John Cena
Calvin and Hobbes drawn by Albrecht Dürer
The Hallucinogenic Toreador but sober
ABBA vs. the Jedi Knights
The Last Supper featuring Elvis Presley and Jimi Hendrix
posted by Abehammerb Lincoln at 3:24 PM on April 15, 2022 [11 favorites]

This is all superb from a theoretical and technological standpoint, but as someone with an illustration degree who hopes to get work at some point in the future: please, for the love of god, stop this at once.
posted by fight or flight at 3:25 PM on April 15, 2022 [32 favorites]

We are gonna kill all the commercial artists.
posted by Going To Maine at 3:31 PM on April 15, 2022 [2 favorites]

Like, getting real Butlerian Jihad vibes.
posted by Going To Maine at 3:33 PM on April 15, 2022 [19 favorites]

Also for a (much needed, imo) critical angle, I've been following Spike Trotman's tweets about DALL-E for a while. Spike is a very successful publisher of graphic novels, comics and an artist herself, so she knows her shit when it comes to the threats to independent artists (spoilers: she is not thrilled at all).

The thing that concerns me is that DALL-E is scraping the web and picking up copyrighted works or really any art it can find, but its output is not actually subject to copyright. The internet is already arguably the worst thing that's happened to a lot of independent artists (if you get successful enough, you can post something online and count on it being slapped on t-shirts by spam bots within minutes, and that's before we get into how people seem to think it's okay to regularly steal and repost art without credit). It would suck majorly to have your work stuck into this machine and churned out and used as advertising without your knowledge, robbing you of any ability to fight for control over what you've produced.
posted by fight or flight at 3:34 PM on April 15, 2022 [25 favorites]

Do you mean that its output is not eligible for copyright, or is not copyright infringement?
posted by snuffleupagus at 3:37 PM on April 15, 2022

Think I’ll quit my job and go live in the woods now.
posted by Going To Maine at 3:38 PM on April 15, 2022 [4 favorites]

I too, would love to hear what an intellectual property expert thinks the copyright status of these images will be.
posted by grokus at 3:39 PM on April 15, 2022 [1 favorite]

Do you mean that its output is not eligible for copyright, or is not copyright infringement?

Any work created by a machine is not (currently) eligible for copyright in most places. Copyright still requires a human to be involved somewhere.

Here's more about that as it pertains to the US:
In 2018, Stephen Thaler filed an application to register a copyright for a computer-generated image created autonomously by a computer algorithm, the “Creativity Machine”. (We previously wrote about Mr. Thaler’s unsuccessful attempt to obtain a patent naming an AI machine as the inventor.) The image, entitled “A Recent Entrance to Paradise,” represents a “simulated near-death experience” in which an algorithm reprocesses pictures to create hallucinatory images in creating a fictional narrative about the afterlife.

The application was refused by the U.S. Copyright Office Review Board because it lacked human authorship. After receiving a second request for reconsideration from Thaler, the Board confirmed on February 14, 2022 that human authorship is a prerequisite to copyright protection in the United States and that the image could therefore not be registered.
I believe other countries have followed suit.
posted by fight or flight at 3:40 PM on April 15, 2022

I agree.

The thing that concerns me is that DALL-E is scraping the web and picking up copyrighted works or really any art it can find, but its output is not actually subject to copyright.

That AI output is not protectable doesn't mean that using this software to generate commercial derivative works, such as advertising, can't be copyright infringement -- especially where the original art is incorporated in a recognizable way. The foggier cases would be AI trained on your art style to produce art that looks like yours.
posted by snuffleupagus at 3:44 PM on April 15, 2022 [4 favorites]

That doesn't mean that using this software to generate commercial derivative works, such as advertising, can't be copyright infringement -- especially where the original art is incorporated in a recognizable way.

Yep, though it would be easier said than done, especially if your art isn't wholly used or is just interpreted by the AI. This is definitely going to come up more and more over the next few years, so it'll be interesting (read: probably depressing) to see where the law ends up -- it's really asking the question of "is an AI able to claim authorship" and asking all sorts of questions about personhood that are.. well, going to be contentious.

Here's more about robot art and copyright law: The UK copyright implications of artificial intelligence generated art

And the results of a UK government consultation on the matter.
posted by fight or flight at 3:49 PM on April 15, 2022 [2 favorites]

Here's where it gets even more complicated, in terms of copyright/intellectual ownership: examples of an AI trained to replicate and create artworks based on the style of famous artists.

You can't copyright your style of work, but if, in the future, an AI can just create art that looks exactly like yours, quicker and cheaper than you could ever do it.. what are you supposed to do about that?
posted by fight or flight at 3:54 PM on April 15, 2022

Also check out @ai_curio_bot (WARNING: occasional body horror) that sometimes takes requests, see twitter bio. Its authors retooled it very quickly (like, a day?) to take advantage of the DALL-E diffusion method and now it understands scene composition and perspective a whole lot better.
posted by credulous at 3:58 PM on April 15, 2022 [1 favorite]

Daniel J. Gervais, "AI Derivatives: the Application to the Derivative Work Right to Literary and Artistic Productions of AI Machines," Seton Hall Law Review, Vol. 53, 2022.
This Article predicts that there will be attempts to use courts to try to broaden the derivative work right in litigation either to prevent the use of, or claim protection for, literary and artistic productions made by Artificial Intelligence (AI) machines. The Article considers the normative valence and the (significant) doctrinal pitfalls associated with such attempts. It also considers a possible legislative alternative, namely attempts to introduce a new sui generis right in AI productions. Finally, the Article explains how, whether such attempts succeed or not, the debate on rights (if any) in productions made by AI machines is distinct from the debate on text and data mining exceptions.

Shlomit Yanisky-Ravid & Luis A. Velez-Hernandez, "Copyrightability of Artworks Produced by Creative Robots and Originality: The Formality-Objective Model," 19 MINN. J.L. SCI. & TECH. 1 (2018).
In our new era of advanced technology, creative robots, driven by sophisticated artificial intelligence systems and hence, acting autonomously, are capable of producing innovative artistic and other works — ones which, had they been created by humans, would be eligible for copyright protection. This article addresses the question of copyrightability of artworks created by creative robots. We argue that creative robots as autonomous entities are capable of holding copyrights in art works they produce. However, the greatest hurdle to this notion, from the copyright law point of view, is the interpretation of the most important concept in copyright protection: the requirement of originality. Therefore, in this article we are revisiting the concept of originality. This article argues that confronting the challenges of the new 3A era (advanced, automated and autonomy), calls for a reassessment of the meaning of originality, which is undefined in the field of copyright protection both in international law, under the Berne Convention, and in domestic U.S. copyright law. The lack of a clear definition means that the existing concept of originality is inadequate for addressing new, possibly “copyrightable” works produced by creative robots. Moreover, the lack of a clear definition leads to interpretations of “originality” that are vague, immeasurable, and disharmonized, does already cause confusion in the industry as well as the public. This uncertainty surrounding an important legal concept has triggered a search for a solution that could eliminate or, at the very least, reduce future conflicts. This article suggests that a more formal, objective approach — as opposed to the existing, subjective (or mixed) approach — to the concept of originality should be adopted. The proposed objective approach might be applicable to works created by creative robots as well as artworks generated by digital tools and is further warranted by the intangible, vague nature of art. 
We suggest that a consistent legal determination of the question “what is original?” can successfully be achieved only by a formal, objective, descriptive approach.

You can't copyright your style of work, but if, in the future, an AI can just create art that looks exactly like yours, quicker and cheaper than you could ever do it.. what are you supposed to do about that?

Right. This is the hard one to answer. Especially if AI-generated works do become protectable (which arguably they are in some forms now, despite the basic doctrine -- video games with procedural generation, for instance).
posted by snuffleupagus at 4:00 PM on April 15, 2022 [2 favorites]

My immediate thought is that this is going to put stock photography and illustration sellers out of business.
posted by rivenwanderer at 4:00 PM on April 15, 2022 [1 favorite]

And we've now basically solved shitposting so there goes another occupation to the AI player pianos.
posted by credulous at 4:04 PM on April 15, 2022 [5 favorites]

If I train a model specifically on, idk, Thomas Kinkade's entire oeuvre, it doesn't really make sense that neither the model nor its outputs would be treated as derivative works.

But that doesn't seem to be how DALL-E is being treated: it's a magic copyright-stripping black box. In go copyrighted works and out come works that are arguably the copyright of whoever chose the prompt.
posted by BungaDunga at 4:06 PM on April 15, 2022 [6 favorites]

But which, in and of themselves, are not protectable until a human makes another thing out of them. So there's that thin reed. It's definitely starting to get into an outside context problem for the existing IP conceptual framework, with its basic assumptions about originality and reproducibility.

(Awesome FPP btw, thanks!)
posted by snuffleupagus at 4:09 PM on April 15, 2022 [3 favorites]

We will be in big trouble once an appreciable fraction of the internet is DALL-E and GPT generated. What will we train the next generation of models on? Eventually the entire internet will be AI generated and locked in to a permanent loop of AI outputs feeding AI inputs and the last piece of purely human content will be published in 2030. Everything else will be at least partly ghostwritten by an AI.
posted by BungaDunga at 4:10 PM on April 15, 2022 [19 favorites]

Large swaths of the art surrounding us are just going to suck for who knows how long. I'll tell my grandkids that the artwork you see on packages and websites used to all be made by human beings, and it was so much better, but AI art is so much cheaper that nobody cares. It'll be just like the way I rant to my children now about how telephone calls used to sound as clear as talking to someone sitting right next to you. But digital audio packets are so much cheaper than dedicated lines.
posted by straight at 4:12 PM on April 15, 2022 [16 favorites]

But which, in and of themselves, are not protectable until a human makes another thing out of them.

I'm not sure. I think DALL-E outputs probably sometimes have the minimum amount of creativity required for copyright, as long as a human is still picking the prompt, and rejecting results and adjusting the prompt until they get an output they like.

But I'm not a copyright lawyer so don't take this as advice!
posted by BungaDunga at 4:15 PM on April 15, 2022 [1 favorite]

I've seen the copyright argument in relation to GPT-3, and it doesn't really hold water. None of the training data is stored inside the neural network, only the (unfathomably complex) web of patterns and relationships gleaned from them, a conceptual understanding of form and function that's encoded in billions of numerical weights and vectors in a complex multidimensional space. Even if that understanding is derived from copyrighted material, it's not really that different from how the human brain learns artistic techniques or literary devices from experiencing (copyrighted) art and literature over time. I suppose you could ask whether OpenAI had permission to use these works in their training process, but they state that the training data was all legally licensed.

You can certainly use DALL-E 2 to generate copyrighted stuff -- this Pikachu, for instance (though it's honestly not all that different from fanart). Regardless, you can already commit similar infringement using Photoshop or Microsoft Word, it just takes more work. This AI is fabulously creative and powerful, but it's still a tool that requires human input, and the humans who use it are accountable to the law.
posted by Rhaomi at 4:19 PM on April 15, 2022 [7 favorites]

This AI is fabulously creative and powerful, but it's still a tool that requires human input, and the humans who use it are accountable to the law.

I mean, yes, you could certainly argue that it creates works which would count as copyright infringement, but as multiple links in this thread show, there's still the grey area of the AI (or similar AIs) being trained using already copyrighted work and creating new and unique work which cannot currently be copyrighted or (possibly) claimed as infringement, if it is different enough.

Personally, I feel like the copyright arguments are a problem that will probably be solved one way or the other in the next few years.

The bigger issue for me is the way these AI generated images contribute to the dissemination of how we view art and artistic value. It's turning someone's artistic work (yes, people who create clip art/stock images are still artists) into mere grist for the mill of capitalistic interest. That was bad enough when it meant a room full of poorly paid apprentices or interns churning out reproductions, but now it's going on without any human input at all.. it doesn't feel great, philosophically.

I mean, I can imagine all this will do, ultimately, is replace the part of the industry between "I have an idea" and "get me someone who can make this idea happen". It'll be used for concept work, with a human coming in to add polish and to make sure the brand (or whoever) can copyright the final image. But even if it's "just" that, it will be real human artists who lose out once these AI programs become commonplace. Stock photographers, commercial artists, independent artists who will have to compete with AIs as they rise through the industry. It was bound to happen with the way things are going, but I dunno, I just can't be excited for something like this.
posted by fight or flight at 4:29 PM on April 15, 2022 [2 favorites]

Is there any way to play with the original DALL-E yet? I remember reading the similar proof of concept for that.
posted by dusty potato at 4:30 PM on April 15, 2022 [1 favorite]

I've been using GPT-J (a lighter-weight descendant of GPT-3) to co-author text for some tabletop RPG handouts; it's been really great! Basically I supply a couple sentences establishing the type of writing (usually old-timey newspaper articles) and a couple key facts, and let the machine improvise. Then I amend the output, and run it again to get a few more sentences at a time. It's definitely faster+easier than writing similar artifacts on my own, and the "collaborative" aspect is really cool. I've definitely ended up keeping a decent number of actual ideas which came purely from GPT-J.

I think this is broadly indicative of how these sorts of tools will be used; human in-the-loop, guiding the process and progressively editing, and therefore winding up with a copyright-able output. But the floor for making things is drastically lowered.

It's also worth thinking about how one would use these sorts of tools to make longer works. Essays, novels, graphic novels, ad campaigns, etc. Any of these will still be a lot of work, and need a lot of human input to guide the system to a good result. It's ultimately a tool, not general-purpose AI.
posted by kaibutsu at 4:35 PM on April 15, 2022 [12 favorites]

One thing I find interesting is how AIs like this don't really have a way to "know" how much leeway there is for noise across different topical elements. For example, this cake globe. The interior cake crumb could be all kinds of creative before I would start wondering what was up, but the globe exterior-- while impressively close to hyper-realistic-- has a few tells, like the weird appendage in the Caribbean, the exaggerated sassiness of Italy's leg, and the Wish version of Greece. Obviously the AI "understands" on some level from its training that world maps have a greater degree of structural similarity than chocolate cake crumbs do, but it doesn't seem entirely able to thread the needle of interjecting stylistic noise without straying outside the extremely strict formal parameters that it would need in order to be accurate for the specific task.
posted by dusty potato at 4:44 PM on April 15, 2022 [4 favorites]

I love that the kitty donut shop has a sign that says "DOUNT."
posted by mbrubeck at 4:57 PM on April 15, 2022 [4 favorites]

Nah, this will be the end of artists getting paid anything at all for what they do, even the small percentage of them who do now (e.g. me). Quality of the finished work will never, ever beat out something that's free and instantly available. The people who pay for commercial art don't actually care what it looks like, they just need it to be barely good enough for the intended use. And the majority of people consuming that art don't care either, so to the small extent that ai-generated stuff isn't quite good enough, the preferred solution will be to lower standards as necessary, rather than pay anybody to manage the process.
posted by Sing Or Swim at 5:25 PM on April 15, 2022 [6 favorites]

I've been playing a bit with Dream by WOMBO and have made an image with it that is absolutely going to be the cover of my next album. Probably more than one, honestly.

Most of its images are very obviously AI-generated (unlike DALL-E) but some of them are weird enough that it's hard to be sure, and some of them are visually stunning regardless.

(Normally I make my own album art from copyright-free images or entirely from scratch, a process that I enjoy. Just once (out of 46) I contracted my brother to paint me something. So I'm not really putting artists out of work with this.)

To me the scary thing is, it's probably not a huge leap from DALL-E to full-blown deepfake video.
posted by Foosnark at 5:31 PM on April 15, 2022 [2 favorites]

The Codex Seraphinianus example doesn’t look like the actual illustrations at all. The text looks like garbage. I am not impressed, nor do I believe that Markov chaining a zillion pictures and then compositing them together to fit some brute force template of an image is of any value. The machine is a machine, unconscious, unaware, unknowing, and uncreative. But those here who fear this technology will usurp their livelihood have every reason to fear it, as it just becomes a cheap way to fill space, whether pictorial, musical, or textual. Cheap is the paradigm, not art.
posted by njohnson23 at 5:49 PM on April 15, 2022 [5 favorites]

My first reaction was "Thank goodness the actual Vermeer is so much better than the AI Vermeers!" It just crushes them completely in every respect. But yeah, people buying illustration probably won't care. And noticing the real Vermeer is better reminds me of the time before computers were the best at chess, and I'd read interviews with grandmasters saying computers would never beat humans. And that turned out so well.

I don't know. Illustration was mostly destroyed by photography and advances in printing technology, and editorial photography was killed by stock photography and digital photography. But there are still professional illustrators, and still professional photographers. Still professional chess players, for that matter.
posted by surlyben at 6:04 PM on April 15, 2022 [1 favorite]

If I train a model specifically on, idk, Thomas Kinkade's entire oeuvre, it doesn't really make sense that neither the model nor its outputs would be treated as derivative works.

Is a human artist imitating another human artist's style treated as a derivative work?
posted by fings at 6:15 PM on April 15, 2022 [1 favorite]

Not unless the stylistic elements are themselves in some way protectable. That's more style in the sense of 'design' than 'technique' (or 'taste') though, and probably leads to a trademark and trade dress tangent re: the degree to which an 'aesthetic' can be protected.
posted by snuffleupagus at 6:20 PM on April 15, 2022

Foosnark: "To me the scary thing is, it's probably not a huge leap from DALL-E to full-blown deepfake video."

Video Diffusion Models (published one week ago!)

Edited to add: This New AI Makes DeepFake Videos
posted by Rhaomi at 6:30 PM on April 15, 2022 [1 favorite]

i don't intend to be fatuous but let's be real: the way this AI copyright stuff will work out is however the corporation or industry group willing to put the most money into lobbying and purchasing legislature wants it to. if you want to get to the heart of the matter ask Disney or Google or Amazon what they think is best.

better yet, train a 3-way adversarial consensus-seeking AI model using their input and ask it.
posted by glonous keming at 6:41 PM on April 15, 2022

I think this is broadly indicative of how these sorts of tools will be used; human in-the-loop, guiding the process and progressively editing, and therefore winding up with a copyright-able output.

I think they’ll be used as a generative propaganda gun purpose-built to undermine the whole idea that there’s such a thing as truth and that it should be the underpinning of a democratic society.

There’s just no world out there where GPT3 is anything but a weapon.
posted by mhoye at 6:50 PM on April 15, 2022 [7 favorites]

/u/Yuli-Ban, creator of /r/MediaSynthesis and author of the final link in the post, has a new post up from a few days ago laying out predictions for the next few years. Excerpt:
we're currently seeing a handover from GANs to transformers in terms of the premier generative methodology. GANs are something of a false start for the modern era, still useful but being replaced by the far more generalized transformer architecture. Transformers can do everything GANs can do, and more. In fact, multimodality is the new hotness in the field.

All of this is leading up to a state where machines are now beginning to show signs of imagination.

The most recent breakthrough in this field is undoubtedly DALL-E 2.
But it's far from alone. There's so much being done that I don't even know where to begin.

Perhaps Pathways is a good starting point. What can PaLM do? A better question is what can't it do. It's almost like GPT-3.5 in that it can synthesize text, answer questions, translate across languages, tell jokes, and more. And this despite being unimodal. GPT-2 was unimodal as well, and it could accomplish tasks like creating rudimentary images and MIDI music.

Imagine a variant of GPT that was trained in pure multimodality— text, image, video, audio, the works. The first iteration doesn't have to be terribly large like GPT-3. It just needs to be a proof of concept of what I like to call a "magic media machine."

I can 100% see this arising within the year. There's little reason why it shouldn't be possible in 2022 or 2023. Heck, I was sure it'd happen in 2020 and was surprised when it didn't.
  • Full-fledged HD video synthesis
  • AI-generated music will be earning creators thousands, perhaps even millions of dollars
  • AI-created video games will also become a bigger thing, especially in the indie market
  • Glimmers of full-generality. "This might be the most speculative statement yet, but I say that the path towards proto-AGI lies in multimodal imaginative systems."
Final takeaway: "advanced synthetic media is the digital version of molecular assemblers. Whatever can be represented in pixels or samples can be synthesized by AI, no matter what it is."

For reference, their last prediction post from 2017 foresaw "researchers working on/refining algorithms that can generate media, including visual art, music, speech, and written content", years before GPT, Jukebox, or DALL-E were public knowledge.
posted by Rhaomi at 6:50 PM on April 15, 2022 [4 favorites]

FYI I found an answer to my own question upthread:
posted by dusty potato at 6:53 PM on April 15, 2022 [4 favorites]

I am hopeful that this stuff will do about as well at replacing actual artists as speech-to-text has done with replacing manual transcription - somewhat usable for rough work, but still a lot of room to pay an actual human to do it. Like, I recently did a commission that did a good job pleasing the client with how well I was able to visualize their hazy ideas, in part because they spent some time refining input strings to Hypnogram to create some imagery they liked of some fairly abstract concepts. But my input and skill were still very useful in taking those bits and putting them together with a few other ideas into a coherent, complex whole.

Either that or we had better get some kind of basic income happening, because I sure as fuck do not want to have to try and find a new career at 50+ and hope that way of earning a living doesn't get eaten by some asshole trying to disrupt everything with software, too.
posted by egypturnash at 7:47 PM on April 15, 2022 [4 favorites]

> I am hopeful that this stuff will do about as well at replacing actual artists as speech-to-text has done with replacing manual transcription

Hasanabi reacts: how to clone your streamer
posted by snuffleupagus at 7:52 PM on April 15, 2022 [1 favorite]

I have to commend the title "GAN Theft Auto".
posted by doctornemo at 8:23 PM on April 15, 2022 [2 favorites]

I'm stunned. I think I'm going to always remember where I was and what I was doing when this was posted.
posted by brachiopod at 8:42 PM on April 15, 2022 [3 favorites]

> I'm stunned. I think I'm going to always remember where I was and what I was doing when this was posted.

I put it up there with a few very rare moments -- discovering the scope of Wikipedia, running Google Earth on a desktop for the first time (and a dozen years later in VR), the original iPhone keynote, and having a conversation with GPT-3 -- where it really felt like touching the future.
posted by Rhaomi at 9:12 PM on April 15, 2022 [1 favorite]

Gonna have to go be a farmer, learn a trade, something physical I can do with my hands.
posted by Going To Maine at 9:38 PM on April 15, 2022

Computers are bad now
posted by Going To Maine at 9:44 PM on April 15, 2022 [4 favorites]

I will confess that I’m curious to see what it comes up with for “Mr. Darcy angrily fists a robot.”
posted by Going To Maine at 9:55 PM on April 15, 2022 [1 favorite]

> I will confess that I’m curious to see what it comes up with for “Mr. Darcy angrily fists a robot.”

QAnon Mk.II.

No, really.
posted by aramaic at 10:19 PM on April 15, 2022

> Gonna have to go be a farmer, learn a trade, something physical I can do with my hands.

Definitely. First you'll need to start a subscription plan with John Deere for the software for your farm equipment, then it's time to look into licensing some seeds from Monsanto.
posted by whir at 10:46 PM on April 15, 2022 [19 favorites]

It's times like this that I'm glad my own creative endeavours over the last few years have mainly involved lazily entrenching an idiosyncratic drum kit style that's so horribly technically impoverished and such a disappointing and unsatisfactory experience for everybody but the player that if a machine learner ever does manage to capture it, all it will achieve is damage to its own reputation.
posted by flabdablet at 12:00 AM on April 16, 2022 [3 favorites]

> it really felt like touching the future.

DALL-E, the Metaverse, and Zero Marginal Content - "Machine-learning generated content has major implications on the Metaverse, because it brings the marginal cost of production to zero."
What is fascinating about DALL-E is that it points to a future where these three trends can be combined. DALL-E, at the end of the day, is ultimately a product of human-generated content, just like its GPT-3 cousin. The latter, of course, is about text, while DALL-E is about images. Notice, though, that progression from text to images; it follows that machine learning-generated video is next. This will likely take several years, of course; video is a much more difficult problem, and responsive 3D environments more difficult yet, but this is a path the industry has trod before:
  • Game developers pushed the limits on text, then images, then video, then 3D
  • Social media drives content creation costs to zero first on text, then images, then video
  • Machine learning models can now create text and images for zero marginal cost
In the very long run this points to a metaverse vision that is much less deterministic than your typical video game, yet much richer than what is generated on social media. Imagine environments that are not drawn by artists but rather created by AI: this not only increases the possibilities, but crucially, decreases the costs.

Zero Marginal Content

There is another way to think about DALL-E and GPT and similar machine learning models, and it goes back to my longstanding contention that the Internet is a transformational technology matched only by the printing press. What made the latter revolutionary was that it drastically reduced the marginal cost of consumption...

Machine learning generated content is just the next step beyond TikTok: instead of pulling content from anywhere on the network, GPT and DALL-E and other similar models generate new content from content, at zero marginal cost. This is how the economics of the metaverse will ultimately make sense: virtual worlds needs virtual content created at virtually zero cost, fully customizable to the individual.[1,2,3,4,5,6]
also btw...
Researchers Gain New Understanding From Simple AI - "Language processing programs are notoriously hard to interpret, but smaller versions can provide important insights into how they work."[7]
posted by kliuless at 1:01 AM on April 16, 2022 [4 favorites]

Edited to add: This New AI Makes DeepFake Videos

When they added teeth to Obama making a speech to make him look happier, it was so creepy…
posted by BlunderingArtist at 3:04 AM on April 16, 2022

I dunno, I guess I may be the contrarian here. I have faith that current and future artists will find ways, as they always have, to adapt and use new technology in ways that we can't imagine currently.

As someone who has taken up watercolor painting in recent years, I've already seen how AI could be useful as a way to shake up my ideas of composition, color choices, and subject matter.

I suspect AI will become a tool for a small subset of artists and utterly ignored by others. When photography and cameras became widely accessible, there were fears that painting would become obsolete, but for many artists today, digital photography is a useful tool.

As far as commercial artists are concerned, I'm less sure about their fates, but it does make me think of what happened to many web designers/developers 10 years ago whose bread and butter was small business websites. As templates became more sophisticated and "no-code" systems like Squarespace, Wix, etc. began to emerge, some designer/devs did move on to other disciplines, but most adapted and thrived, getting deeper into custom code, or evolving into UX/Content Strategists.

It's not a perfect analogy as websites need maintaining and updating whereas once commercial artwork is done, it's done. But I suspect that the businesses that currently pay well, will continue to do so for humans who can respond to feedback and create custom artwork.

Graphic design, the field I am most familiar with, has grappled with this for years. The cheapest logo design these days is essentially a form of AI (although the results are still made by humans, kinda). Fiverr has its own "Logo Maker" that requires nothing more than typing in a few words and dragging a few sliders to return dozens of results for your new redesign: for example, your new community weblog.
posted by jeremias at 4:11 AM on April 16, 2022 [5 favorites]

>The bigger issue for me is the way these AI generated images contribute to the dissemination of how we view art and artistic value. It's turning someone's artistic work (yes, people who create clip art/stock images are still artists) into mere grist for the mill of capitalistic interest.

Patronage and who writes the history books hasn't changed with this endeavour. There's so much stuff made throwaway, and so many people "can't not do" creative acts that maybe the volumes of actionable art -- stuff that meets a client brief -- made by trained machines will drown out human-made stuff for commissions. This, though, is an age-old problem of what to promote and which deals to broker, so I don't agree it's anything more than another source of art and artistic value. People have given away memes and let loose dissemination of work so that it can become part of the cultural record and zeitgeist. If this output from DALL-E was qualitatively different we would dismiss its utility until it found a niche (and price point) of its own.
posted by k3ninho at 6:38 AM on April 16, 2022

This means that now even the most bizarre thoughts coming from the fever dream that is Tumblr can be visualized in an instant. Shitposting at the speed of thought. Your scientists were too occupied with whether or not they could, etc.
posted by Hardcore Poser at 11:24 AM on April 16, 2022 [1 favorite]

I've been experimenting with writing using Sudowrite, and while I find it fascinating, the people I've shown it to seem to fall into either "Wow! That's cool! I wanna try it!" or "Huh. I don't like it/can't use it/don't get it." A LOT of them assume it's fake, or a scam, or a Mechanical Turk somehow; others will point out "yeah but you can still tell it's fake because--"

I find it absolutely awe-some, in the original sense, that we're finding ways to use math to tap into our sort of collective human experience. It's a very 'down to the very bones of what we think is real' kind of exploration of probability and chaos. I've paged through tons of AI art that is "hmm" or "ooh" but every so often I'll find one that hits ME, personally, right in the feels for some reason, just the way art should. I've thrown out many lines of text that are fine but not what I want, but sometimes I'll generate a line that is better than I'd have come up with, but still sounds like what I would have said, in a better universe. That's an amazing feeling.

It's a little worrisome mainly because the source of all the information is still humanity; Sudowrite, for example, can generate tons of erotica and porn and does it very well (keeping track of who's doing what to whom and where, even) because humans love writing about sex, and there's a lot of things humans enjoy which are problematical.

I admit that as a frustrated artist, I've loved being able to use the art programs to finally generate pictures of characters or landscapes, getting closer to what I want by simple repetition ad infinitum than I could by working with an artist who, honestly, has a life and is only willing to do so many revisions for my commission fee. But if I can get closer to what I want by spending hours and hours on AI art, then I can take the result to a real artist and go "Basically like this, but can you make his smile more sincere?" I still prefer to hire real artists eventually, but if I can rough out the idea beforehand, for free, that helps us both.
posted by The otter lady at 11:26 AM on April 16, 2022 [4 favorites]

In the post, I had to shorten a lot of the captions to squeeze as many in as I could; I've also since realized that a few of them link to smaller copies instead of the original 1024x1024s. So for completeness, here are all the text prompts (with sources) for the images linked above the fold, with larger versions underlined, all rehosted on Imgur in case any of them are ever deleted:
"A raccoon astronaut with the cosmos reflecting on the glass of his helmet dreaming of the stars" [source]
"A photo of a sloth dressed as a Jedi. The sloth is wearing a brown cloak and a hoodie. The sloth is holding a green lightsaber. The sloth is inside a forest" [source]
"Mona Lisa in a Lego landscape" [source]
"Cute woodland animals on rainbow grass, digital art" [source]
"vintage photo of a fat bulldog dressed in overcoat and hat, driving an old car" [source]
"A rabbit detective sitting on a park bench and reading a newspaper in a victorian setting" [source]
"A 35mm macro shot of a kitten wearing glasses, extremely detailed" [source]
"cool panda riding a skateboard in Santa Monica" [source]
"a propaganda poster depicting a cat dressed as french emperor napoleon holding a piece of cheese" [source, originally from page 2 of the DALL-E 2 paper]
"plants inside a lightbulb" [source]
"A photo of proud raccoon artists posing next to their paintings" [source]
"Lost train station in the Anime style by Ghibli Studio." [source]
"A kid and a dog staring at the stars" [source]
"Teddy bears shopping for groceries in the style of ukiyo-e" [source]
"A bowl of soup that looks like a monster knitted out of wool" [source, originally from the DALL-E 2 demo]
"An astronaut riding a horse as a pencil drawing" [source is the DALL-E 2 demo]
"the painting American Gothic, with two dogs holding pepperoni pizza instead of the farmers holding a pitchfork" [source]
"A badass sheep wearing a lab coat in a science lab, 1980s Miami vibe, digital art" [source]
"Mona Lisa in the style of Twin Peaks." [source]
"A donut shop run by a kitty" [source]
"A photo of a colorful games room in memphis design, artstation" [source]
"A stunning photograph of a Pikachu wearing a cape, 8K HD, incredibly detailed" [source]
"An Art Deco cat whittled out of wood, looking at a red laser." [source]
"fruit golem" [source]
"Codex Seraphinianus" [source]
"Voynich Manuscript" [source]
Variation on Girl With a Pearl Earring by Johannes Vermeer [source is the DALL-E 2 demo]
Variation on The Kiss by Gustav Klimt [source is the DALL-E 2 demo]
Variation on A Sunday Afternoon on the Island of La Grande Jatte by Georges Seurat [source is the DALL-E 2 demo]
Variation on prints by Ohara Koson [source is the DALL-E 2 demo]
"Good morning, written at a post-it in Cupola of ISS with a big view on Earth." [source]
"A cat in a blue hat" [source]
"Writer thinks out the main plot of her book, oil painting, in style of Spitzweg." [source]
"The thing which misses the time, oil painting, in the style of De Chirico." [source]
"Cheshire Cat playing poker with Tinkerbell, digital art" [source]
"the incredible umber hulk painted by pieter bruegel" [source]
"a single plum floating in perfume served in a man's hat" [source]
"Planet earth cut in half. The inside of the planet looks like a chocolate cake" [source]
"Painting of the orange cat Otto von Garfield, Count of Bismarck-Schönhausen, Duke of Lauenburg, Minister-President of Prussia. Depicted wearing a Prussian Pickelhaube and eating his favorite meal - lasagna." [source]
"A robot painting on a canvas while playing the piano" [source]
"A photo of a robot hand drawing, digital art" [source]
"a robot hand painting a self portrait on a canvas" [source]
"A robot showing another robot its painting" [source]
posted by Rhaomi at 11:39 AM on April 16, 2022 [9 favorites]

The Writing On The Wall - "Are Large Language Models like GPT-3 showing signs of emergent intelligence? And can we train them to become good citizens?"

A.I. Is Mastering Language. Should We Trust What It Says? - "OpenAI’s GPT-3 and other neural nets can now write original prose with mind-boggling fluency — a development that could have profound implications for the future."
posted by kliuless at 1:12 PM on April 16, 2022 [1 favorite]

> None of the training data is stored inside the neural network, only the [...]

Just because it's not recorded in a form you recognize, doesn't mean it's not there in a real sense. You wouldn't see the image in a jpeg either, if you didn't know how to decode it. If a magician pulls a rabbit out of a hat, the rabbit was in the hat, and at some point the magician put it in there.
posted by Horselover Fat at 3:49 PM on April 16, 2022 [5 favorites]

Well in GPT-3's case at least, the model has 175 billion parameters and requires about 350 GB of memory, but was trained on a 45 terabyte text corpus. I don't know the stats for DALL-E 2 offhand (or whether they've even been released), but I seriously doubt it's large enough to contain the billions of images it was trained on. Even with great compression there's simply no way to store all that training data within it, only the overarching patterns. It's also telling that it's nigh-impossible to get it to reproduce existing works, even with leading prompts -- with GPT-3, the best you can get are brief snatches of names or phrases along with stylistically similar but unique content.
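For a rough sense of scale, the figures above can be checked with a few lines of arithmetic. This is only a back-of-the-envelope sketch: the 2 bytes per parameter (16-bit floats) is an assumption that happens to match the ~350 GB figure, and decimal units (1 TB = 10^12 bytes) are assumed throughout.

```python
# Figures quoted in the comment above.
PARAMETERS = 175_000_000_000   # GPT-3 parameter count
CORPUS_BYTES = 45 * 10**12     # 45 TB text corpus (decimal terabytes assumed)

# Assumption: each parameter is stored as a 16-bit float (2 bytes).
BYTES_PER_PARAMETER = 2

model_bytes = PARAMETERS * BYTES_PER_PARAMETER
model_gb = model_bytes / 10**9
corpus_to_model_ratio = CORPUS_BYTES / model_bytes

print(f"Model size: {model_gb:.0f} GB")
print(f"Corpus is roughly {corpus_to_model_ratio:.0f}x larger than the model")
```

Under those assumptions the corpus comes out more than a hundred times larger than the model, which is the gap the comment is pointing at.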
posted by Rhaomi at 4:09 PM on April 16, 2022 [2 favorites]

I'll pipe in with...

A Short History of Neural Networks Memorizing Things.

Once upon a time there was a Really Interesting Result: If you randomly assign labels to ImageNet images, a neural network can learn the full set of training labels with perfect accuracy. (obvs, the trained model doesn't generalize at all and completely fails on the 'test' set.) This was widely reported as 'memorizing the dataset.'

It turns out that this kind of memorization for classification is MUCH easier than actually memorizing all of the pixels. Consider this solution: Take a hash of each photo (a way to turn each unique image into a number) and store the label that goes with it. This is a look-up table for the answers. It solves the problem, but doesn't actually require storing all of the pixels. A neural network can do something similar using geometry instead of hashing, but the principle is the same. Using this kind of approach, memorization can be extremely efficient... For classification!
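The hash-and-look-up idea can be sketched in a few lines. This is a toy illustration only, not how a neural network stores anything: the function names and the tiny byte strings standing in for images are made up for the example, and SHA-256 is just one convenient choice of hash.

```python
import hashlib


def fingerprint(image_bytes: bytes) -> str:
    """Turn an image's raw bytes into a fixed-length identifier."""
    return hashlib.sha256(image_bytes).hexdigest()


def memorize(examples):
    """Build a lookup table from image fingerprint to its (possibly random) label."""
    return {fingerprint(image): label for image, label in examples}


def predict(table, image_bytes, default="unknown"):
    """Look up the stored label; anything not seen in training falls through."""
    return table.get(fingerprint(image_bytes), default)


# Hypothetical stand-ins for training images with arbitrary labels.
training_set = [
    (b"\x00\x01\x02\x03", "teapot"),
    (b"\x10\x11\x12\x13", "otter"),
    (b"\x20\x21\x22\x23", "zeppelin"),
]

table = memorize(training_set)

# Every training example is recalled exactly...
train_accuracy = sum(
    predict(table, image) == label for image, label in training_set
) / len(training_set)

# ...but a new image gets no useful answer at all.
unseen_prediction = predict(table, b"\x30\x31\x32\x33")

print(train_accuracy, unseen_prediction)
```

Each table entry costs one fixed-size hash plus a label no matter how large the image is, so perfect training accuracy comes without storing a single pixel, and with no ability to generalize.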

For language models, we've also seen evidence of memorization, though it's less clear to me what the exact mechanism is. The principle is the same, though: For rare events, minimizing the loss requires memorization instead of learning generalizable features.

Recently there's really cool (imo) work augmenting language models with search, as in DeepMind's RETRO. The game here is to separate the memorization functions into a real database, to allow the learned network to 'focus' on learning more generalizable features.
posted by kaibutsu at 4:53 PM on April 16, 2022 [1 favorite]

Do we know for sure that this is real? Seriously, some of these look so much like they were painted by human beings, I have to wonder if this will turn out to be a prank-ish art project masquerading as a tech demo. Or a scam. Assuming this is legit, I think it's amazing and horrifying.

Artists were forced to adapt when photography was invented, since there was really no point in striving to achieve perfect realism when a photo could do it better and faster. This new tech will require artists to adapt in even more substantial ways. Illustration (children's books, gag cartoons, mascots on cereal boxes, etc.) is probably going away, at least as a way for anybody to earn a living. I hate that this is so, but I don't think there's any denying it. When you can just have a computer poot out a professional-quality illustration or passable stock photo in seconds, there's just no need to employ a human for the job. I think comics are safe, for a while anyhow, because we're not at a point where you can feed a computer the script for Watchmen and it will be able to create an entire, coherent work with compelling visual storytelling. But 6-panel webcomics would probably be a cinch, and longer stuff may be only a matter of time.

A lot of revolutionary art was more about the idea than the execution. Duchamp's Fountain was just a freaking urinal with a signature scrawled on it, but the crucial thing was that Duchamp was declaring that it was art at all. Warhol's soup cans aren't really that thrilling to look at, a lot of his art wasn't, but he's still one of my favorite artists because his ideas were so fun. Machines can come up with wacky, random ideas but I don't know if they'll ever be able to innovate the way that human artists can. So I think artists will have to re-focus, doing stuff that machines aren't good at, or aren't good at yet.

We'll probably see more conceptual stuff, more environments and sculpture. Stuff that physically exists in the world, where the artist's fingerprints are part of the appeal. Artists will also incorporate this tech into their work and there may be some fascinating results. An artist could generate hundreds of images in different styles, stuff they could never draw themselves, and use those images to tell their own story. Or they could do meta stuff pushing this software to the breaking point, generating images based on prompts that confuse the computer in fun ways.

Crappy local advertising will probably look a lot more professional as nephew art is replaced by robot art. A lot of ugly stuff will go away but on balance I don't think this will be good for art. Machines should be a tool for art, not a replacement for artists. The androids weren't supposed to dream of the electric sheep for us!
posted by Ursula Hitler at 4:57 PM on April 16, 2022 [5 favorites]

I thought about this from a writer's perspective; if a computer can generate readable stories, is there any hope for writers? And I think there truly is; yeah, if I just want to read a quick sex scene to get off, the AI can do that and that's fine, but if I really want to read a good -story-, I'm going to one of my favorite authors.

I don't think the AI will be able to compose real art.... Until it becomes self-aware.

Which it will.

And I for one welcome...
posted by The otter lady at 7:36 PM on April 16, 2022 [4 favorites]

I did not expect that our AI overlords would arrive to design my AI underpants
posted by snuffleupagus at 8:25 PM on April 16, 2022 [2 favorites]

> It'll be just like the way I rant to my children now about how telephone calls used to sound as clear as talking to someone sitting right next to you.

You might want to get your hearing checked below 300 Hz and above 4 kHz.
posted by atoxyl at 2:32 AM on April 17, 2022 [4 favorites]

If you aren't taking calls in FLAC or on the turntable, why bother conversing at all?
posted by snuffleupagus at 5:40 AM on April 17, 2022 [4 favorites]

A subreddit with collected examples of DALL-E 2 art. The images are all amazing. AI art has always had trouble with simple things like rendering spheres, other geometric shapes, and humans with the correct number of limbs, and DALL-E 2 seems to have solved that? I would really like to see some DALL-E 2 fails to know what its limits are.
posted by jabah at 10:41 AM on April 18, 2022

New techniques:
Using DALL-E's interpolation feature to finally show how to draw the rest of the fucking owl.

You can stitch together multiple overlapping frames to create continuous panoramas much larger than the default 1024x1024, for instance: Salvador Dali, flowers, Hieronymus Bosch.

You can also shrink an existing image, and then use the "inpainting" feature to fill in the rest to make an "uncropped" version: Mona Lisa, The Scream
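The "uncropping" trick comes down to a bit of geometry: put the shrunken original in the middle of a full-size canvas and mark everything around it as the region to fill. A minimal sketch of that bookkeeping follows; the function names, the 50% scale, and the convention that `True` means "generate here" are all assumptions for illustration, and nothing here calls DALL-E itself.

```python
def centered_box(canvas_size, scale):
    """Return (left, top, right, bottom) for a shrunken square image centered on the canvas."""
    inner = round(canvas_size * scale)
    offset = (canvas_size - inner) // 2
    return (offset, offset, offset + inner, offset + inner)


def fill_mask(canvas_size, box):
    """Build a 2D mask: True where new content is wanted, False over the original image."""
    left, top, right, bottom = box
    return [
        [not (left <= x < right and top <= y < bottom) for x in range(canvas_size)]
        for y in range(canvas_size)
    ]


# Shrink the original to half size on the default 1024x1024 canvas.
box = centered_box(1024, 0.5)
mask = fill_mask(1024, box)

pixels_to_generate = sum(cell for row in mask for cell in row)
share_to_generate = pixels_to_generate / 1024**2

print(box)
print(f"{share_to_generate:.0%} of the canvas is left for inpainting")
```

At half scale, three quarters of the final picture is new material, which is why the results read as a much wider shot of the original.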
An interesting reaction thread from artist Danielle Baskin [unrolled]:
The @OpenAI team to me: "We'd like to share our new visual media tool with a few artists to gather feedback"

They should have said: "We'd like to bestow upon you the gift of glimpsing into infinite human consciousness to invoke parallel worlds. Ready to wield this power?"
Some waitlist news from @SamAltman: "we are adding several hundred users per week now, and will ramp in up significantly in a few weeks."

In the meantime: moar images!

"digital art of a tyrannosaurus rex flying a fighter jet, wearing aviator goggles, blue sky, sun at zenith, contrails" [source]
"house with the design of a strawberry, realistic 4k art" [source]
"A fluffy baby sloth with a knitted hat trying to figure out a laptop, close up, highly detailed, studio lighting, screen reflecting in its eyes" [source]
"A view of the gateway to the Great Temple at Baalbec, by Paolo Veronese" [source]
"3D render of a pink balloon dog in a violet room" [source]
"Steampunk downtown skyline with flying airships next to a 300 meter waterfall, digital art." [source]
"group of chipmunks singing karaoke under the disco ball" [source]
"Intense jaguar in a dark misty jungle, irregular and painterly rectangular regions of color, surrealism and expressionism, painted by Mark Rothko" [source]
"This is how I feel about Mondays, digital art" [source]
"A family crest for the Mather family, which has doctors, nurses, and gardening enthusiasts" [source]
"A dollhouse replica of a building being reclaimed by nature, left inside that same building" [source]
"A 1960s yearbook photo with animals dressed as humans" [source]
"A detailed sculpture made from butter of Marie Curie isolating radioactive radium salts" [source]
"a baby fennec sneezing onto a strawberry, detailed, macro, studio light, droplets, backlit ears" [source]
posted by Rhaomi at 6:37 PM on April 21, 2022 [6 favorites]

So many things about this are blowing my mind but there's something about the composition/framing of things that feels very mature and well matched to the desired narrative of the pictures. It has such a great sense of how much of an object or character or location needs to be visible for it to read well. The ‘appropriateness’ of its choices is spooky.
posted by brachiopod at 7:04 PM on April 21, 2022 [2 favorites]

> "This is how I feel about Mondays, digital art"

Now tell me why you don't like Mondays...

"The silicon chip inside her head
Gets switched to overload..."
posted by kaibutsu at 7:11 PM on April 21, 2022 [1 favorite]

Another round-up!

"Oil painting of a sad girl looking out the window of a school bus" [source]
"avocados dancing, drinking, singing and partying at a Hawaiian luau" [source]
"Photo of an athlete cat explaining it's latest scandal at a press conference to journalists" [source]
"men's fashion pinterest board" [source]
"brutalist dining room furniture made of concrete" [source]
"A cat with angel wings." [source]
"cyberpunk old west town with two cyborgs facing each other in main dirt street, digital art" [source]
"A photograph of a pothole with a frozen puddle in a city street after a winter storm, reflecting the streetlights at dusk, with a single leaf floating on top of the frosty puddle." [source]
"x-ray scan of a rock singer" [source]
"Triangle with four corners" [source]
"Thousands of pastel colored paper planes washed up on the shore of a beach at sunrise" [source]
"Pulp Fiction played by the Muppets" [source]
"courtroom sketch of a cat looking nervous in the style of Walt Stewart" [source]
"A photograph of a bad taxidermy squirrel with piranha teeth, in poor condition sitting next to old books on a dusty shelf, macro lens, detailed" [source]
"Tile art, tessellation of Pikachu" [source]
From the same source, an actual tessellation!: "Tessellation of Hello Kitty" [source]
"A photograph of a highlander cow in a snowy field, ice crystals in fur" [source]
"photo of my cat getting frustrated trying to explain vc financing to me in our dining room with a whiteboard but I just don't get it" [source]
"The Scream by Munch as children's coloring book" [source]
"A detailed photograph of a piglet wearing a small fedora, studio lighting" [source]
"A photograph of a side profile of a human eye showing the reflection of a window, macro lens" [source]

Results for /r/DALLE2's inaugural "Imitation Game", where users with access try to replicate a real photo.

From Tumblr, some informed speculation about the inner workings of the model and OpenAI's development process.
posted by Rhaomi at 9:24 PM on April 26, 2022 [2 favorites]

I've been following the subreddit on this and it continues to blow my mind. There's no further need for humans to have any involvement with the creation of album cover art.

Olive oil and vinegar drizzled on a plate in the shape of the solar system.
posted by brachiopod at 8:21 AM on April 27, 2022 [1 favorite]

brachiopod, I think you meant to link here?

Also, I'll try to stick to updating here once every few days till the thread closes, but I couldn't wait to post this one because the prompt is just brilliant:

"An IT-guy trying to fix hardware of a PC tower is being tangled by the PC cables like Laokoon. Marble, copy after Hellenistic original from ca. 200 BC. Found in the Baths of Trajan, 1506." [source]

If you're interested in DALL-E (and not swearing off Twitter), give Merzmensch a follow because his prompts are consistently high-caliber and interesting.
posted by Rhaomi at 12:48 PM on April 27, 2022 [4 favorites]

Missed the edit window, but this Towards Data Science article by Merzmensch on his experience with DALL-E versions 1 and 2 is illuminating: DALL·E: an AI Treasure Chest in Action

Among other things, he notes that the terms explicitly ban NFTs and give the copyright on all images to OpenAI for now (though they do encourage personal, non-commercial use).
posted by Rhaomi at 1:07 PM on April 27, 2022

Thanks Rhaomi. Yes, that Laokoon image is astounding. I guess it's because there's so much else going on in the world that this isn’t getting the attention it deserves. To me, it's on par with man setting foot on the moon.
posted by brachiopod at 7:22 PM on April 27, 2022 [1 favorite]

"1111101000 Robots". 1000 illustrations of robots generated by DALL-E 2, presented as a book.
posted by BungaDunga at 4:14 PM on April 28, 2022 [3 favorites]

(solarpunk is a funny choice of style because it all seems to be derived directly from a single image, much as all vaporwave is a variation on the national anthem of vaporwave.)
posted by kaibutsu at 3:48 PM on May 1, 2022

Phil Wang (a.k.a. “lucidrains”) has started an open-source implementation of the DALL-E 2 algorithm using PyTorch. This will make it possible for anyone with a powerful GPU and a corpus of training data to train and run their own models. (The original implementation by OpenAI is proprietary and only accessible to a limited number of people.)

Wang has previously implemented other generative models like StyleGAN2 in PyTorch, which he used to create This Person Does Not Exist.
posted by mbrubeck at 8:22 PM on May 1, 2022 [3 favorites]

From LessWrong: What DALL-E 2 can and cannot do

DALL-E's strengths:
  • Stock photography content
  • Pop culture and media
  • Art style transfer
  • Creative digital art
  • The future of commercials
DALL-E's weaknesses:
  • Scenes with two characters
  • Foreground and background
  • Novel objects, or nonstandard usages
  • Spelling
  • Realistic human faces
  • Limitations of the "edit" functionality
posted by Rhaomi at 9:53 PM on May 1, 2022

Paper: A very preliminary analysis of DALL-E 2 [PDF]
The DALL-E 2 system generates original synthetic images corresponding to an input text as caption. We report here on the outcome of fourteen tests of this system designed to assess its common sense, reasoning and ability to understand complex texts. All of our prompts were intentionally much more challenging than the typical ones that have been showcased in recent weeks. Nevertheless, for 5 out of the 14 prompts, at least one of the ten images fully satisfied our requests. On the other hand, on no prompt did all of the ten images satisfy our requests.

The examples were specifically designed to probe what we conjectured might be weaknesses in DALL-E 2.
posted by Rhaomi at 11:42 PM on May 1, 2022

@Crypto_Merz #NFT art stream by @Merzmensch

I've switched from 'whimsical genius' to 'fuck this guy'

> Among other things, he notes that the terms explicitly ban NFTs and give the copyright on all images to OpenAI for now (though they do encourage personal, non-commercial use).

Well, I guess start suing? This isn't compatible with his own actions....unless the stuff he's selling isn't OpenAI. But still.

Or, release the code if this is what 'elite' people are going to do with their access. I'm sure the crowd principle could produce equally interesting prompts.
posted by snuffleupagus at 9:04 AM on May 2, 2022

I was disappointed to see that and told him as much, but apparently he's using a niche eco-friendly blockchain and doesn't actually use DALL-E 2 imagery for it (more of a cross-promotional thing for his other art). Still propping up the web3 scam (and even suffered a wallet theft recently, hint hint) but at least he's mostly doing it for the art.

On that note, some more recent faves:

"watercolor painting of an off-white cowboy hat on a wooden table in front of a window" [source]
"photo of a koala swimming through thousands of tomatoes" [source]
"heart made of water" [source]
"Supervillain drawn in the style of Akira Toriyama using colored pencils" [source]
"An airbrush caricature of an old man" [source]
"Screenshot from 2020 Star trek the next generation reboot" [source]
"lion playing violin with blue moon in the background" [source; also available with an N95 face mask!]
"Stranger, by Caspar David Friedrich" [source]
"An orange cat staring at a drawer filled with socks on fire, high-resolution photo" [source; apparently inspired by this weird shitpost]
"A bunny eating a bagel!" [source]
"mosaic of bad ancient roman cat being chased by angry ancient roman woman" [source]
"a painting by Grant Wood of an astronaut couple, american gothic style" [source]
"a toulouse lautrec painting of a giraffe eating a crepe with the tour eiffel in the background" [source]
"A pre raphaelite painting of a person waiting for their iPhone to power on after plugging it back in" [source]
"A photograph of an apple that is a disco ball, 85 mm lens, studio lighting" [source]
"a kid dressed up as a Storm Trooper celebrating May the Fourth by performing a TikTok dance, unreal engine" [source]
"Mozart drinking a smoothy" [source]
"A pondering philosophical grizzly bear, digital art" [source]
posted by Rhaomi at 11:32 PM on May 5, 2022 [1 favorite]

DALL-E 2 Inpainting / Editing Demo, including limitations and workarounds

2.7-hour DALL-E 2 demonstration video (Spanish language, but English subtitles are available)

One user has discovered that the model understands emojis, leading to this fun request thread

More favorites:
"A photograph of a young jackrabbit grazing in a vegetable garden, early morning light, 85 mm lens, 70 mm entrance pupil diameter" [source]
"A tree with apples, like a drawing of the sand on a beach" [source]
"norman rockwell painting" [source]
"Ada Lovelace and Marie Curie having a coffee at the Eiffel Tower, digital art" [source]
"Knolling of a scientists' bag." [source]
"Mother, oil painting, by Magritte" [source]
"A fractal painting of DALL-E 2, mandelbrot style" [source]
"A beautiful render of Padme Amidala portrait with her face illuminated by a blue lightsaber, lucas films, orange and blue contrast, trending artstation, fantasy art" [source]
"cyberpunk nun in cyberpunk church" [source]
"A duck with David Bowie makeup, portrait, dark bg" [source]
"a cybertronic bison, leds, high detail, sharp, studio, digital art" [source]
"A photograph of a saguaro cactus wearing a sun hat and aviator sunglasses in a sunny desert, 35 mm film" [source]
"an anthropomorphic fox wearing a leather jacket, fursona, digital art" [source]
"last supper in the style of keith haring" [source]
HD collage of "happy racoon wearing a [color] turtleneck, studio, portrait, facing camera, studio, dark bg" [source]

Will try shooting for one more update before the thread closes!
posted by Rhaomi at 7:38 PM on May 11, 2022 [1 favorite]

We asked 100 humans to draw the DALL·E prompts. Which one of these was generated by a human, and which by an AI?

Man Gets World’s First AI-Designed Tattoo

Last call for DALL-E 2 faves!

"Researchers and scientists in a lab mixing a boiling cauldron of data sets and models over an open flame, digital art" [source]
"jacob lawrence painting of san francisco" [source]
"fruit fractals" [source]
"A stern-looking owl dressed as a librarian, digital art" [source]
HD collage of "happy horse wearing a [color] hoodie, studio, portrait, facing the camera, detailed" [source]
"banana shaped like a pineapple" [source]
"mutual curiousity" [source; probably not that exact prompt or how they got it so hi-res but the image is cool]
"a monkey head that is only made out of fruit, 3D" [source]
"Stanislaw Lem's the Futurological Congress as depicted by Hieronymus Bosch" [source]
HD collage of "square [fruit]" [source]
"A steampunk Furby, 3D render, studio lighting" [source]
"A photograph of a street sign that warns drivers dragons might be up ahead" [source; it even nailed the text!]
"Portrait of a cat, photographed with a 200mm f2 prime lens, background bokeh, dslr" [source]
"a raccoon learning to crotchet with neon yarn, pixel art" [source; thank Instagram for the low res]
"a beautiful forest with ruins of archways, watercolour by James Gurney" [many more images at the source]
"prehistoric meteorites look up in horror as a giant falling dinosaur enters their atmosphere" [source]
"The greek goddess Themis holding her scales leads the people from a dark dystopian Cyberpunk world to a bright utopian Solarpunk world" [source]
"panda monk praying at dawn in a temple in the pandaverse, digital art" [source]
"A robot finding his place in the universe under a cherry blossom tree in full bloom, digital art" [source]

(Frustratingly, I narrowly missed a chance to try for an early invite myself: OpenAI's Instagram account joined several Instagram artist livestreams today and passed out links to a Google Meet session where you could audition somehow, which was great since the waitlist application only had fields for Twitter/IG/LinkedIn. But do keep an eye on their Instagram activity if you're hoping to get in!)

Last but certainly not least: just a couple of days ago, OpenAI competitor DeepMind revealed project Gato, which it terms "A Generalist Agent":
Inspired by progress in large-scale language modelling, we apply a similar approach towards building a single generalist agent beyond the realm of text outputs. The agent, which we refer to as Gato, works as a multi-modal, multi-task, multi-embodiment generalist policy. The same network with the same weights can play Atari, caption images, chat, stack blocks with a real robot arm and much more, deciding based on its context whether to output text, joint torques, button presses, or other tokens. In this report we describe the model and the data, and document the current capabilities of Gato.
In short, a single AI model that has near-human performance on hundreds of disparate tasks, including "play Atari, caption images, chat, stack blocks with a real robot arm and much more." Intriguingly, the model was relatively underpowered compared to the likes of GPT-3 and DALL-E 2, with "only" 1.2 billion parameters, apparently in order to limit latency for the physical-robot tests. I'm no expert (this post notwithstanding), but I do keep tabs on folks who are -- and I'm seeing more and more excitable chatter that this breakthrough in generalist AI (which can potentially be scaled upward indefinitely) puts us tantalizingly close to true artificial general intelligence within the next few years:

Daniel Kokotajlo: Deepmind's Gato: Generalist Agent (the comments are also worth a read)

TechCrunch: DeepMind’s new AI can perform over 600 tasks, from playing games to controlling robots

Discussion on /r/futurology and /r/singularity

Timelines of AI model development: 1947-2020 vs. 2021 (and that's not even including recent overlapping breakthroughs like DALL-E 2, PaLM, Flamingo, Chinchilla, Gato, etc.)

Metaculus maintains a community forecast for the public advent of "weakly general AI"; it spent the last few years mired in the 2030s-40s, but after the announcement of DALL-E 2 and especially the explosion of new models in the last month, the average prediction has dropped to March 24th, 2028.
posted by Rhaomi at 11:21 AM on May 15, 2022 [2 favorites]

This thread has been archived and is closed to new comments