OpenAI brings you: Confused Capybara Emoji
January 6, 2021 10:45 AM

DALL·E: Creating Images from Text
DALL·E is a 12-billion parameter version of GPT-3 trained to generate images from text descriptions, using a dataset of text–image pairs. We’ve found that it has a diverse set of capabilities, including creating anthropomorphized versions of animals and objects, combining unrelated concepts in plausible ways, rendering text, and applying transformations to existing images. (Note, the examples are interactive)
posted by CrystalDave (19 comments total) 29 users marked this as a favorite
 
I wrote this up for work and apart from adding a little context didn't feel like I could explain the concepts better than the OpenAI post. The examples, of which there are MANY, really show the versatility of the system. Go play with them!

Essentially, in the same way that GPT-3 can repeatedly interpret a prompt and output a "plausibly human-written" series of words, DALL-E can repeatedly interpret a prompt and output a "plausibly real" series of images. The limitations on this process are extremely few!

Text to image has been done many times before, but DALL-E is way more robust and flexible, and understands natural language and logic much better than previous systems. It's a big advance but still not very practical... People saying RIP stock photographers haven't looked closely at the images DALL-E puts out; they are still chock full of AI weirdness (and right now very small too). That's a harder problem to solve than people realize. Reality is highly consistent! AI, not so much.
posted by BlackLeotardFront at 10:52 AM on January 6, 2021 [3 favorites]


My favorite weird part of this:

> We also find that inserting the phrase “professional high quality” before “illustration” and “emoji” sometimes improves the quality and consistency of the results.

I'm very excited about being able to improve software performance by just specifying that I want it to do a good job this time.
posted by theodolite at 11:04 AM on January 6, 2021 [33 favorites]


For a puzzle game I worked on once we ended up using Fiverr to get illustrations of stuff like a cross between a lion and a butterfly. We sent (and paid for) the same prompt to a few different artists, and then picked one with a style we liked, and then had them do another 20 extremely weird drawings. If DALL-E can be convinced to output a set of things in a consistent style, this could revolutionize making weird pictures of things that don't even really make sense!

Also, how long until this sort of system can make not just plausible images but actual manufacturable designs? I realize that's much more complicated, and we're probably talking about 5-20 years, not 6 months, but ... dang, imagine being able to just say "a tea kettle that looks like a Dagwood sandwich" and it spits out two dozen designs, and you pick one and in two weeks it shows up at your door, and it works.
posted by aubilenon at 11:24 AM on January 6, 2021 [4 favorites]


Metafilter: weird pictures of things that don't even really make sense
posted by genpfault at 11:27 AM on January 6, 2021 [1 favorite]


This is amazing, and I hate and fear the future.
posted by Going To Maine at 11:34 AM on January 6, 2021 [4 favorites]


Isn't Confused Capybara an Ubuntu edition?
posted by MrGuilt at 2:50 PM on January 6, 2021 [17 favorites]


The picture of the baby daikon radish could plausibly have come from a kids' book. I bet it will be months, not years, before someone uses GPT-3 together with DALL-E to generate unlimited numbers of kids' books on Amazon, each one triggered by different combinations of key words.
posted by Joe in Australia at 3:16 PM on January 6, 2021 [3 favorites]


Or that Amazon itself releases a children's book on demand feature for Kindle.

I wish the prompt were open-ended. I really want to see what DALL-E thinks "colorless green ideas sleeping furiously" look like.
posted by jedicus at 3:25 PM on January 6, 2021 [3 favorites]


> I'm very excited about being able to improve software performance by just specifying that I want it to do a good job this time.

You should try "I would like to see you do professional work this time" on human graphic designers too.
posted by ardgedee at 3:38 PM on January 6, 2021 [5 favorites]


I find their lack of vallhunds disturbing.
posted by GCU Sweet and Full of Grace at 3:55 PM on January 6, 2021


One of the weirdest things is the AI's attempt to caption the giraffe/turtle hybrids. Some of the attempts look like it was working towards "Giratle". Emergent behaviour, y'all.
posted by Joe in Australia at 4:05 PM on January 6, 2021


> We also find that inserting the phrase “professional high quality” before “illustration” and “emoji” sometimes improves the quality and consistency of the results.

> I'm very excited about being able to improve software performance by just specifying that I want it to do a good job this time.


Can't wait to yell "enhance!!" at blurry photos and have it work.
posted by Emily's Fist at 4:47 PM on January 6, 2021 [5 favorites]


> I wish the prompt were open-ended.

I don't think they're keen to pay for that much compute.
posted by little onion at 5:11 PM on January 6, 2021 [1 favorite]


With that emoji generation, and decorating the teapot in various ways... how long before you can give it a reference sheet of your fursona and get a set of Telegram stickers out?
posted by NMcCoy at 6:23 PM on January 6, 2021


> I really want to see what DALL-E thinks "colorless green ideas sleeping furiously" look like.
'Professional high-quality colorless green ideas sleeping furiously', too, please.
posted by k3ninho at 12:23 AM on January 7, 2021 [3 favorites]


I think the biggest part of this for me is that it's demonstrating the sorts of advances that are a matter of degree, of time and computing power, rather than any sort of unsolved problem. I did a Google image search for "avocado chair" and I'm pretty sure the AI isn't actually overfitting/plagiarizing any actual images, but some of those avocado chairs look astonishingly plausible and intentionally designed, even clever. Even at its current level of power this could be an incredible tool for exploratory design.
posted by NMcCoy at 12:58 AM on January 7, 2021 [3 favorites]


How do I generate the same images but with "el chupacabra" instead of "capybara" please respond quickly as I need this for work.
posted by rum-soaked space hobo at 6:14 AM on January 7, 2021 [1 favorite]


I am giggling at just looking at the combinations the article is providing, and the parameters by which I can change them. A stack of bats is funny. A tetrahedron of coffee beans is funny. an orange triangular manhole—larrrffsss
posted by not_on_display at 10:00 PM on January 8, 2021


Can they beat the original?
posted by storybored at 10:16 AM on January 10, 2021




This thread has been archived and is closed to new comments