I’ve seen things you people wouldn’t believe
May 25, 2016 9:25 AM

Autoencoding Blade Runner — Artist and researcher Terence Broad shows off his results “getting artificial neural networks to reconstruct films — by training them to reconstruct individual frames from films, and then getting them to reconstruct every frame in a given film and resequencing it.”

(bonus: he did the same thing with A Scanner Darkly.)

posted by neckro23 (32 comments total) 22 users marked this as a favorite

Maybe this is what the world will look like while I'm slowly dying from something that is eating my mind.
posted by King Sky Prawn at 9:32 AM on May 25, 2016 [5 favorites]

Compare this to:

"Movie reconstruction from human brain activity "
posted by I-baLL at 9:36 AM on May 25, 2016 [1 favorite]

It looks like the big difference between the neural network clip and the human brain activity clip is that the human brain activity clip looks like it was run through an image recognition algorithm that then applied the results to Google's deepdream thingie and then overlaid that on the visually processed image.
posted by I-baLL at 9:37 AM on May 25, 2016

The film lays on the cutting room floor, its various scenes languishing in disarray, trying to reassemble itself, but it can't. Not without your help. But you're not helping.

Why is that, Terence?
posted by eclectist at 9:48 AM on May 25, 2016 [24 favorites]

This quite accurately reproduces the bygone experience of stumbling across a movie on a CRT television late at night when you were half-asleep and the reception wasn't great.
posted by The Card Cheat at 9:48 AM on May 25, 2016 [5 favorites]

The human brain activity clip resembles (to me, at least) the dream sequences from Wim Wenders' Until the End of the World. I love that film for so very many reasons.
posted by curiousgene at 9:53 AM on May 25, 2016 [6 favorites]

How much memory do the models take up? Obviously it's not going to replace H.264 anytime soon, but I wonder what the compression rate looks like.
posted by jedicus at 9:53 AM on May 25, 2016 [3 favorites]

jedicus: Poor. An autoencoder is not really for compression, in the same way that L1 regularization will not really compress things much: it's much more for regularization (better performance on the model error, not a tiny bit worse performance on the error in exchange for much better performance on clock time). Optimal brain damage, optimal brain surgery, the tensor encoding methods, FastFood (and the related unhealthy-food-themed compression algorithms) and the various ACDC-like SELLs are better. Probably best is the Han et al. work on shoving things into CSR matrices, but read the ACDC paper's critique of that.
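
For anyone wondering what "shoving things into CSR matrices" means in practice, here is a rough sketch of the general prune-then-store-sparse idea. It is not the Han et al. method itself: the toy weight matrix, the 0.1 threshold and the function names are all made up for illustration.

```python
# Illustrative only: magnitude pruning followed by CSR storage.
# The threshold and the toy weight matrix are made-up values.

def prune(weights, threshold):
    """Zero out every weight whose magnitude is below the threshold."""
    return [[w if abs(w) >= threshold else 0.0 for w in row] for row in weights]

def to_csr(dense):
    """Convert a dense row-major matrix to CSR (values, column indices, row pointers)."""
    values, col_idx, row_ptr = [], [], [0]
    for row in dense:
        for j, w in enumerate(row):
            if w != 0.0:
                values.append(w)
                col_idx.append(j)
        row_ptr.append(len(values))
    return values, col_idx, row_ptr

def csr_matvec(csr, x):
    """Multiply a CSR matrix by a dense vector, touching only stored entries."""
    values, col_idx, row_ptr = csr
    out = []
    for r in range(len(row_ptr) - 1):
        start, end = row_ptr[r], row_ptr[r + 1]
        out.append(sum(values[k] * x[col_idx[k]] for k in range(start, end)))
    return out

weights = [
    [0.90, 0.01, 0.00, -0.02],
    [0.03, -0.70, 0.02, 0.00],
    [0.00, 0.01, 0.50, 0.04],
]
sparse = prune(weights, threshold=0.1)
csr = to_csr(sparse)

# Numbers stored: every cell for the dense layout, versus the three CSR arrays.
dense_count = sum(len(row) for row in weights)
csr_count = len(csr[0]) + len(csr[1]) + len(csr[2])
print(dense_count, csr_count)
```

On a matrix this small the bookkeeping arrays eat most of the savings; the format only pays off when the pruned matrix is large and mostly zeros.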
posted by hleehowon at 10:01 AM on May 25, 2016 [9 favorites]

"The film lays on the cutting room floor"

Film? What's that?
posted by I-baLL at 10:01 AM on May 25, 2016 [8 favorites]

(FastFood is different from the others, the ACDC paper talks about it... actually, they're all subtly different, ain't it great)

Also empirical is the statement that the tiny little subfield of neural networks that is compressing the shit out of them is the one most amenable to naming things weird-ass names. It's great.
posted by hleehowon at 10:02 AM on May 25, 2016 [1 favorite]

> Film? What's that?

Know what a movie is? Same thing.
posted by glonous keming at 10:12 AM on May 25, 2016 [26 favorites]

That looks like "Bladerunner run through Scarecrow's Fear Gas Filter."
posted by lagomorphius at 10:17 AM on May 25, 2016 [4 favorites]

I-baLL: ""The film lays on the cutting room floor"

Film? What's that?"

Scratch your front teeth. Now look at your fingernail.
posted by Splunge at 10:25 AM on May 25, 2016

I really hate being too stupid to understand these things.
posted by aramaic at 10:44 AM on May 25, 2016

I really hate being too stupid to understand these things.

Yeah -- I have been following the links and wikipediaing things in the hopes of finding an ELI5 level overview of this.

From what I understand, he built tools that, given a frame of the movie, can "reconstruct it." He then asked them to reconstruct every frame.

But I don't quite get what "reconstructing" consists of. There's a generator and a discriminator, but does the generator just start with random images of noise until it finds something that the discriminator likes? Does the discriminator just compare pixel-by-pixel to the picture that it knows?
posted by sparklemotion at 11:08 AM on May 25, 2016

the tensor encoding methods

Do what now?
posted by The Tensor at 11:13 AM on May 25, 2016

T̹̤͔ͧ͡͞H̴̤͍̙͓͎͐̚̚I̵̘̯̦̙̹̖̪ͣ͒ͣ̋S̘̙̯̥̹̒̎ͮ͢ ̴̡͔̩͕̩͔̗͎̋̔̇ͪͫ̎̀I̎̑ͫ͂́̉̓͏̭̘̥̳Ş͔͈͈̩̯̟̭̰͍ͮͨ̐̏̅̿͑̕ ͍̣̠͍̦̂R̤̻̭̰͉̭͇͂ͬ͒͂E̛͎̞̼̹̍ͭ̍ͧ̇̒Ḷ̟̦̟̲̽ͦͧ̾͛ͫ̏̚͟͝E̱̪̗̬̠͙ͨ͂V̙̟̋̒͒̿́͝Ả̰̟̪̫̺͔̠͗̔̓͝Ṅ̨͖͍͙̖̹̞͉ͦ̈́̔͘͡ͅT̷͈̄͒͒̑͜ ̴̜̝̦͎̓ͭT̪̲͎͇̺̤̱͎̑̓̓̕ͅȮ̸̳̳͚͖̥̂̈́̑ͨ̒̄̿ ̛̫̮̽̇̓̏̒ͥ̽ͫ͘͜M̸̶̛̞̰ͤ̎ͣ̈́ͣͣ̚Y̧̹̲̳̱͖̰͈̠̎͋̆̀ ̴̨͇̩ͣͦ͌̑̂I͎̗̰̬͓̠ͥ͋ͣͧͧ́̚͘N̖͚̟͖̟̳̣͔̊̒̀Ṱ̛̳̹͖̜̜͖ͨͨ̀ͯ͌̉E̥̖̣̪̺̯̬ͦ͊ͦ͗̔ͪͧ̏ͮṚ̶̟̬̲̳̣͊ͭ̏E͓̺͔͇͕̥͕̎͛͆ͫ̒ͪͅS̳̮̠ͭ̽̽͗͐͞T̛͇̭̔͐S̨̼̭͇̒̎ͥͬͧ̎ͮͬ́
posted by Johnny Wallflower at 11:23 AM on May 25, 2016 [5 favorites]

Obviously it's not going to replace H.264 anytime soon, but I wonder what the compression rate looks like.

But imagine if you could compress every film in the same net?
posted by JoeZydeco at 11:26 AM on May 25, 2016

My Moviola? What about my Moviola?
posted by rhizome at 11:41 AM on May 25, 2016 [1 favorite]

But imagine if you could compress every film in the same net?

Technically every movie is contained inside of pi.
posted by Candleman at 11:45 AM on May 25, 2016 [1 favorite]

But I don't quite get what "reconstructing" consists of. There's a generator and a discriminator, but does the generator just start with random images of noise until it finds something that the discriminator likes? Does the discriminator just compare pixel-by-pixel to the picture that it knows?

I'm not clear on this either, especially when he starts applying the BR model to other films. It looks like this is a decent explanation of what's going on:

Generator and Discriminator consist of Deconvolutional Network (DNN) and Convolutional Neural Network (CNN). CNN is a neural network which encodes the hundreds of pixels of an image into a vector of small dimensions (z) which is a summary of the image. DNN is a network that learns filters to recover the original image from z.

When a real image is given, Discriminator should output 1 or 0 for whether the image was generated from Generator. In contrast, Generator generates an image from z, which follows a Gaussian Distribution, and tries to figure out the distribution of human images from z. In this way, a Generator tries to cheat Discriminator into making a wrong decision.
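
The "squeeze the image into a small z, then rebuild it from z" part can be sketched with a toy. This is a plain linear autoencoder on two-number "frames", with no convolutions, no discriminator and nothing variational, so it is far simpler than what the project actually uses. The data, sizes, learning rate and function names are all assumptions for illustration.

```python
# Toy linear autoencoder: 2 numbers in, a 1-number code z, 2 numbers out.
# All sizes, data and hyperparameters are illustrative assumptions.

def encode(enc, x):
    """Compress a 2-value input to a single latent value z."""
    return enc[0] * x[0] + enc[1] * x[1]

def decode(dec, z):
    """Expand the latent value z back to a 2-value reconstruction."""
    return [dec[0] * z, dec[1] * z]

def loss(enc, dec, data):
    """Mean squared reconstruction error over the data set."""
    total = 0.0
    for x in data:
        x_hat = decode(dec, encode(enc, x))
        total += (x_hat[0] - x[0]) ** 2 + (x_hat[1] - x[1]) ** 2
    return total / len(data)

def train(data, steps=500, lr=0.05):
    """Fit encoder and decoder weights with plain gradient descent."""
    enc, dec = [0.1, 0.1], [0.1, 0.1]
    n = len(data)
    for _ in range(steps):
        g_enc, g_dec = [0.0, 0.0], [0.0, 0.0]
        for x in data:
            z = encode(enc, x)
            x_hat = decode(dec, z)
            err = [x_hat[0] - x[0], x_hat[1] - x[1]]
            # Gradient of the squared error with respect to z.
            dz = 2 * (err[0] * dec[0] + err[1] * dec[1])
            for i in range(2):
                g_dec[i] += 2 * err[i] * z / n
                g_enc[i] += dz * x[i] / n
        enc = [enc[i] - lr * g_enc[i] for i in range(2)]
        dec = [dec[i] - lr * g_dec[i] for i in range(2)]
    return enc, dec

# Stand-in "frames": points that all lie on one line, so one number can describe them.
frames = [[t, 2 * t] for t in (-1.0, -0.5, 0.0, 0.5, 1.0)]
initial_loss = loss([0.1, 0.1], [0.1, 0.1], frames)
enc, dec = train(frames)
final_loss = loss(enc, dec, frames)

# "Reconstruct every frame and resequence": decode each frame in its original order.
reconstructed = [decode(dec, encode(enc, x)) for x in frames]
print(initial_loss, final_loss)
```

The reconstruction is never handed the original pixels directly at output time; everything has to pass through z, which is why the results come out as a blurry approximation rather than a copy.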

posted by neckro23 at 12:07 PM on May 25, 2016 [2 favorites]

Memories... You're talking about memories!
posted by likethemagician at 12:11 PM on May 25, 2016 [3 favorites]

Multiple use of the same network for different tasks (which the differential compression of things would end up being) is primarily thought about in the small echo state network literature (ESN is a recurrent network; the feedforward equivalent is the extreme learning net). There, they just use the input-to-hidden bit as a projection into a bigger space, and the recurrent bit as a way to surface learnable features of the task. That is, the input-to-hidden weights and the hidden-to-hidden weights are not learned. So the task ends up being a regression (usually linear or logistic, depending on the criterion).

But this is, basically, as far as it can get from what the VAE is doing, where there is no difference between the input and output so the learned hidden layer is the only useful thing made. And empirically, ESN needs a lot more hidden units (which are, admittedly, way easier to make) for the same task than a normal RNN or a gated LSTM. So this is highly specific to the task as it is, currently.
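
A minimal sketch of the echo state idea, assuming a 20-unit reservoir and a made-up sine-wave prediction task: the input-to-hidden and hidden-to-hidden weights are drawn at random and never touched again, and only a linear readout is fitted. The readout here uses simple least-mean-squares updates as a stand-in for a proper linear regression solve; all names and constants are illustrative.

```python
import math
import random

# Minimal echo state network sketch. Reservoir size, weight scales,
# learning rate and the sine-wave task are all illustrative assumptions.

rng = random.Random(0)
N = 20  # number of hidden (reservoir) units

# Input-to-hidden and hidden-to-hidden weights: random, then never updated.
w_in = [rng.uniform(-1.0, 1.0) for _ in range(N)]
w_res = [[rng.uniform(-0.3, 0.3) for _ in range(N)] for _ in range(N)]

def run_reservoir(inputs):
    """Drive the fixed reservoir with a scalar input sequence; return all states."""
    state = [0.0] * N
    states = []
    for u in inputs:
        state = [
            math.tanh(w_in[i] * u + sum(w_res[i][j] * state[j] for j in range(N)))
            for i in range(N)
        ]
        states.append(state)
    return states

def mse(w_out, states, targets):
    """Mean squared error of the linear readout."""
    errs = [sum(w * s for w, s in zip(w_out, st)) - y for st, y in zip(states, targets)]
    return sum(e * e for e in errs) / len(errs)

# Task: predict the next sample of a sine wave from the current one.
signal = [math.sin(0.2 * t) for t in range(301)]
inputs, targets = signal[:-1], signal[1:]
states = run_reservoir(inputs)

# Only the hidden-to-output readout is learned (least-mean-squares updates here).
w_out = [0.0] * N
mse_before = mse(w_out, states, targets)
for _ in range(50):
    for st, y in zip(states, targets):
        err = sum(w * s for w, s in zip(w_out, st)) - y
        w_out = [w - 0.01 * err * s for w, s in zip(w_out, st)]
mse_after = mse(w_out, states, targets)
print(mse_before, mse_after)
```

Because the reservoir is task-agnostic, the same `states` could feed several different readouts, which is the "same network, different tasks" angle.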

I'd like to poke about at my own hobbyhorse here and note that nothing about neural networks is ever gaussian except the initializations that people use. Measure some kurtosises.
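
If you want to measure it yourself, excess kurtosis is only a few lines. This sketch uses the population (biased) definition, where a Gaussian scores 0; the two sample lists are made up, and in practice you would pass in a flattened list of a layer's weights or activations.

```python
def excess_kurtosis(xs):
    """Population excess kurtosis: m4 / m2**2 - 3 (0 for a Gaussian)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n  # second central moment
    m4 = sum((x - mean) ** 4 for x in xs) / n  # fourth central moment
    return m4 / (m2 * m2) - 3.0

# Made-up samples: one evenly spread, one mostly zeros with a single outlier.
flat = [1.0, 2.0, 3.0, 4.0, 5.0]
spiky = [0.0] * 9 + [10.0]
print(excess_kurtosis(flat), excess_kurtosis(spiky))
```

Negative values mean lighter tails than a Gaussian, positive values mean heavier tails.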
posted by hleehowon at 12:12 PM on May 25, 2016

From my understanding, people have been working on neural networks that can be trained to produce realistic images when fed source images. There was one picture of faces showing the original face and then similar but clearly different faces generated by networks. They also did this with pictures of bedrooms, producing new bedroom images from source files that were good enough to fool a fake-bedroom-detecting network.

So now he's applied this system to a movie, taking each frame and feeding it into the network, which produces a similar but unique image from it.

Why does this matter? It's improving our ability to make realistic computer simulations of reality, and it's improving the ability of computers to interpret reality in meaningful ways.

A while back there was a site posted that would take one image and apply it to another to create stylistic blends, like a picture of a paint splash plus a picture of a person becoming a person made out of paint splash. This feels like a somewhat similar concept. I imagine someday you could have a neural net trained on A Scanner Darkly that could then automatically apply the style of that movie to a completely different movie.
posted by Mr.Encyclopedia at 12:15 PM on May 25, 2016 [2 favorites]

"A while back there was a site posted that would take one image and apply it to another to create stylistic blends, like a picture of a paint splash plus a picture of a person becoming a person made out of paint splash. This feels like a somewhat similar concept."

He links to a recent University of Freiburg video that does that with moving images. It stunned me, it's just amazing.
posted by Ivan Fyodorovich at 1:11 PM on May 25, 2016 [8 favorites]

What about EMS-3 recombination?
posted by Naberius at 1:21 PM on May 25, 2016 [1 favorite]

I almost put that U of Freiburg video in the FPP. It's pretty mind-boggling.
posted by neckro23 at 2:12 PM on May 25, 2016 [1 favorite]

Technically we don't know that every movie is contained inside of pi as a contiguous sequence of bits.

Even if that were true it is far too likely that it would take more bits to locate the beginning of the movie than to just store all the bits in the movie.
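
A rough way to see this: in a random-looking bit stream, a particular n-bit pattern tends to first show up at a position on the order of 2^n, so writing down that position costs about n bits, the same as the pattern. The sketch below uses a seeded pseudo-random bit string as a stand-in for the digits of pi (an assumption, it is not pi), and the pattern lengths are arbitrary.

```python
import random

def first_index(stream, pattern):
    """Position of the first occurrence of pattern in stream, or -1 if absent."""
    return stream.find(pattern)

rng = random.Random(0)
# Stand-in for the digits of pi: a pseudo-random bit string (an assumption, not pi itself).
stream = "".join(rng.choice("01") for _ in range(200_000))

results = []
for k in (4, 8, 12):
    # A random k-bit "movie" to look for.
    pattern = "".join(rng.choice("01") for _ in range(k))
    pos = first_index(stream, pattern)
    # Bits needed to write the position down.
    index_bits = max(pos, 1).bit_length()
    results.append((k, pattern, pos, index_bits))
    print(k, pos, index_bits)
```

Scale k up to the size of a real movie file and the index is, on average, no shorter than the movie.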
posted by Death and Gravity at 2:18 PM on May 25, 2016 [1 favorite]

We've already tried it - ethyl methane sulfonate as an alkylating agent and potent mutagen; it created a virus so lethal the subject was dead before it even left the table.
posted by porpoise at 2:28 PM on May 25, 2016 [3 favorites]

Even if that were true it is far too likely that it would take more bits to locate the beginning of the movie than to just store all the bits in the movie.

That is, indeed, the joke.
posted by Candleman at 5:41 PM on May 25, 2016 [1 favorite]

Technically every movie is contained inside of pi.

11:15, restate my assumptions: 1. Mathematics is the language of nature. 2. Everything around us can be represented and understood through numbers. 3. If you graph these numbers, patterns emerge. Therefore: There are patterns everywhere in nature.
posted by radwolf76 at 7:33 PM on May 25, 2016 [5 favorites]

Ivan Fyodorovich: that University of Freiburg video really IS amazing. Thanks for posting. #mindblown
posted by ephemerae at 8:26 AM on June 1, 2016


MetaFilter
