Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks
April 5, 2017 10:37 AM

Berkeley's software turns paintings into photos, horses into zebras, and more

From the abstract:
Image-to-image translation is a class of vision and graphics problems where the goal is to learn the mapping between an input image and an output image using a training set of aligned image pairs.

Our goal is to learn a mapping G: X → Y such that the distribution of images from G(X) is indistinguishable from the distribution Y using an adversarial loss.

Qualitative results are presented on several tasks where paired training data does not exist, including collection style transfer, object transfiguration, season transfer, photo enhancement, etc.
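
To make the "cycle-consistent" part of the title concrete: alongside the adversarial loss they also train an inverse mapping F: Y → X and penalise how far F(G(x)) drifts from the original x (and G(F(y)) from y). A toy sketch of just that term, with throwaway stand-ins for the two generator networks (this is not the authors' code):

    import numpy as np

    # Toy stand-ins for the two generators; in the paper these are deep
    # convolutional networks mapping, say, horses->zebras (G) and zebras->horses (F).
    def G(x):
        return x * 0.9

    def F(y):
        return y / 0.9

    def cycle_consistency_loss(x_batch, y_batch, lam=10.0):
        # lam * ( E[ |F(G(x)) - x| ] + E[ |G(F(y)) - y| ] ), an L1 penalty
        forward = np.mean(np.abs(F(G(x_batch)) - x_batch))
        backward = np.mean(np.abs(G(F(y_batch)) - y_batch))
        return lam * (forward + backward)

    x = np.random.rand(4, 3, 64, 64)  # fake batch of "horse" images
    y = np.random.rand(4, 3, 64, 64)  # fake batch of "zebra" images
    print(cycle_consistency_loss(x, y))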
The future looks scary...
posted by bodywithoutorgans (14 comments total) 20 users marked this as a favorite
 
Man, this is crazy. My biggest question which wasn't answered by the paper is: is this approach inherently limited to images, or could it also be applied to text?

Right now, one of the limiting factors in machine translation is the need for human experts to construct a dataset with parallel versions of the same text in multiple languages. There are existing collections out there, such as documents from the United Nations, but they're not necessarily representative of all linguistic contexts. But if you don't need the input to be paired up sentence-by-sentence, you could just throw a much larger volume of text at it, from virtually any source.

Imagine how much more natural Google Translate could become if it was trained on, say, all of MetaFilter+Reddit+Tumblr.
posted by teraflop at 10:48 AM on April 5, 2017


Google Translate does already work on the principle of using artificial neural networks to create a distinct representation of each language and then aligning the languages via a small dictionary of synonymous content; see Exploiting Similarities among Languages for Machine Translation.
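
Roughly: you learn word vectors for each language on its own, then fit a linear map between the two vector spaces from a small seed dictionary and translate by nearest neighbour. A toy sketch of that alignment step, with random vectors standing in for real embeddings (not code from the linked paper):

    import numpy as np

    rng = np.random.default_rng(0)
    dim_src, dim_tgt, n_pairs = 50, 40, 500

    # Pretend these are embeddings for the words in a small seed dictionary:
    # X[i] is the source-language vector, Z[i] the target-language vector.
    X = rng.normal(size=(n_pairs, dim_src))
    true_map = rng.normal(size=(dim_src, dim_tgt))
    Z = X @ true_map + 0.01 * rng.normal(size=(n_pairs, dim_tgt))

    # Least-squares fit of the linear map W that sends X-space into Z-space.
    W, *_ = np.linalg.lstsq(X, Z, rcond=None)

    # "Translate" a new source word: project it, take the nearest target vector.
    x_new = X[0] + 0.05 * rng.normal(size=dim_src)
    nearest = np.argmin(np.linalg.norm(Z - x_new @ W, axis=1))
    print(nearest)  # should usually recover index 0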
posted by ethansr at 10:54 AM on April 5, 2017 [1 favorite]


I do love that the standard for this sort of thing is to include installation instructions - I just spent a few hours this past week trying to get "Deep Photo Style Transfer" and "Neural Style" working on AWS, with mixed results.

Which is to say, most of my results wound up looking like these guys.

Odds are good I'll give this a go, too.
posted by Kikujiro's Summer at 11:11 AM on April 5, 2017 [3 favorites]


ethansr: Unless I'm misunderstanding it, that paper is only focused on translating individual words and set phrases. The big breakthroughs in recent years have revolved around using neural networks to actually represent grammatical structures, and as far as I know, that stuff still requires aligned corpora. But I'm not surprised that people are already working on solving this problem!
posted by teraflop at 11:38 AM on April 5, 2017


My biggest question which wasn't answered by the paper is: is this approach inherently limited to images, or could it also be applied to text?

Image-to-image problems have the advantage that edges remain basically unchanged from the source to the target image. That's what makes zebra-to-horse so hard, because a zebra is just a whole mess of edges, and if you take a look at the randomly selected test cases for that problem you'll see that by and large the edges are still there. But if the translation you want to do preserves almost all the edges, this technique seems to work well and is a big step forward!

Text-to-text is much harder, because the equivalent of "edges" is, I suppose, the logical relationship between parts of speech, and those don't stay constant between languages. I don't doubt that unsupervised learning approaches like these will have a role to play in the future in natural language processing, but I don't think anyone knows yet how to do that.
posted by Omission at 12:17 PM on April 5, 2017 [2 favorites]


Man, this is crazy. My biggest question which wasn't answered by the paper is: is this approach inherently limited to images, or could it also be applied to text?

If their approach to image processing could somehow be applied to text processing, they would sure as heck be talking about that in their description. Plus image processing is way, way different than language processing, in like a million ways. For instance, half a picture is still a picture, but half a sentence is (often) not a sentence at all.
posted by aubilenon at 12:25 PM on April 5, 2017


I just spent a few hours this past week trying to get "Deep Photo Style Transfer" and "Neural Style" working on AWS, with mixed results.

What size of instance? And is it worth shelling out for AWS-time, compared to running it on a home PC/Mac?
posted by acb at 2:44 PM on April 5, 2017


acb, I used the p2.xlarge instance, which runs $0.90 an hour in US East. I picked it mostly because I wanted to work on as large an image as possible, and it has 12GB of GPU memory. That was sufficient to output an image ~1500px on its largest side, if I was lucky (using the cuDNN backend; I don't pretend to understand half of this black magic).

If you're interested, there are a couple of community AMIs which already have the neural-style prerequisites installed, so you can essentially just upload your images and go. I was unable to get the deep photo style transfer to work, I think because I was missing some MATLAB libraries the script was using. Well, that or because I was using Octave. I'm not sure.

The main reasons to use AWS are 1) I don't own a CUDA-capable GPU and 2) the ones that have 10-12GB of memory are a bit pricey.

I'd only dabbled with AWS previously and was able to get this going in 1-2 hours, so trying out some of these techniques isn't too hard.
posted by Kikujiro's Summer at 5:16 PM on April 5, 2017 [2 favorites]


The season transfers are really impressive
posted by not_the_water at 6:34 PM on April 5, 2017


So, can you run "Deep Photo Style Transfer" and "Neural Style" on a five year old laptop and just let the CPU chug away at it for a little while, or do you need massive video cards and stuff if you don't want it to take months?
posted by sebastienbailard at 2:47 AM on April 6, 2017


I believe you can, though when I last tried that (maybe a year ago) I wasn't too pleased with the results; I don't remember why.
posted by Kikujiro's Summer at 5:39 AM on April 6, 2017


I'll add that you can get a p2.xlarge for 26 cents an hour under "spot pricing", the caveat being you can get kicked off (with two minutes' warning) if demand goes up. I originally was going to try the Google Cloud offering but for some reason I have zero GPU quota on there.
posted by Standard Orange at 9:57 PM on April 6, 2017 [1 favorite]


I'll add that you can get a p2.xlarge for 26 cents an hour under "spot pricing", the caveat being you can get kicked off (with two minutes' warning) if demand goes up.

Is there a way of quickly saving/restoring the state of neural network software that has been rigged up to capitalise on spot-priced instances?
posted by acb at 5:43 AM on April 8, 2017


I'm still green at this stuff, but AWS gives you something you can check for Imminent Shutdown status, and the Torch file for pix2pix lets you set how often you save the trained network. So I'm sure it's possible, though I haven't implemented it myself yet.
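
If it helps, the thing you poll is (as far as I can tell) the instance metadata endpoint, which only starts answering once a termination has been scheduled, roughly two minutes ahead of the shutdown. Something like this rough, untested sketch, where train_one_epoch and save_checkpoint are placeholders for whatever your training script actually exposes:

    import urllib.request
    import urllib.error

    # The spot termination notice appears on the instance metadata endpoint
    # shortly before shutdown; until then this URL returns a 404.
    TERMINATION_URL = "http://169.254.169.254/latest/meta-data/spot/termination-time"

    def termination_scheduled(timeout=2):
        try:
            with urllib.request.urlopen(TERMINATION_URL, timeout=timeout) as resp:
                return resp.getcode() == 200
        except urllib.error.URLError:
            return False  # 404 (or no metadata service): carry on training

    def train_with_checkpoints(train_one_epoch, save_checkpoint, max_epochs=200):
        # train_one_epoch and save_checkpoint are stand-ins for whatever the
        # real script provides (e.g. pix2pix's periodic model save).
        for epoch in range(max_epochs):
            train_one_epoch(epoch)
            save_checkpoint(epoch)
            if termination_scheduled():
                save_checkpoint(epoch)  # final save before the instance is reclaimed
                return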
posted by Standard Orange at 10:40 PM on April 13, 2017



