Face the future
March 19, 2016 7:15 PM

Face2Face: Real-time Face Capture and Reenactment of RGB Videos We present a novel approach for real-time facial reenactment of a monocular target video sequence (e.g., Youtube video). The source sequence is also a monocular video stream, captured live with a commodity webcam. Our goal is to animate the facial expressions of the target video by a source actor and re-render the manipulated output video in a photo-realistic fashion. - Stanford Computer Graphics Laboratory
posted by CrystalDave (31 comments total) 13 users marked this as a favorite

(Their previous video was literally everywhere last year, so I'm a bit surprised that it doesn't seem to have been posted to MetaFilter.)
posted by effbot at 7:44 PM on March 19, 2016

The political monkeywrenching possibilities are endless.
posted by Thorzdad at 8:08 PM on March 19, 2016 [7 favorites]

I reckon this is the same project, no?
posted by juv3nal at 8:13 PM on March 19, 2016 [1 favorite]

Does anyone know what "RGB" stands for in this context? It could be Red Green Blue but I suspect it's something else here, and Google isn't telling me.
posted by intermod at 8:17 PM on March 19, 2016

From "A Scandal in Bohemia," a Sherlock Holmes story:

“Let me see!” said Holmes. “Hum! Born in New Jersey in the year 1858. Contralto—hum! La Scala, hum! Prima donna Imperial Opera of Warsaw—yes! Retired from operatic stage—ha! Living in London—quite so! Your Majesty, as I understand, became entangled with this young person, wrote her some compromising letters, and is now desirous of getting those letters back.”
“Precisely so. But how—”
“Was there a secret marriage?”
“None.”
“No legal papers or certificates?”
“None.”
“Then I fail to follow your Majesty. If this young person should produce her letters for blackmailing or other purposes, how is she to prove their authenticity?”
“There is the writing.”
“Pooh, pooh! Forgery.”
“My private note-paper.”
“Stolen.”
“My own seal.”
“Imitated.”
“My photograph.”
“Bought.”
“We were both in the photograph.”
“Oh, dear! That is very bad! Your Majesty has indeed committed an indiscretion.”

How times have changed.
posted by mccarty.tim at 8:31 PM on March 19, 2016 [7 favorites]

intermod, they're just saying that they don't need any other ancillary data (like depth cameras, motion-capture dots, 3D scans, or the like). They operate entirely on ordinary 2D RGB video for both inputs, unlike the previous demo, which required RGB-D cameras. So they can download two YouTube videos and merge them this way.
posted by town of cats at 8:32 PM on March 19, 2016 [4 favorites]

I suspect it means that, hey, we don't need anything other than conventional video. No motion-capture animation from commercial rigs with a bunch of dots plastered over someone's face or anything. If it's captured as video, we can mess with it, like we're in a real-world version of Michael Crichton's Rising Sun, 20 years late.
posted by figurant at 8:32 PM on March 19, 2016 [1 favorite]

So. I can make anyone say anything I want? I vaguely remember that somebody somewhere had a way to chop up a person's previous speech and reconstruct new speech from the pieces. Coupled with facial expression control, this could be quite handy or deadly, as the case may be.
posted by njohnson23 at 9:05 PM on March 19, 2016

For some context, there have been similar demos with specialty cameras like the Microsoft Kinect.

The (original) Kinect projects a pattern of dots in infrared, and an infrared camera measures how that pattern is distorted. That's how it gets depth perception and enough detail to map out a face, or really any other object or scene. RGB footage means it's just footage from a regular camera, like for TV or YouTube.
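A rough sketch of the triangulation idea behind this kind of structured-light depth sensing, assuming an idealized pinhole camera and projector separated by a known baseline: a projected dot that appears shifted by d pixels lies at depth Z = f * b / d. The real device matches against a stored reference pattern and calibrates for lens distortion, so this is a simplification, and the focal length and baseline below are hypothetical illustration values, not published Kinect specifications.

```python
def depth_from_disparity(focal_px: float, baseline_m: float, disparity_px: float) -> float:
    """Return depth in metres under an idealized pinhole triangulation model.

    focal_px: focal length of the IR camera, in pixels (assumed value).
    baseline_m: distance between projector and camera, in metres (assumed value).
    disparity_px: how far a projected dot appears shifted in the image, in pixels.
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    # Similar triangles: the nearer the surface, the larger the apparent shift.
    return focal_px * baseline_m / disparity_px


if __name__ == "__main__":
    # Hypothetical numbers chosen only to show the inverse relationship.
    for d in (58.0, 29.0, 14.5):
        z = depth_from_disparity(580.0, 0.075, d)
        print(f"disparity {d:5.1f} px -> depth {z:.2f} m")
```

Halving the disparity doubles the estimated depth, which is also why depth precision from this approach falls off quickly with distance. A plain RGB camera gives you no such per-pixel depth, so an RGB-only method has to infer face shape from the image itself.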
posted by mccarty.tim at 9:11 PM on March 19, 2016 [1 favorite]

A boon for film dubbing (we'll be able now to have any actor speak any language down to the lip and tongue movements).

A bane for truth. Watch the bits of the video with Trump in them. Anyone can now point at this video and claim that Trump never said any of those outrageous things, that any video evidence was manipulated.

A boon for truthiness.
posted by kandinski at 9:38 PM on March 19, 2016 [1 favorite]

The political monkeywrenching possibilities are endless.

Okay, Trump... I'm visualizing altered video where he's calmly talking about nuanced policy stances that are well informed and quite reasonable. (I mean, if you made him say bat-shit-insane stuff, people would just assume he really said it.)

Next step: audio.
posted by el io at 10:01 PM on March 19, 2016

A boon for film dubbing

Yup, similar technology, but not real-time, is already being used in Hollywood for dubbed foreign language versioning.
posted by praiseb at 10:10 PM on March 19, 2016

Kinda surprised they'd work in RGB and not Y'CbCr, given that's the source and eventual destination colorspace.
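For context on that conversion step, here is a minimal sketch of the per-pixel math between 8-bit R'G'B' and Y'CbCr, assuming full-range BT.601 coefficients as used in JPEG/JFIF. Broadcast and web video more often use limited-range (16 to 235) BT.601 or BT.709, so the constants in a real decoder would likely differ; nothing here is taken from the Face2Face paper.

```python
from typing import Tuple


def rgb_to_ycbcr(r: float, g: float, b: float) -> Tuple[float, float, float]:
    """Full-range BT.601 (JFIF-style) conversion of 8-bit R'G'B' to Y'CbCr."""
    y = 0.299 * r + 0.587 * g + 0.114 * b            # luma: weighted sum of channels
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b  # blue-difference chroma, offset to 128
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b  # red-difference chroma, offset to 128
    return y, cb, cr


def ycbcr_to_rgb(y: float, cb: float, cr: float) -> Tuple[float, float, float]:
    """Inverse of rgb_to_ycbcr under the same full-range BT.601 assumptions."""
    r = y + 1.402 * (cr - 128.0)
    g = y - 0.344136 * (cb - 128.0) - 0.714136 * (cr - 128.0)
    b = y + 1.772 * (cb - 128.0)
    return r, g, b


if __name__ == "__main__":
    pixel = (200, 100, 50)
    ycc = rgb_to_ycbcr(*pixel)
    back = ycbcr_to_rgb(*ycc)
    print("Y'CbCr:", tuple(round(v, 2) for v in ycc))
    print("round trip:", tuple(round(v) for v in back))
```

The transform is linear and (up to rounding) invertible, so a pipeline can decode to Y'CbCr, convert to RGB for face fitting and rendering, and convert back on output. The practical costs are rounding and the fact that video usually stores chroma at reduced resolution, which an RGB working space has to upsample first.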
posted by Rhomboid at 11:49 PM on March 19, 2016

Yup, similar technology, but not real-time, is already being used in Hollywood for dubbed foreign language versioning.

Can you give a link or help me find more out about this? Curiosity piqued!
posted by Literaryhero at 1:22 AM on March 20, 2016

Photo and photo-realistic imagery of any sort will soon, it seems, not really be evidence of anything. In some ways, it would be very freeing. "That's not my sex tape." "I never said those things." Etc.

The dangerous time is now, in the gap between old expectations that photos, video and the like are immutable and the society-wide recognition that everything can easily be manipulated.
posted by maxwelton at 2:19 AM on March 20, 2016 [5 favorites]

Five quatloos for a video of Trump saying "I am a poopy pants".

Who am I kidding, as if it would change anyone's mind.
posted by Joe in Australia at 4:01 AM on March 20, 2016 [2 favorites]

Photo and photo-realistic imagery of any sort will soon, it seems, not really be evidence of anything. In some ways, it would be very freeing. "That's not my sex tape." "I never said those things." Etc.

People need computers for that? Top Guardian headline right now: "Trump campaign denies manager grabbed protester at Arizona rally / Video appears to show Corey Lewandowski pulling the collar of protester, but spokesperson denies physical involvement and blames third person."
posted by effbot at 4:38 AM on March 20, 2016

It's not so much that this can be done, but that it can trivially be done. Between the new understanding of the uselessness of eyewitness testimony and the widespread ability to make video and photographic information 'lie' convincingly in real time, the words "real" and "truth" are about to be permanently encased in scare quotes.
posted by Mooski at 4:40 AM on March 20, 2016 [1 favorite]

A bane for truth. Watch the bits of the video with Trump in them. Anyone can now point at this video and claim that Trump never said any of those outrageous things, that any video evidence was manipulated.

Exactly. The existence of this tech is as threatening as its application. Reminds me of that letter the VFX guy from The Matrix sent to then-President Clinton warning of this inevitability.
posted by butterstick at 7:28 AM on March 20, 2016

Does anyone know what "RGB" stands for in this context? It could be Red Green Blue but I suspect it's something else here, and Google isn't telling me.

Ruth "Gator" Binsburg, which is Ruth Bader Ginsburg's pro wrestling alias.
posted by Strange Interlude at 7:38 AM on March 20, 2016 [5 favorites]

One thing I noticed: the clips of the three "source" actors -- Bush, Putin, Trump -- all seem to be carefully chosen short forward-then-reverse loops in which the person isn't talking and has a fairly neutral expression. (This is most obvious in the Bush clip, in which you can see the CNN crawl run first forward, then backward.) Probably this currently works best when the source is a fairly blank canvas?
posted by We had a deal, Kyle at 8:55 AM on March 20, 2016

Yeah, I'm wondering how well it hijacks a talking face.
posted by Johnny Wallflower at 9:33 AM on March 20, 2016

Just a matter of time, I suppose.
posted by Johnny Wallflower at 9:34 AM on March 20, 2016

Probably this currently works best when the source is a fairly blank canvas?

I think you just described Karl Rove's entire strategy behind the Bush II presidency to a tee.
posted by Insert Clever Name Here at 11:38 AM on March 20, 2016 [1 favorite]

Literaryhero: here you go.
posted by praiseb at 1:30 PM on March 20, 2016 [2 favorites]

Ha ha.

Ha

help

This is manna from heaven for conspiracy theorists!

You could do some next level fraud or fish dogging or whatever.

Next up, an app or website that lets you send a personalised message from the mouth of some authority figure, such as "You're the best parent in the world!" Or something less lovely.
posted by asok at 3:48 PM on March 20, 2016 [2 favorites]

WHADK, the video says that they were actually taking real video with the "source" actors speaking and emoting normally, and automatically correcting this video to a neutral expression in real time. So no, you don't need to cherry-pick any kind of special video segment... you can take a video where someone said one particular thing and replace their words and expressions with whatever you'd like!
posted by Nutri-Matic Drinks Synthesizer at 4:31 PM on March 20, 2016 [1 favorite]

Best guess from a nerd (me): in this context they're saying "RGB video" to contrast with "RGB-D" video, which is video with a depth channel (basically what you get from the Microsoft Kinect camera-sensor, once you combine the two parts). So read it as "regular digital video."

That's important because it means you need only the live video of the victim, and a well-lit normal-video of the actor who is to take over control of their face. No need for fancier depth-camera equipment.

My mind always turns to the edited-video part of The Running Man when stuff like this comes round.
posted by BuxtonTheRed at 4:57 PM on March 20, 2016 [1 favorite]

It's bad enough now, with people being tricked into transferring funds or emailing a company's W-2 forms to theboss@not.quite.my.company. Just wait until the head of finance gets a "video voicemail" from the CEO instructing him to wire money. Later on, it will turn out the video was sourced from a CEO press release and redubbed using technology like this.
posted by fings at 7:13 PM on March 20, 2016 [1 favorite]

Mulder was right. Trust no one.
posted by TMezz at 11:50 PM on March 20, 2016

Imagine if we had videos of Jesus, people could put all sorts of words in his mouth.
posted by straight at 6:56 AM on March 21, 2016

MetaFilter
