

Video of Lenin without so much Trotsky, please...
August 15, 2008 9:49 AM   Subscribe

Enhancing video using photos "Using Photographs to Enhance Videos of a Static Scene" sounds a bit dry, but watch the demo video. Not only are exposures correctable, but resolution can be enhanced enough to do a flawless digital zoom in post, and objects can be undetectably changed or removed from shaky handheld video. This is amazingly cool for video people, but also turns the slippery slope of "Can I trust what I see?" into a gaping chasm.
posted by lothar (43 comments total) 29 users marked this as a favorite

 
This is awesome. I think you could also use this as a kind of compression, right? Save details from a few frames and then reduce the resolution/dynamic range of the rest of the footage. Reconstruct on the fly.

I'd like to know more about the "novel stereo distance algorithm" or whatever they called it. Like, why is a moving camera with a still scene easier than a still camera and moving scene? (Oh wait--it's because they only need one "motion solution" for the former but multiple ones in the latter.)

But if you are believing anything you see on video now....I pity da foo.
posted by DU at 10:06 AM on August 15, 2008
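DU's compression idea above can be sketched in a few lines. This is a hypothetical illustration of the scheme described in the comment, not anything from the paper: keep every Nth frame at full resolution, decimate the rest, and reconstruct on the fly. The reconstruction here is naive nearest-neighbor upsampling; a real system would transfer detail from the nearest keyframe.

```python
import numpy as np

def compress(frames, keyframe_interval=30, scale=4):
    """Keep every Nth frame at full resolution; store the rest decimated.
    Frames are (H, W) grayscale arrays with H and W divisible by `scale`.
    Illustrative sketch only -- not the paper's method."""
    stored = []
    for i, f in enumerate(frames):
        if i % keyframe_interval == 0:
            stored.append(("key", f))                    # full resolution
        else:
            stored.append(("low", f[::scale, ::scale]))  # crude decimation
    return stored

def reconstruct(stored, scale=4):
    """Upsample low-res frames back to full size. A real system would
    transfer high-frequency detail from the nearest keyframe instead of
    this plain nearest-neighbor upsampling."""
    out = []
    for kind, f in stored:
        if kind == "key":
            out.append(f)
        else:
            out.append(np.repeat(np.repeat(f, scale, axis=0), scale, axis=1))
    return out
```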


Any technology whose flowchart includes a "SPACETIME FUSION" box is automatically awesome.
posted by odinsdream at 10:08 AM on August 15, 2008 [2 favorites]


This is cool! I love the narrative that goes along with the video. I have no idea what they are talking about, but it sounds so . . . scientific. All space-time fusion algorithms, etc. It reminds me of that scene in many science fiction movies where some character is brought in to explain exactly how hyperdrive (or whatever) works.
posted by ferdydurke at 10:09 AM on August 15, 2008 [1 favorite]


Wow, the really awesome part (and I mean, compared to all the other awesome), is the last item where the No Parking sign is removed. I believe this is relying on the fact that whatever is behind the parking sign is available before or after the current video position. Surely that would require some kind of non-automated keying, though. It's a very impressive result.
posted by odinsdream at 10:15 AM on August 15, 2008


Yeah, I didn't get the No Parking example. How did they do that with only a single mask when the profile of the sign changes depending on viewpoint? Or did they use the mask to identify an object and then automatically mask out that object throughout?
posted by DU at 10:17 AM on August 15, 2008


That's some pretty neat stuff. Anytime someone can take a stupid scifi trope and find a workaround to de-stupidfy it, I'm impressed.

*shouts "enhance!" at mefi comment box*
posted by cortex at 10:22 AM on August 15, 2008 [2 favorites]


I wonder how many videos they made that came out crummy? Usually these techniques work in a fairly narrow set of scenes.
posted by smackfu at 10:26 AM on August 15, 2008


Needs the holyshit tag.
posted by preparat at 10:27 AM on August 15, 2008


I'm imagining a videographer running around with a still photographer in tow. Gotta get high-res still photos of everything so you can punch-in corrections later.

Combining a cheap video camera with a consumer-grade digital camera and this software to produce high-res, high-quality video is potentially a game changer. For some scenes you won't need expensive video gear -- just throw computation at the problem. Very cool.
posted by sdodd at 10:33 AM on August 15, 2008 [2 favorites]


I can see why they need the scene to be generally static for this to work -- because of the modeling that's got to go on to regenerate the video -- but I wonder for how long that'll be a constraint if they find a way to impose externally-created 3D models into the scene. Or even better, extract an arbitrary object in the scene (say, one of the picture frames), edit the model (so it's now a slice of brie), and then put that in its place.
posted by ardgedee at 10:35 AM on August 15, 2008


That's really incredible. It seems clear to me that this technique hits limitations (for now) when you have moving material in the shot (for example, a jogger running across frame) but what they've managed to accomplish here, especially the exposure correction and resolution improvement is just stupefyingly cool. I kind of want these guys to work our next tournament and punch up all our floor footage.
posted by shmegegge at 10:35 AM on August 15, 2008


Did anyone else notice that there were definitely moments where you could tell they had applied this effect to it? The zoom of the statue face, for example - there were points where it looked like the statue wasn't moving completely in sync with the rest of the scene, and I found it a little disorienting once I noticed it.

(I notice the same thing when watching HD shows on digital cable too - there are times when it seems the compression causes objects to not move quite the same during motion or pans.)
posted by evilangela at 10:37 AM on August 15, 2008


How did they do that with only a single mask when the profile of the sign changes depending on viewpoint? Or did they use the mask to identify an object and then automatically mask out that object throughout?

I could be interpreting what they're doing incorrectly, but when you create a depth map and project imagery with a known xyz capture source and a projected texture, it's fairly simple to mask out anything based not on its texture but rather its physical location. When rendering, it would be masked out and the frames fill the gaps based on an interpolated camera location.
posted by jimmythefish at 10:40 AM on August 15, 2008
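jimmythefish's point can be made concrete with a rough sketch. Assuming a recovered depth map and a known camera pose (all names and conventions here are illustrative, not from the paper), the mask keys on geometry rather than appearance: back-project each pixel into world space and test it against a single world-space bounding box around the object, so one box works for every viewpoint.

```python
import numpy as np

def mask_by_location(depth, K, R, t, box_min, box_max):
    """Mask pixels whose back-projected world position falls inside a
    bounding box. `depth` is (H, W) z-depth per pixel, K the 3x3
    intrinsics, and (R, t) the pose with x_cam = R @ x_world + t.
    Hypothetical sketch of geometry-based masking."""
    h, w = depth.shape
    Kinv = np.linalg.inv(K)
    mask = np.zeros((h, w), dtype=bool)
    for v in range(h):
        for u in range(w):
            # ray through the pixel, scaled so z equals the stored depth
            x_cam = Kinv @ np.array([u, v, 1.0]) * depth[v, u]
            x_world = R.T @ (x_cam - t)    # move into world coordinates
            mask[v, u] = np.all(x_world >= box_min) and np.all(x_world <= box_max)
    return mask
```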


sdodd writes "I'm imagining a videographer running around with a still photographer in tow. Gotta get high-res still photos of everything so you can punch-in corrections later. "Combining a cheap video camera with a consumer-grade digital camera and this software to produce high-res, high-quality video is potentially a game changer. For some scenes you won't need expensive video gear -- just throw computation at the problem. Very cool."

My IS3 can do this kind of capture already. You can take video and at the same time press the shutter for a still image. There is a slight pause in the recording but I'd bet that wouldn't be a serious problem with motivation.
posted by Mitheral at 11:09 AM on August 15, 2008


Did anyone else notice that there were definitely moments where you could tell they had applied this effect to it? The zoom of the statue face, for example - there were points where it looked like the statue wasn't moving completely in sync with the rest of the scene, and I found it a little disorienting once I noticed it.

I noticed this too, fwiw. The face seemed to 'face' the camera even as it rotated around.

Nonetheless, it is awesome. And every application needs a spacetime fusion algorithm.

"Jenkins, look at this spreadsheet. These quarterly numbers don't add up! The shareholders will kill us!"

"I have just the thing." Clicks Insert>Formula>Spacetime Fusion

"Next years profits booked today! Well done, Jenkins!"
posted by Pastabagel at 11:22 AM on August 15, 2008 [1 favorite]


pretty soon, movies will all be like Waking Life -- except they won't look like animated cartoons. Actors will be totally disposable, and movies will just need Pixar-rendering farms for directors to do their editing. I'm looking forward to the first "uncanny valley" feature-length film.
posted by mhh5 at 11:23 AM on August 15, 2008


In the future I can see a little camera like my Canon PowerShot taking amateur quality video and automatically sampling with high quality digital stills. The advantage would be that the stills would have exactly the same lighting, timing, and point of view as the video, and could even capture moving subjects. Software could tell the camera the optimal moments to take the stills. The captured data would then be run through this software for professional quality video results.
posted by weapons-grade pandemonium at 11:42 AM on August 15, 2008


pretty soon, movies will all be like Waking Life -- except they won't look like animated cartoons. Actors will be totally disposable, and movies will just need Pixar-rendering farms for directors to do their editing. I'm looking forward to the first "uncanny valley" feature-length film.

Too late.
posted by sixswitch at 11:43 AM on August 15, 2008


Actually I think Polar Express was the first one. They hired Tom Hanks, motion-captured him, and it ended up creepy.
posted by smackfu at 12:04 PM on August 15, 2008 [1 favorite]


The video thing is interesting, but given that you have to have a static scene, it's probably of little use for actual movie-making. But the automatic depth-mapping thing is quite nice. If you could efficiently and automatically generate those, it would be simple to apply narrow depth of field as a post-processing effect, rather than a lens characteristic. (Yeah, this can already be done, but it's time-consuming.) Consistently wide depth of field is one of the annoying characteristics of compact still cameras.
posted by echo target at 12:05 PM on August 15, 2008
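echo target's synthetic depth-of-field idea can be sketched simply: blur each pixel by an amount that grows with its distance from the focal plane, using the depth map. This is an illustrative toy (discrete depth bands, simple box blur), nothing like production defocus rendering, and all parameter names are assumptions.

```python
import numpy as np

def box_blur(image, r):
    """Simple O(r^2) box blur with edge padding; r = 0 is a no-op."""
    if r == 0:
        return image.astype(float)
    h, w = image.shape
    padded = np.pad(image.astype(float), r, mode="edge")
    out = np.zeros((h, w))
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            out += padded[r + dy:r + dy + h, r + dx:r + dx + w]
    return out / (2 * r + 1) ** 2

def synthetic_dof(image, depth, focus_depth, max_radius=6, n_bands=4):
    """Fake shallow depth of field from a depth map: blur radius grows
    with distance from `focus_depth`, quantized into depth bands."""
    radius = np.abs(depth - focus_depth)
    radius = np.minimum(radius / max(radius.max(), 1e-9) * max_radius, max_radius)
    bands = np.round(radius / max_radius * (n_bands - 1)).astype(int)
    out = np.zeros(image.shape, dtype=float)
    for b in range(n_bands):
        r = int(round(b / (n_bands - 1) * max_radius))
        blurred = box_blur(image, r)        # blur the whole image once per band
        out[bands == b] = blurred[bands == b]
    return out
```

Pixels at the focal depth land in band 0 (radius 0) and pass through untouched, which is the post-hoc equivalent of "in focus."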


Also, I seriously doubt that actors will ever be disposable. Even disregarding aesthetics, it's almost always cheaper to hire an actor to stand in front of the camera and do something than it is to hire a team of modelers, animators, lighters, and texturers. Low-budget feature-length animated movies are few and far between, because animation is expensive.
posted by echo target at 12:10 PM on August 15, 2008


I'm guessing that the static-scene requirement is because their algorithm isn't good at isolating elements that are in motion. If they could lick that, they could probably use the "subtract the no-parking sign" trick to subtract elements in motion and work out the static background.

I'm also guessing that eventually, someone will solve that problem and be able to apply these techniques to more dynamic scenes. If they could isolate moving elements, they could reconstitute the static backgrounds and even transpose the moving elements into different scenes.

But wait! Because part of what this is doing is creating a 3D model from 2D images, that could be combined with some of the special effects techniques pioneered for Matrix Reloaded to actually change how these figures are animated.

Oh, it's going to be fun. Someday, when someone asks you "how did Darth Vader wind up in Anne of Green Gables?" you'll know the answer.
posted by adamrice at 12:29 PM on August 15, 2008
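adamrice's "reconstitute the static background" guess corresponds to a classic trick: if the frames are registered, a moving object only occupies any given pixel briefly, so the per-pixel temporal median sees past it. A minimal sketch of that idea (a crude stand-in for the paper's far more sophisticated machinery, and assuming frames are already aligned):

```python
import numpy as np

def static_background(frames):
    """Recover the static background from a list of aligned (H, W) frames
    by taking the per-pixel median over time. Transient moving objects
    are outvoted by the background at each pixel."""
    stack = np.stack(frames, axis=0)    # (T, H, W)
    return np.median(stack, axis=0)
```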


Voodoo (semi-)automatically builds models using feature tracking from video. (via an article behind a paywall, sorry.)
posted by Pronoiac at 1:20 PM on August 15, 2008


My IS3 can do this kind of capture already. You can take video and at the same time press the shutter for a still image. There is a slight pause in the recording but I'm bet that wouldn't be a serious problem with motivation.

The one problem with that, as far as this technique is concerned, would be that your still frames taken with your video camera will still have the same exposure, detail, etc... as your video frames, yes? They might have better compression (especially if the camera shoots minidv or hdv for the motion footage) but otherwise they would basically look like stills from the video footage, right? Or am I wrong about this?
posted by shmegegge at 1:25 PM on August 15, 2008


Well, it's always been true that the camera lies. It used to be massively more expensive to produce a very well done lying image, and lying video was even more expensive, but it's always been possible. All this sort of thing does is remove that power from the hands of just a few elites and put it into the possession of a vastly larger pool of people.

A few years from now you'll be able to use freely available software to produce perfect video forgeries of absolutely anything [1]. While it's never been true that we could really trust single-source videos or pix, that illusion has comforted many people. "The camera never lies" has always been nonsense. We have to abandon that illusion and find a new way forward. Which, fortunately, is relatively easy. Just as technology is exposing the nonsense behind the "camera never lies" illusion, so too it provides the solution.

"How will we know what really happens?" is the same as it's always been: multiple independent verifications. If there'd been fifty, or five hundred, people filming JFK's assassination, for example, no serious questions as to the validity of the tapes would be raised. The rise of cell phone cameras, and especially cell phone video cameras, in ever increasing megapixelage is a good step in providing us with the vast number of viewers necessary to verify events.

shmegegge: Even if the current generation of video cameras don't actually shoot stills at higher resolution than they shoot video (or can't do it concurrently), if this sort of tech takes off you know they'll start making video cameras that include a separate, higher-resolution still camera, along with an algorithm that automatically takes the needed high-resolution stills at the optimal times for the process.

[1] Before the public really absorbs the impact of this, I predict multiple celebrity scandals and politicians brought low by videos "proving" they did X. One does wonder what that will do to the porn industry though.
posted by sotonohito at 1:52 PM on August 15, 2008


If they had an actual 3D camera, I'd think the limitations on motion would disappear.
posted by empath at 3:29 PM on August 15, 2008


And if they had a 4d camera the whole spacetime fusion thing would be that much cooler.
posted by aubilenon at 4:01 PM on August 15, 2008 [1 favorite]


shmegegge writes "The one problem with that, as far as this technique is concerned, would be that your still frames taken with your video camera will still have the same exposure, detail, etc... as your video frames, yes? They might have better compression (especially if the camera shoots minidv or hdv for the motion footage) but otherwise they would basically look like stills from the video footage, right? Or am I wrong about this?"

Actually the stills are taken at full 6MP resolution compared to the relatively poor 640×480 of the video. I imagine it would be possible to expose it differently though this camera doesn't allow for that. And as I mentioned there is a tiny break in the video recording when the image is snapped. However this is strictly a problem with the internal processor; physically there isn't a need for it.
posted by Mitheral at 4:24 PM on August 15, 2008


DU writes "Yeah, I didn't get the No Parking example. How did they do that with only a single mask when the profile of the sign changes depending on viewpoint?"

They (presumably) rotated the masked image (sampling through various increments on all three axes of rotation and all three axes of movement), then whenever that matched the recorded image, got rid of it.

It's filling in what the sign covered that's hard. Given that you know the camera's movement and rotation (because the shape of the sign matches one of your test rotations of the masked image), you know where the camera is pointing, but you still have to interpolate the clipped region from the film's other frames.
posted by orthogonality at 6:31 PM on August 15, 2008
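The "interpolate the clipped region from the film's other frames" step orthogonality describes can be sketched at its simplest: for each masked pixel, borrow the value from the nearest-in-time frame where that pixel is visible. This assumes the frames have already been warped into a common alignment, which is where the actual hard work lives; the function below is purely illustrative.

```python
import numpy as np

def fill_from_neighbors(frames, masks):
    """Replace masked pixels with values from the temporally nearest
    frame where that pixel is unmasked. `frames` and `masks` are lists
    of aligned (H, W) arrays; masks are boolean (True = occluded)."""
    T = len(frames)
    out = [f.copy() for f in frames]
    for t in range(T):
        for y, x in np.argwhere(masks[t]):
            for dt in range(1, T):
                for s in (t - dt, t + dt):     # earlier neighbor first
                    if 0 <= s < T and not masks[s][y, x]:
                        out[t][y, x] = frames[s][y, x]
                        break
                else:
                    continue                   # no donor at this distance
                break                          # filled; stop searching
    return out
```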


Actors will be totally disposable

Nah, it'll just make the great actors shine, 'cause they're pretty unique in their abilities.
posted by Brandon Blatcher at 6:37 PM on August 15, 2008


But if you are believing anything you see on video now....I pity da foo

Including this video. There are several things about this that just don't make sense. And the whiz-bang-ee names are setting off the bullshit detector. Sounds like a page out of the emperor's new clothes to me. Adobe indeed. Might as well set the sights high and name-dropping lends gravitas.
posted by spock at 9:31 PM on August 15, 2008


That's pretty neat.

I often use Boujou's ability to generate a point cloud from a camera move over a scene to hand model (not automatically model, like these guys) objects suitable for turning 2D photos into 3D using camera projection techniques.

It came in handy, for example to turn a Summer road into a very icy, slushy one in the movie Taking Lives.

There is a lot of nice automation here, though. I look forward to seeing it develop.
posted by jfrancis at 11:45 PM on August 15, 2008


If they had an actual 3D camera, I'd think the limitations on motion would disappear.

I'm not sure. This software, like Boujou, uses the many frames in a camera move like 'eyes' in a stereo system, only instead of a pair of eyes there are as many viewpoints as there are frames in the motion.
posted by jfrancis at 11:47 PM on August 15, 2008
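jfrancis's "many eyes" point is just stereo triangulation applied pairwise: any two frames whose camera poses are known constrain a scene point's 3D position. A minimal sketch using the standard linear (DLT) triangulation; the projection-matrix setup is generic, not anything specific to this paper or to Boujou.

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one scene point from two views.
    P1, P2 are 3x4 projection matrices; x1, x2 are (u, v) pixel
    coordinates of the same point in each view. Returns the 3D point."""
    # Each view contributes two linear constraints on the homogeneous point
    A = np.array([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]                    # null vector = homogeneous solution
    return X[:3] / X[3]           # dehomogenize
```

With a moving camera over a static scene, every extra frame adds two more rows to A, so the estimate only gets better, which is why the static-scene constraint buys so much.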


However this is strictly a problem with the internal processor; physically there isn't a need for it.

Yes it is. The way charge is read off the CCD is physically different for video and stills, which is why the stills look better.

(it's also why the viewfinder goes blank on most cameras when you take a picture)
posted by cillit bang at 5:02 AM on August 16, 2008


Must be SIGGRAPH time again! Still, I think it's going to be difficult to improve on Seam Carving for Content Aware Image Resizing.
posted by alby at 5:02 AM on August 16, 2008


For those of you wondering how the 'no parking' sign removal was done with just one single mask, check out this video of the same research team's rotoscoping tool.
posted by egypturnash at 6:59 AM on August 16, 2008


So, I'm wondering why they haven't invented cameras yet that act as radar and interpolate actual distances per pixel. In fact, you could create a 3D camera with two lenses, each of which sends out a radar signal to map the distance per pixel in each lens. Then stuff like this would be effortless and super accurate.

Maybe my understanding of radar is incomplete, but I thought we had already figured out how to help blind people see by using radar and translating it into nerve impulses on the skin or tongue...

I know they have lasers that can scan 3D objects, but I presume you'd have to project an invisible laser beam that wouldn't harm anyone's vision or sensitive electronic devices, such as cameras.
posted by PigAlien at 8:20 AM on August 16, 2008


PigAlien: Such things do exist, they're called flash LADAR (like radar, but with lasers) sensors. These guys sell them, and showed them off at a Google Tech Talk a couple years ago. The main problem is cost; apparently prices have dropped significantly in the last few years, but a sensor still costs thousands of dollars and sucks huge amounts of electricity.
posted by teraflop at 11:16 AM on August 16, 2008


Low-budget feature-length animated movies are few and far between, because animation is expensive.

Yes and no. For features, where quality control and "wow-factor" are key, yeah, maybe. On TV, not so much. There are many reasons why most of FOX's longest-running programs are cartoons, not the least of which is that they're really frigging cheap. One reason for this: Koreans! Animation can be outsourced, whereas live-action filming involves sound stages and lights and trailers and makeup and donuts and coverage and five teamsters for every sandbag.

Anyway.

I'm not really sure if this video "enhancing" technology is awesome or scary. Either way, I can't wait to see the "enhanced" Zapruder film and the "enhanced" 9/11 footage.
posted by Sys Rq at 1:00 PM on August 16, 2008


Either way, I can't wait to see the "enhanced" Zapruder film and the "enhanced" 9/11 footage.

Part of the way there already.

posted by penduluum at 3:41 PM on August 16, 2008


Thanks, teraflop, that concept was obviously teasing my subconscious memory, as I'm sure I hadn't come up with the idea myself. Not that that would be difficult; it's fairly obvious! So, in a sense, this software is a way to achieve the same thing with less expensive equipment, but as costs fall, the hardware and software will combine for great potential.
posted by PigAlien at 6:16 PM on August 16, 2008


For those who are curious about the magic behind this video, it stems from the multi-view stereo part that is briefly mentioned in the video. Multi-view stereo, as others have mentioned here, is simply a way of computing a unique world coordinate for each pixel in each image. In other words, you could in theory assign a GPS coordinate to each pixel, as you know its position in an absolute sense, rather than just as a specific pixel in a specific image.

This same stereo technique is also a major part of the magic behind photosynth, the Microsoft demo you may remember from some time back, which was created by both Microsoft and the Univ of Washington, which is the source of this paper as well.

The techniques of applying super-resolution, color compensation, sign removal, etc. are all fairly simple operations that are well known in the vision community. The authors are simply showing a "roll call" of the types of things that can be done more easily and accurately once you have the multi-view stereo computed. I have yet to read the full paper, but in the abstract and introduction the authors state that the contribution of this paper is an improvement they made in the multi-view stereo, and they aren't doing anything new with the photo and video editing techniques demonstrated in this video.
posted by LoopyG at 8:29 AM on August 18, 2008
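LoopyG's "unique world coordinate for each pixel" is the standard back-projection step: given a pixel, its depth, the camera intrinsics, and the camera pose, the world position falls out directly. A minimal sketch with generic pinhole-camera conventions (the matrix names and pose convention are assumptions, not from the paper):

```python
import numpy as np

def pixel_to_world(u, v, depth, K, R, t):
    """Back-project pixel (u, v) with known z-depth into world
    coordinates, given 3x3 intrinsics K and pose (R, t) such that
    x_cam = R @ x_world + t."""
    ray = np.linalg.inv(K) @ np.array([u, v, 1.0])  # ray direction, z = 1
    x_cam = ray * depth                             # scale so z equals depth
    return R.T @ (x_cam - t)                        # camera frame -> world frame
```

Once every pixel in every frame maps to a world coordinate like this, the editing operations in the demo (masking, detail transfer, removal) become lookups and blends in a shared 3D space.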


I thought PigAlien was thinking of setups that use video cameras & translate them into tactile sensations. (Unless he's thinking of Daredevil. Or the recent Radiohead video.)

I was going to link one, & instead found something using stereo video cameras to convey depth, like radar (pdf). And something translating pictures into audio.
posted by Pronoiac at 9:00 AM on August 18, 2008



