
It's like Minority Report!
November 9, 2006 3:13 PM   Subscribe

A Photosynth Tech Preview has been released by Microsoft Live Labs and the University of Washington. While we discussed this project at length earlier this summer, it's now a public (well, Win/IE only) prototype with sample datasets, if you'd like to see it for yourself. Technical details for those interested.
posted by arialblack (28 comments total) 2 users marked this as a favorite

Looks great. Now, when do we get the damn Playtable interface for our PCs?!?!

Cue bitching about how this is no big accomplishment, yada yada yada
posted by hincandenza at 3:22 PM on November 9, 2006

I do like this post and I do wish to try out the application, so the following snarky message is expressly for Microsoft:

I'm sorry. As interested as I am in looking at this and playing with it, I'm not going to update IE to beta test your project for you. If Firefox or Opera aren't good enough for you, I'm not interested.

Considers setting his Firefox useragent to IE and seeing if it works anyway. Nope. Oh, hey. It also requires XP SP2 or Vista. Bastards!
posted by loquacious at 3:26 PM on November 9, 2006

OK, that's pretty cool. As I recall from their original demo reel they were going to have the entire space visible rather than this zany dot map...either I haven't found the option or they're working on it.

I use firefox but I'd use IE for this specific purpose.
posted by maxwelton at 3:43 PM on November 9, 2006

I don't buy it. I'm not of the "this is just QTVR" position contended in the previous thread, but there's no way they've got some kind of system that will generate a 3D model, unassisted, from a random collection of photographs. This has to require a whole bunch of massaging to achieve the final output.

I'd believe that they've created tools to dramatically speed up the process of creating those kinds of models and linking together the photographs. But it can't just be automatic; otherwise it would be a radical breakthrough in computer vision, and they'd have been pulled away long ago to work on far more lucrative applications of it.

Especially that "Grassi Mountain" collection. I don't think that a human could infer an elevation contour map from photographs, much less currently available computer vision systems. If this project has been going on for so long, and it's all fully automatic, why aren't there more photo sets that have been converted?
posted by XMLicious at 3:52 PM on November 9, 2006

Internet Explorer 6 and 7 Only

Up yours, Bill.
posted by ZenMasterThis at 4:07 PM on November 9, 2006

That 1% of graphics geeks that use Windows should truly appreciate this.
posted by dminor at 4:08 PM on November 9, 2006

I used FF with IEtab extension and it worked. This is interesting (esp the 3d aspect of it). Photoshop does auto-tile like this but not in 3d. (Now if google rolled their own into picasa...)
posted by creeptick at 4:11 PM on November 9, 2006

This is Gary Flake, the head of Live Labs and the guy on stage at Web 2.0 who gave the demo today.

I would like to explain why this is IE only. Photosynth requires 32-bit client-side code; there is simply no way around that fact. We animate photo transitions at up to 30 fps, so no AJAX or Flash solution will do. Moreover, this same 32-bit code allows you to view ginormous photos off the web (see the artwork in the Faigan gallery, some of which is 80 megapixels in size).

Given that we have 32-bit client code as a requirement, we could have gone down one of two paths. The first would be to make this a standalone application (which, BTW, would still have to be Windows only because of the low-level graphics code). If we went as an app (like Google Earth, for example), a lot of people would not run it because of reluctance to install unknown code.

Instead, we decided to go as an ActiveX plugin. This allowed us to have 32-bit code running with an extremely lightweight install.

So on this issue: that's the whole point. It's not about IE versus FF (I am typing this on a FF browser now). It's about the easiest way to install 32-bit code.

Second, some have raised doubts that these collections are all algorithmically generated. I can assure you that this is the case. It does reflect a breakthrough in machine vision, and it was MSR and UW (Noah Snavely, Steve Seitz, and Rick Szeliski) who made the breakthrough and published at SIGGRAPH.

Finally, I want to emphasize that the Photosynth project hit its first milestone (with the release of the video) in August after only 4 months. Today's release to the public is 3 months later. So, while we would have loved to have made this available more broadly, we thought it best to release early, and release often.
posted by dr.flakenstein at 4:25 PM on November 9, 2006 [1 favorite]

Well, who am I to doubt an Erdős #3? In that case, congrats to MSR and UW for changing the world, and congrats to you, Dr. Flakenstein, for a snazzy and internet-expanding implementation of that breakthrough.
posted by XMLicious at 5:11 PM on November 9, 2006

That 1% of graphics geeks that use Windows should truly appreciate this.
Speaking from the other side, I'm intensely interested. I wish I could play with this, it has the (apparent) potential to be great. Dear Bill: Don't bogart the technology!
posted by lekvar at 5:14 PM on November 9, 2006

As interested as I am in looking at this and playing with it, I'm not going to update IE to beta test your project for you.

I thought it only needed IE 6? Isn't that years old?
posted by smackfu at 5:45 PM on November 9, 2006

Correct. IE 6 should be fine.
posted by dr.flakenstein at 6:06 PM on November 9, 2006

Holy crap. I really should take crackpotted potshots and make erroneous callouts directed at ginormous corporations more often.

Thanks for your response and candor, dr.flakenstein. That you even bothered to reply here speaks volumes about how much you personally care about the project. I've watched the videos, and I think it's a damn nifty project - nifty enough for me to at least attempt to be agnostic about it and overlook that it's from Microsoft. (And for me, that's saying something - something to take to your marketing department!)

However, I'd be happy to try an agnostic, standalone installer. But please don't ask me to upgrade to XP. If it works in XP it should be able to work in 2k.

smackfu wrote: I thought it only needed IE 6? Isn't that years old?

Yup. :) I abandoned IE at around 5.5. It's been so long that I've launched IE that I can't even remember if I have 5 or 5.5 on this machine. I don't even care to go find out. It's been locked away and software-firewalled off in a little black box for years. Occasionally I give it the hose and poke it with a stick to torment it a little before locking it up again.

creeptick wrote:
I used FF with IEtab extension and it worked.

I could be wrong, but AFAIR IEtab simply uses your existing and installed version of the IE code inside of Firefox. If so, it's not a standalone replacement for IE.

That 1% of graphics geeks that use Windows should truly appreciate this.

Hey now!
posted by loquacious at 6:13 PM on November 9, 2006

Correction: It's been so long that since I've launched
posted by loquacious at 6:14 PM on November 9, 2006

This is freakin' awesome. I can't wait until I can use this to play with my own photos. Another 3 months, maybe? Please? Pretty-please?
posted by Bort at 6:27 PM on November 9, 2006

XP SP2 or Vista only :-(
posted by sineater at 7:09 PM on November 9, 2006

I don't mean to belittle the project, the effort put into the project, or the rather surprising appearance of the lab's management discussing the project, but something must be said:

"which, BTW, would still have to be Windows only because of the [low-]level graphics code"

Gary, one of the reasons OpenGL exists is to render this sort of statement untrue for nearly all cases. Of course that doesn't address your other -- seemingly mostly sound -- design decisions, but this assumption appears to me to be bogus. OpenGL even works reasonably well under Windows!

I'll only cursorily mention that running 32 bit client code via multiple browsers is a mostly solved problem (though not without some implementation quirks): Java bytecode runs in the browser fairly well on most modern platforms.

I appreciate that this is a Microsoft project, and that Microsoft technology is likely to be at the heart of it, but the implicit assumptions driving this kind of platform-locked implementation are excellent examples of why Microsoft is not looked upon favorably by those outside its groupthink sphere.
posted by majick at 7:14 PM on November 9, 2006

Microsoft Research operates as an open academic environment without a short-term commercial focus - they have partnerships with university labs and publish a lot of cutting-edge research. Live Labs, an offshoot, is nominally more product-oriented, but still shouldn't be conflated with the general M$ malaise.
posted by arialblack at 8:25 PM on November 9, 2006

You're suggesting he find a Java implementation of the particular set of computer-vision algorithms that is even within the same order of magnitude in performance as an established, pointer-heavy implementation written in 32-bit assembler and C (or Fortran, in some cases). Then he'd have to make those work comfortably in a 3D application, either handling the frame-by-frame vertex management himself (performant, but very time-consuming and not really a solved problem with regard to balancing accuracy against framerate) or letting someone else do it for him (DML or retained-mode Direct3D: goodbye some percentage of performance in exchange for convenience, with no OpenGL solution for that of which I'm aware). And then he'd remember that he set out to write an application to let an end-user perform awesome computer vision tasks, and ended up writing (another) halfway-there 3D and algorithm framework that took three times as long as the original schedule, just to do something he could have delivered in 32-bit code by picking a platform and solving a user-problem instead of a computer-problem.

I can see why these folks made the choices they did.

(Neat program! I look forward to being able to try it out with all my travel photos soon.)
posted by abulafa at 8:38 PM on November 9, 2006

What abulafa said. Thanks.
posted by dr.flakenstein at 9:30 PM on November 9, 2006

dr.flakenstein: Would you mind elaborating on what aspects make this development a machine vision breakthrough? I agree that it is an impressive application, but sparse local feature matching and 3D scene reconstruction have been around for some time. Is it the fact that these techniques have been applied to large image datasets without camera calibration? I assume that you must be doing a lot of complex things to handle less than ideal data, but does Photosynth reflect a refinement of present image stitching/3D reconstruction techniques or some fundamentally new methods?
posted by tss at 12:10 AM on November 10, 2006

Quick Tip! Photosynth loves to be zoomed...
posted by BoatMeme at 2:01 AM on November 10, 2006

tss: the best answer is for me to point you to the original paper that goes into the techniques in some depth. Stitching, by itself, is not that big a deal. What's special about this work is its robustness on large datasets from wildly different sources. We've registered large photo collections that are taken years apart, are at different resolutions, different times of day, different color spectrums, etc. Yet the algorithms are able to ignore the variations that, well, vary, and focus only on the invariant features that persist over all of those conditions. That's the essence of the breakthrough: its robustness is simply stunning.

BoatMeme: In the Gary Faigin collection, zoom in on the close-ups of his art. Those are as large as 80 megapixels. Yes, I said 80 Mp. That's the Seadragon backend that is making it possible for you to view that over a modest network connection.

You should think of Photosynth as the marriage of those two new technologies: the robust spatial registration of many images with a client/server combination for streaming vast images extremely efficiently.
posted by dr.flakenstein at 7:25 AM on November 10, 2006
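To make the Seadragon idea concrete: streaming an 80-megapixel image over a modest connection works because the server holds a multi-resolution tile pyramid, and the viewer only fetches the tiles covering the current viewport at the current zoom. A minimal sketch of such a pyramid; the 256-pixel tile size, halving per level, and image dimensions are illustrative assumptions, not Seadragon's actual parameters:

```python
import math

def tile_pyramid(width, height, tile=256):
    """Enumerate (width, height, cols, rows) per pyramid level, from
    full resolution down to a single tile; each level halves the
    previous one's dimensions."""
    levels = []
    w, h = width, height
    while True:
        cols = math.ceil(w / tile)
        rows = math.ceil(h / tile)
        levels.append((w, h, cols, rows))
        if cols == 1 and rows == 1:
            break
        w = max(1, w // 2)
        h = max(1, h // 2)
    return levels

# An ~80-megapixel image, e.g. 10000 x 8000 pixels:
pyramid = tile_pyramid(10000, 8000)
full_res_tiles = pyramid[0][2] * pyramid[0][3]
```

The point is that panning around at any zoom level touches only a handful of those tiles, so bandwidth scales with the viewport, not the 80 Mp source.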

Thanks for posting the link to the paper, I guess I had bad luck in the course of trying to find it on my own.

So, the "pointcloud" 3-D model that the user navigates in Photosynth to select and transition between different images - is that derived from the collection of "SIFT keypoints" that's described in the paper?

For others looking into this, here's a tool that just does the keypoint detection.
posted by XMLicious at 8:29 AM on November 10, 2006
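For the curious, the matching step behind those SIFT keypoints boils down to nearest-neighbour descriptor search with Lowe's ratio test, which discards ambiguous matches between repetitive features. A minimal NumPy sketch; the random 128-dimensional "descriptors" below are stand-ins for real SIFT output, not actual keypoints:

```python
import numpy as np

def ratio_test_matches(desc_a, desc_b, ratio=0.8):
    """For each descriptor in desc_a, find its two nearest neighbours
    in desc_b (Euclidean distance) and keep the match only if the
    closest is clearly better than the runner-up (Lowe's ratio test)."""
    matches = []
    for i, d in enumerate(desc_a):
        dists = np.linalg.norm(desc_b - d, axis=1)
        nn = np.argsort(dists)[:2]
        if dists[nn[0]] < ratio * dists[nn[1]]:
            matches.append((i, int(nn[0])))
    return matches

# Synthetic descriptors: b is a slightly noisy copy of a, so every
# descriptor should match its own counterpart and pass the ratio test.
rng = np.random.default_rng(0)
a = rng.normal(size=(50, 128))
b = a + rng.normal(scale=0.01, size=a.shape)
m = ratio_test_matches(a, b)
```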

dr.flakenstein: Thanks for the note.

Frank Dellaert at Georgia Tech has similar research goals, and I first heard of Photosynth from a talk he gave here at CMU about two weeks ago.

The collection of images from Spirit or Opportunity would make another fascinating dataset for this application.
posted by tss at 11:16 AM on November 10, 2006

tss: stay tuned. We are currently crunching something that (based on your last line) I am confident that you'll like. It may not be released for a month or so, but grab the RSS feed from our main site and I'll make sure we announce it properly when it is out.
posted by dr.flakenstein at 9:18 AM on November 11, 2006

"I don't buy it. I'm not of the "this is just QTVR" position contended in the previous thread, but there's no way they've got some kind of system that will generate a 3D model, unassisted, from a random collection of photographs. This has to require a whole bunch of massaging to achieve the final output."

This sort of technology is 1980s grade defense sector tech. Used for aerial recon and synthetic aperture radar for 20+ years already.
posted by Sukiari at 1:33 AM on November 12, 2006

Building a 3D model from radar versus building a 3D model from photographs? Oh yeah, that's totally the same thing. Not.

Of course you can build a 3D model from radar, that's what the information coming out of a radar system is - measurements of the distance between the antenna and reflective surfaces.

Photographs, on the other hand, don't have any distance or orientation information in them. The algorithms in question are inferring distance and orientation out of the photos, which is something that has been enough of a problem that we don't have self-guided robots or computer systems that can drive cars without crashing into things.
posted by XMLicious at 6:07 PM on November 12, 2006
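The core of that inference is triangulation: once the algorithms have estimated where two cameras were, a point seen in both photos pins down a single 3D location. A minimal linear (DLT) triangulation sketch with toy pinhole cameras; the camera matrices and point below are illustrative assumptions, not anything from Photosynth's pipeline:

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation: each 2D observation x of a camera
    P gives two homogeneous constraints on the 3D point X via
    x ~ P @ X; stack them and solve by SVD, then de-homogenise."""
    A = np.vstack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]

def project(P, X):
    """Pinhole projection of a 3D point to 2D image coordinates."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

# Two toy cameras: identity intrinsics, second translated 1 unit in x.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
X_true = np.array([0.5, 0.2, 3.0])
X_est = triangulate(P1, P2, project(P1, X_true), project(P2, X_true))
```

With noise-free observations the linear system is exact and the original point is recovered; the hard part in practice is estimating those camera matrices from the photos in the first place.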

