Photosynth

Give it a Whirl

By now you may have already heard of and played with Photosynth, a new Technology Preview from Microsoft Live Labs.  If you haven't had a chance to try out Photosynth for yourself, trust me, you'll be glad you spent the time to check it out.  So, rather than sit through a long-winded explanation from me of what it is and what it does, go experience the Photosynth Technology Preview yourself right now.

Spend some time exploring.  Review the "Getting Started" and other useful information, including all the cool keyboard shortcuts.  Check out different photo collections.  When you're in Gary Fagan's studio, be sure to explore the pictures on the wall.  Try out all the buttons in the upper-right corner of the screen.  And for goodness' sake, be sure to exercise that wheel on your mouse and zoom, zoom, zoom.

When you can finally tear yourself away from all that fun, come on back here and we'll talk a little bit about what's under the covers and how it relates to HD Photo.  See you soon!  You can find the Photosynth Technology Preview at https://labs.live.com/photosynth

I Recognize that Pixel!

Welcome back.  Pretty cool, right?  The Photosynth Technology Preview is made up of multiple technologies, all working together to create a unique experience.  While this combination is pretty astounding, each of the individual technologies has the potential to be applied in many different applications.

One of the amazing capabilities in the Photosynth Technology Preview actually happens before you even start viewing photos.  Photosynth accepts a collection of individual photos, all shot in the same spatial location, from arbitrary angles, at arbitrary lens settings, with arbitrary lighting, and quite possibly at very different times.  The only requirement is that each photo overlaps one or more of the other photos so that all photos in the collection are connected through some combination of overlaps.
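To make that "connected through overlaps" requirement concrete, here is a minimal sketch in Python.  It is not how Photosynth does it; it simply assumes somebody has already worked out which pairs of photos overlap, and the function name is_connected and the (a, b) pair format are made up for this illustration.

```python
from collections import deque


def is_connected(num_photos, overlaps):
    """Return True if every photo is reachable from photo 0 through overlaps.

    Photos are numbered 0..num_photos-1 and `overlaps` is a list of (a, b)
    pairs of photos that share some part of the scene (hypothetical input).
    """
    if num_photos == 0:
        return True

    # Build an undirected graph: one node per photo, one edge per overlap.
    neighbors = {i: set() for i in range(num_photos)}
    for a, b in overlaps:
        neighbors[a].add(b)
        neighbors[b].add(a)

    # Breadth-first search outward from photo 0.
    seen = {0}
    queue = deque([0])
    while queue:
        current = queue.popleft()
        for nxt in neighbors[current]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)

    return len(seen) == num_photos
```

Four photos chained 0-1, 1-2, 2-3 pass the check.  Four photos that only overlap as 0-1 and 2-3 do not, because nothing ties the two pairs together.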

Photosynth analyzes all the photos and finds various points that are common among two or more photos.  It's a little beyond the scope of this blog to explain exactly how all that happens. (Translation: I don't have a clue.)  But the task of figuring out exactly where each photo is projected in a 3D environment is (for the most part) a completely automatic process.  Once a collection is started, new photos can be added later.  The "cloud of points" you see in the background of all the photos is the set of individual points that were identified in two or more photos.

In addition to calculating the placement of the photo, Photosynth also calculates the position of the camera when the photo was taken.  The camera button in the upper right turns on and off the orange pyramids that represent these camera locations.  In addition to clicking on a white rectangle to display a specific photo, you can also click on these orange pyramids to view the photo taken from that particular vantage point.

Browsing GigaPixels of Photos

So if you spent some time playing with Photosynth, you've certainly discovered that you can smoothly and seamlessly zoom and pan from one photo to the next.  If you select the 2D view, you can instantly view every photo in the collection, and quickly zoom in to the finest pixel detail on any individual photo in the collection.  If you didn't try zooming and panning in the 2D view, go back and try it now.  Then stop and think about exactly how many gigapixels' worth of digital photos you're free to interactively explore, all over an Internet connection, and all with virtually no delays or pauses.

One thing you may have noticed is that when you quickly zoom in to a photo, or quickly pan to a neighboring photo when zoomed in, the image first appears soft and out of focus, then quickly comes into sharp focus.  As you continue to zoom into the finest detail, this process of adding the sharp detail to a softer out-of-focus preview continues.  If you look very closely, you may even notice that this happens in segments of the photo, starting near the center of the screen, then moving to the outer edges.  If you have a less-than-ideal Internet connection, this process will most likely be slowed down enough so you can clearly see the various stages.  (But even with a slow, clunky Internet connection, the user interface is always smooth and responsive.)
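That center-first sharpening is easy to imitate.  The sketch below is only a guess at the general idea, not the actual Photosynth code: it assumes the screen is covered by a simple grid of segments and sorts them by distance from the middle of the grid.  The function name center_out_order is made up for this illustration.

```python
def center_out_order(cols, rows):
    """Return (col, row) grid cells sorted from the grid center outward.

    Assumes a plain cols x rows grid of equally sized screen segments,
    which is a simplification for illustration.
    """
    # The center of the grid, in cell coordinates (may fall between cells).
    center_col = (cols - 1) / 2
    center_row = (rows - 1) / 2

    cells = [(c, r) for r in range(rows) for c in range(cols)]

    # Squared distance is enough for ordering, so no square root is needed.
    return sorted(
        cells,
        key=lambda cell: (cell[0] - center_col) ** 2 + (cell[1] - center_row) ** 2,
    )
```

For a 3 x 3 grid the middle cell (1, 1) comes first and the four corners come last, which matches what you see on a slow connection: detail arrives where you're most likely looking, then fills in toward the edges.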

So as you experiment with zooming and panning around a photo collection in Photosynth (especially in 2D mode), you can begin to see the clues as to what is happening to provide this near-magical experience.

Uncovering the SeaDragon

The technology component in Photosynth that provides this amazingly smooth and continuous exploration of arbitrarily large high-resolution photo collections is an incubation project at Microsoft Live Labs, code-named SeaDragon.  The name comes from the name of a start-up company that joined the Microsoft family earlier this year, becoming part of Microsoft Live Labs.

The SeaDragon technology makes it possible to smoothly and quickly view any number of web images with a responsive, intuitive user experience.  It doesn't matter how large the images are - each image could be gigapixels in size.  Nor does it really matter how many images are in the collection.  The SeaDragon technology handles it all.

The key to this magic is to only transmit the minimum subset of information that is required for any particular view.  As the view changes (either zooming or panning), the minimum information for the new view is transmitted and the SeaDragon user interface (powered by Microsoft DirectX) smoothly crossfades from one view to the next.  Because your computer screen is a finite size, there is always a finite limit on the amount of information that is needed at any point in time to fill that screen.  It doesn't matter how large the individual images are or how many of them are in the collection; the amount of information required to fill the screen is always the same. (Read that last sentence again; when that sinks in and makes sense, you'll understand the key to how this all works.)
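If that sentence still hasn't quite sunk in, a little arithmetic may help.  The sketch below assumes each image is available at a series of resolutions, each half the width and height of the one before (a common arrangement, but an assumption here, and not necessarily the set of scales HD Photo or SeaDragon actually uses).  The function name pixels_needed is made up for this illustration.

```python
import math


def pixels_needed(region_w, region_h, screen_w, screen_h):
    """Return (level, pixel_count) for showing an image region on screen.

    region_w, region_h: size of the visible region in full-resolution pixels.
    Level k means the image reduced by a factor of 2**k in each dimension
    (an assumed power-of-two pyramid).
    """
    # How many full-resolution pixels land on each screen pixel (at least 1).
    ratio = max(region_w / screen_w, region_h / screen_h, 1.0)

    # Coarsest level that still supplies at least one image pixel per
    # screen pixel along the tighter dimension.
    level = int(math.floor(math.log2(ratio)))

    width = math.ceil(region_w / 2 ** level)
    height = math.ceil(region_h / 2 ** level)
    return level, width * height
```

On a 1024 x 768 screen, viewing all of a 1024 x 768 image takes level 0 and 786,432 pixels.  Viewing all of a 32768 x 24576 image (about 800 megapixels) takes level 5 and, once again, 786,432 pixels.  The image grew by a factor of 1,024 and the amount of data needed for the view didn't change at all.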

But there's a catch.  (Isn't there always?)  For this to work, not only do you need to figure out exactly what the minimum required information is, you also need to be able to put your hands on just that information as quickly as possible.  That's exactly what the SeaDragon technology can do, with a little help from HD Photo.  (I told you this would come back home eventually.)

Mipmaps and Regions

Every photo you see in the Photosynth Technology Preview is an HD Photo image.  The photos that make up each collection are stored in their full resolution in HD Photo format on the server.  The tiling feature of HD Photo has been used to encode the photos so the data is internally organized within the compressed HD Photo file into bite-sized tiles.

The HD Photo codec provides built-in features to decode images at reduced resolutions (often called mipmaps).  So when Photosynth only needs a small thumbnail of a high-resolution image, the HD Photo codec running on the server can extract the subset of data needed for this low-resolution display (a small fraction of the entire file) and, using a compressed domain transformation operation, create a new HD Photo file that contains only that low-resolution information.

Because this is a compressed domain operation, the server never has to decode or re-encode the compressed data to create this low-resolution "thumbnail" of the larger, high-resolution image.  The only work involved is to copy a portion of the compressed data and wrap it up in a container to make a new HD Photo file.  This very small HD Photo file is sent across the network connection, and then decoded by the HD Photo codec on the client to provide the low-resolution view required for the particular display.

When zooming in to the fine details of a high resolution image, the HD Photo codec is able to very quickly extract an arbitrary rectangular region by accessing only the image tiles that overlap that region.  Like the mipmaps described above, this is accomplished by simply extracting a small portion of the compressed data and building a new (and very small) HD Photo file to be sent across the network.  The client receives and decodes this small file, combining it with the other segments required to display the required view.
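Working out which tiles overlap a region is just integer division.  The sketch below is not the HD Photo codec's API; it assumes a uniform grid of square tiles, and the 256-pixel tile size in the example that follows is a made-up number.  The function name tiles_for_region is hypothetical as well.

```python
def tiles_for_region(x, y, width, height, tile_size, image_w, image_h):
    """Return (col, row) indices of the tiles that overlap a rectangle.

    The rectangle is given in full-resolution pixel coordinates and is
    clipped to the image.  Assumes a uniform grid of square tiles.
    """
    # Clip the requested rectangle to the image bounds.
    left = max(0, x)
    top = max(0, y)
    right = min(image_w, x + width)    # exclusive edge
    bottom = min(image_h, y + height)  # exclusive edge
    if left >= right or top >= bottom:
        return []  # the rectangle lies entirely outside the image

    # The last pixel inside the rectangle is at (right - 1, bottom - 1).
    first_col, last_col = left // tile_size, (right - 1) // tile_size
    first_row, last_row = top // tile_size, (bottom - 1) // tile_size

    return [
        (col, row)
        for row in range(first_row, last_row + 1)
        for col in range(first_col, last_col + 1)
    ]
```

For a 1024 x 1024 image cut into 256-pixel tiles, a 300 x 200 region starting at (300, 100) touches just four of the sixteen tiles: (1, 0), (2, 0), (1, 1) and (2, 1).  Those are the only tiles whose compressed data would need to be copied for that view.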

This combination of mipmaps and regions allows the server to quickly extract the subset of compressed data required for any particular view and transmit it to the client with no need to ever uncompress the information until the client displays it on the screen.

HD Photo provides the key features built into the standard encoder and decoder to enable this powerful client/server solution.  The high-performance decoder and the enhanced mipmap and region decoding capabilities, combined with excellent compression efficiency, provide a key component of the technology magic that powers the Photosynth Technology Preview.

Full disclosure: The preceding section describes the design of the SeaDragon technology. The current Photosynth Technology Preview uses HD Photo for all images, but is temporarily making use of some pre-cached mipmaps and regions to streamline the development process. Moving forward, Photosynth will take full advantage of the features of the HD Photo codec to dynamically extract this information on demand, eliminating the extra overhead of pre-caching.

Beyond Piazza San Marco

The Photosynth Technology Preview is just a glimpse of what is to come.  The Photosynth team is hard at work to make it possible for you to create your own collections from your own photos.  Then imagine being able to not only explore your own photos, but to combine your photos with those of a large online community.  Photosynth will make it possible to merge photo collections from any number of people, and dramatically expand the opportunities and the excitement of the online photo sharing experience.

Photosynth is just one example of how the SeaDragon technology can revolutionize the way we view information online.  Instead of photos, imagine if all those images that you can smoothly and freely explore were pages of a book, magazine, or newspaper, or a collection of web pages, or a series of x-rays or MRI scans, or hundreds of high-resolution blueprints.  The SeaDragon technology and HD Photo will empower entirely new ways to explore online information, changing the paradigm of how we explore the Internet.

Stay tuned, it's just starting to get fun!