Happy Canada Day! Let’s celebrate with yet another Windows Azure developer story!
A few weeks ago, I started my search for untold Canadian stories in preparation for my talk, Windows Azure: What’s In the Cloud, at Prairie Dev Con. I was just looking for a few stories, but was surprised, impressed, and proud of my fellow Canadians when I was able to connect with several Canadian developers who have either built new applications using Windows Azure services or have migrated existing applications to Windows Azure. What was really amazing to see was the different ways these Canadian developers were using Windows Azure to create unique solutions.
This is one of those stories.
Leveraging Windows Azure for Applications That Scale and Store Data
Back in May, we talked about leveraging Windows Azure for your next app idea, and specifically, usage scenarios around websites. We talked about how Windows Azure is ideal for sites that have to scale quickly and for sites that store massive amounts of data. Today, we’ll chat with Morten Rand-Hendriksen (@mor10) and Chris Arnold (@GoodCoffeeCode) from PhotoPivot.com and deep dive into the intricate ways they’ve used Windows Azure as a backend processing engine and mass storage platform for PhotoPivot.
PhotoPivot is an early-stage, self-funded start-up with the potential for internet-scale growth. It is a value-add to existing photo platforms: it adds a DeepZoom layer to people's entire image collections. This, coupled with its unique front-ends, creates a great user experience. Creating this new layer puts a huge, sporadic processing burden on PhotoPivot, and the service is constantly in need of vast amounts of storage.
Jonathan: When you guys were designing PhotoPivot, what was the rationale behind your decision to develop for the Cloud, and more specifically, to use Windows Azure?
Morten: Cloud gives us a cost-effective, zero-maintenance, highly scalable approach to hosting. It enables us to spend our valuable time focusing on our customers, not our infrastructure. Azure was the obvious choice. Between Chris and me, we’ve developed on the Microsoft stack for 2 decades and Azure's integration into our familiar IDE was important. As a BizSpark member, we also get some great, free benefits. This enabled us to get moving fast without too much concern over costs.
Chris: I like integrated solutions. It means that if (when?) something in the stack I'm using goes wrong I normally have one point of contact for a fix. Using something like AWS would, potentially, put us in a position of bouncing emails back and forth between Amazon and Microsoft - not very appealing. I've also been a .NET developer since it was in Beta so using a Windows-based platform was the obvious choice.
Jonathan: What Windows Azure services are you using? How are you using them?
Chris: We use Windows Azure, SQL Azure, Blob Storage and CDN. Currently our infrastructure consists of an ASP.NET MVC front-end hosted in Extra Small web roles (Windows Azure Compute). We will soon be porting this to WordPress hosted on Azure. We also have a back-end process that is hosted in worker roles. These are only turned on, sporadically, when we need to process new users to the platform (and subsequently turned off when no longer needed so as to not incur costs). If we have a number of pending users we have the option to spin up as many roles as we want to pay for in order to speed up the work. We are planning to make good use of the off-peak times to spin these up - thus saving us money on data transfers in.
We use SQL Azure to store all the non-binary, relational data for our users. This is potentially large (due to all the Exif data etc. associated with photography) but, thankfully, it can be massively normalised. We use Entity Framework as our logical data layer and, from this, we automatically generated the database.
We use Blob storage for all of the DeepZoom binary and xml data structures. Public photos are put in public containers and can be browsed directly whilst non-public photos are stored in private containers and accessed via a web role that handles authentication and authorization.
One 'interesting' aspect to this is the way we generate the DeepZoom data. The Microsoft tools are still very wedded to the filing system. This has meant us using local storage as a staging platform. Once generated, the output is uploaded to the appropriate container. We are working on writing our own DeepZoom tools that will enable us to target any Stream, not just the filing system.
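The staging pattern Chris describes can be sketched roughly as follows. This is not PhotoPivot's code (theirs is .NET). It is a minimal Python illustration that assumes a hypothetical `generate_tiles` function standing in for a tool that can only write to the file system, and a hypothetical `upload` callback standing in for a blob-storage client.

```python
import tempfile
from pathlib import Path


def generate_tiles(photo_name: str, out_dir: Path) -> None:
    """Hypothetical stand-in for a DeepZoom tool that only writes to disk."""
    (out_dir / f"{photo_name}.xml").write_text("<Image/>")
    tiles = out_dir / f"{photo_name}_files" / "0"
    tiles.mkdir(parents=True)
    (tiles / "0_0.jpg").write_bytes(b"fake-tile")


def stage_and_upload(photo_name: str, upload) -> list[str]:
    """Generate output into scratch space, then push every file to storage."""
    uploaded = []
    # The temporary directory plays the role of the role instance's local storage.
    with tempfile.TemporaryDirectory() as scratch:
        root = Path(scratch)
        generate_tiles(photo_name, root)
        for path in root.rglob("*"):
            if path.is_file():
                # Keep the relative layout so the tile pyramid stays intact.
                blob_name = path.relative_to(root).as_posix()
                upload(blob_name, path.read_bytes())
                uploaded.append(blob_name)
    # The scratch directory is deleted here; only the uploaded copies remain.
    return uploaded


if __name__ == "__main__":
    container = {}  # in-memory stand-in for a blob container
    print(stage_and_upload("sunset", container.__setitem__))
```

A tool that could write to any stream, as the team plans, would remove the scratch directory and the second pass over the files.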
Our existing data centre was in the US. Because our Silverlight front-end does a lot of async streaming, users in the UK noticed the 100ms lag. Using the CDN gives us a trivially simple way to distribute our image data and give our worldwide users a great experience.
Jonathan: During development, did you run into anything that was not obvious and required you to do some research? What were your findings? Hopefully, other developers will be able to use your findings to solve similar issues.
Chris: When designing something as complex as PhotoPivot, you’re bound to run into a few things:
- Table storage seemed the obvious choice for all our non-binary data. Using a NoSQL approach removes a layer from your stack and simplifies your application. Azure's table storage has always been touted as a fantastically cheap way to store de-normalised data. And, it is - as long as you don't need to access it frequently. We eventually changed to SQL Azure. This was, firstly, for the ability to query better and, secondly, because there's no per-transaction cost. BTW - setting up SQL Azure was blissfully simple - I never want to go back to manually setting database file locations etc!
- There's no debugging for deployment issues without IntelliTrace. This is OK for us as we have MSDN Ultimate through BizSpark. If you only have MSDN Professional, though, you won’t have this feature.
- Tracing and debugging are critical. We wrote new TraceListeners to handle Azure's scale-out abilities. Our existing back-end, pending user process, was already set up to use the standard Trace subsystems built into .NET. This easily allows us to add TraceListeners to dump info into files or to the console. There are techniques for doing this with local storage and then, periodically, shipping them to blob storage but I didn't like the approach. So, I created another Entity Data Model for the logging entities and used that to auto-generate another database. I then extended the base TraceListener class and created one that accepted the correct ObjectContext as a constructor and took care of persisting the trace events. Because the connection strings are stored in the config files this also gives us the ability to use multiple databases and infinitely scale out if required.
- The local emulators are pretty good, but depending on what you’re doing, there’s no guarantee that your code will work as expected in the Cloud. This can definitely slow up the development process.
- Best practice says to never use direct links to resources because it introduces the 'Insecure Direct Object Reference' vulnerability. In order to avoid this, though, we would have to pay for more compute instances. Setting our blob containers to 'public' was cheaper and no security risk as they are isolated storage.
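The database-backed trace listener from the tracing bullet above can be illustrated with a rough analogue. Chris's version extends .NET's TraceListener and persists through an Entity Framework ObjectContext. The sketch below is not that code: it assumes Python's standard `logging.Handler` in place of TraceListener and an SQLite connection in place of SQL Azure, and the table and class names are made up for illustration.

```python
import logging
import sqlite3


class DatabaseTraceHandler(logging.Handler):
    """Persists each log record as a database row (hypothetical schema)."""

    def __init__(self, connection: sqlite3.Connection) -> None:
        super().__init__()
        # The connection is injected, much like passing a context to a constructor.
        self.connection = connection
        self.connection.execute(
            "CREATE TABLE IF NOT EXISTS trace_events ("
            "created REAL, source TEXT, level TEXT, message TEXT)"
        )

    def emit(self, record: logging.LogRecord) -> None:
        try:
            self.connection.execute(
                "INSERT INTO trace_events VALUES (?, ?, ?, ?)",
                (record.created, record.name, record.levelname,
                 record.getMessage()),
            )
            self.connection.commit()
        except Exception:
            # Logging failures should never take down the worker itself.
            self.handleError(record)


def build_logger(name: str, connection: sqlite3.Connection) -> logging.Logger:
    """Attach the database handler to a named logger, one name per worker."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.addHandler(DatabaseTraceHandler(connection))
    logger.propagate = False
    return logger


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    log = build_logger("worker-1", conn)
    log.info("imported user %s", "demo")
    print(conn.execute("SELECT source, level, message FROM trace_events").fetchall())
```

Because every instance writes to a shared database rather than to its own local files, the trace output stays in one queryable place however many roles are running.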
Jonathan: Lastly, what were some of the lessons you and your team learned as part of ramping up to use Windows Azure or actually developing for Windows Azure?
Chris: Efficiency is everything. When you move from a dedicated server to Azure you have to make your storage and processes as efficient as possible, because they directly affect your bottom line. We spent time refactoring our code to 'max out' both CPU and bandwidth simultaneously. Azure can be a route to creating a profitable service, but you have to work harder to achieve this.
How did we do it? Our existing back-end process (that, basically, imports new users) ran on a dedicated server. Using 'Lean Startup' principles I wrote code in a manner that allowed me to test ideas quickly. This meant that it wasn't as efficient or robust as production code. This was OK because we were paying a flat-rate for our server. Azure's pay-as-you-go model means that, if we can successfully refactor existing code so that it runs twice as fast, we'll save money.
Our existing process had 2 sequential steps:
1. Download ALL the data for a user from Flickr.
2. Process the data and create DeepZoom collections.
During step 1 we used as much bandwidth as possible but NO CPU cycles. During step 2, we didn't use ANY bandwidth but lots of CPU cycles. By changing our process flow, we were able to utilise both bandwidth and CPU cycles simultaneously and get through the process quicker. For example:
1. Download data for ONE photo from Flickr.
2. Process that ONE photo and create DeepZoom images.
3. Go back to step 1 for the next photo.
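One way to get the overlap described above is a small producer/consumer pipeline: while one photo is being processed, the next is already downloading. The sketch below is an assumption about how this could look, not PhotoPivot's implementation (which is .NET). The `download` and `process` callables are hypothetical placeholders for the Flickr fetch and the DeepZoom generation.

```python
import queue
import threading


def run_pipeline(photo_ids, download, process, max_pending=4):
    """Overlap downloading (bandwidth-bound) with processing (CPU-bound)."""
    # A bounded queue stops the downloader from racing too far ahead.
    pending = queue.Queue(maxsize=max_pending)
    done = object()  # sentinel marking the end of the stream

    def downloader():
        try:
            for photo_id in photo_ids:
                pending.put((photo_id, download(photo_id)))
        finally:
            # Always signal completion, even if a download fails.
            pending.put(done)

    worker = threading.Thread(target=downloader, daemon=True)
    worker.start()

    results = {}
    while True:
        item = pending.get()
        if item is done:
            break
        photo_id, data = item
        # Processing happens here while the downloader fetches the next photo.
        results[photo_id] = process(data)
    worker.join()
    return results


if __name__ == "__main__":
    out = run_pipeline(
        [1, 2, 3],
        download=lambda i: f"raw-{i}",  # placeholder for a network fetch
        process=str.upper,              # placeholder for tile generation
    )
    print(out)
```

The `max_pending` bound is a design choice: it caps how much downloaded data waits in memory when processing is the slower stage.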
Another HUGELY important aspect is concurrency. Fully utilising the many classes in the TPL (Task Parallel Library) is hard, but necessary if you are going to develop successfully on Azure (or any pay-as-you-go platform). Gone are the days of writing code in series.
Thank you Chris and Morten. I’d like to take this opportunity to thank you for taking us on this deep dive exploring the inner workings of PhotoPivot.
In Moving Your Solution to the Cloud, we talked about two types of applications – compatible with Windows Azure and designed for Windows Azure. You can consider the original dedicated server hosted version of PhotoPivot as compatible with Windows Azure. Would it work in Windows Azure if it were deployed as is? Yes, absolutely. However, as you can see above, in order to really reap the benefits of Windows Azure, Chris had to make a few changes to the application. Once those were done, PhotoPivot became an application designed for Windows Azure, one that leverages the platform to its max to reduce costs and maximize scale.
If you’re a Flickr user, head over to photopivot.com and sign up to participate in the beta program. Once you see your pictures in these new dimensions, you’ll never want to look at them in any other way. From the photography aficionados to the average point-and-shooter, this is a great visualization tool that will give you a new way of exploring your picture collections. Check it out.
Join The Conversation
What do you think of this solution’s use of Windows Azure? Has this story helped you better understand usage scenarios for Windows Azure? Join the Ignite Your Coding LinkedIn discussion to share your thoughts.
Missed previous developer stories in the series? Check them out here.