Building Applications for the Cloud

I am at the Azure Firestarter event in Redmond today and just heard David Aiken from the Azure team give an overview of best practices for developing applications for the cloud, along with some tips and tricks. Here are my notes. The slides and sample code will be posted later, and I will update this post with them when they are.

David Aiken - Applications for the Cloud

  • We can migrate an existing app, and many apps will migrate without much work.
  • However, let's focus on the new things we can take advantage of if we build an app for the cloud.
  • Optimal workload patterns for the cloud
    • "On and Off" workload - e.g. my app is only used between 7 AM and 7 PM. Can I turn it off?
      • Yes - add instances during the day and turn them off at night.
    • Growing Fast
      • Startups. Turn the dial up as you get more users.
      • You don't need a huge upfront investment based on your guess at what your peak load might be.
      • Priced out a high-end HP server last night: 48-way CPU, 15,000 rpm drives, 512 GB of RAM = $144K.
      • Facebook apps
        • 200 users in week 1
        • 2,000 in week 2
        • 200,000 in week 3
        • 20 million in week 4
      • If you don't think about this upfront and design for it, you'll be in a world of hurt between week 3 and week 4.
      • Cost scales linearly with usage
    • Unpredictable bursting
      • "My site is being featured on Oprah on Wednesday - they told me to expect 20 million additional users."
    • Predictable bursting
      • Could be seasonal, monthly, quarterly, …
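The "On and Off" pattern boils down to a function that maps the hour of the day to an instance count. Below is a minimal sketch. It assumes a hypothetical 7 AM to 7 PM busy window, 10 busy instances and 0 idle instances. It only decides the count. Actually changing the deployment's instance count is left to whatever script or tooling you use, and that part is not shown.

```python
def desired_instances(hour: int, busy_start: int = 7, busy_end: int = 19,
                      busy_count: int = 10, idle_count: int = 0) -> int:
    """Return how many instances to run during the given hour (0-23).

    The busy window is [busy_start, busy_end); outside it we scale to idle_count.
    All defaults are hypothetical.
    """
    if not 0 <= hour <= 23:
        raise ValueError("hour must be between 0 and 23")
    return busy_count if busy_start <= hour < busy_end else idle_count


def daily_plan(**kwargs) -> dict:
    """Instance count for each hour of the day."""
    return {hour: desired_instances(hour, **kwargs) for hour in range(24)}


if __name__ == "__main__":
    plan = daily_plan()
    print(plan)
    print("instance-hours per day:", sum(plan.values()))
```

With these defaults the schedule uses 10 × 12 = 120 instance-hours a day. Running 10 instances around the clock would use 240.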
  • When you don't have to buy for the peak and keep paying for that capacity when you don't need it, you can actually oversupply the peak demand to provide even better performance.
    • If you don't have to pay for 12 instances when you usually need only 1, can you afford to run 20 instances when you need 12 and provide that much better performance?
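The oversupply argument is plain instance-hour arithmetic. Here is a sketch with hypothetical numbers:

- a baseline of 1 instance;
- a peak that needs 12 instances for 4 hours a day;
- an oversupplied peak of 20 instances.

The price per instance-hour cancels out, so the comparison is done in instance-hours.

```python
def fixed_instance_hours(peak_instances: int, hours: int = 24) -> int:
    """Instance-hours per day when you buy for the peak and run it all day."""
    return peak_instances * hours


def elastic_instance_hours(baseline: int, peak_instances: int,
                           peak_hours: int, hours: int = 24) -> int:
    """Instance-hours per day when you run peak_instances only during peak hours."""
    if not 0 <= peak_hours <= hours:
        raise ValueError("peak_hours must be between 0 and hours")
    return peak_instances * peak_hours + baseline * (hours - peak_hours)


if __name__ == "__main__":
    # Hypothetical workload: 1 instance normally, 12 needed for 4 hours a day.
    print(fixed_instance_hours(12))          # buy for the peak
    print(elastic_instance_hours(1, 12, 4))  # scale to exactly the peak
    print(elastic_instance_hours(1, 20, 4))  # oversupply the peak with 20
```

Buying for the peak costs 12 × 24 = 288 instance-hours a day. Running 20 instances during the 4 peak hours and 1 for the rest of the day costs 20 × 4 + 1 × 20 = 100. Under these assumptions, even the oversupplied peak is much cheaper.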
  • How do I get started?
    • Think about how to manage subscriptions for your dev team; otherwise you'll have everyone sharing one Live ID.
    • Download the training kit
    • Tools for looking at storage (www.cerebrata.com has some; others at….)
    • Diagnostics tools (www.cerebrata.com has a tool for looking at diagnostics data; others at …)
  • Billing and Subscription
    • The billing account is one Windows Live ID (WLID). A subscription can use the same WLID or a different one. In general, it should be different unless you're a one-person shop.
    • The finance guy should have the billing WLID, not the developers.
    • He creates multiple subscriptions.
      • One per developer (each dev has own WLID)
    • But how do they deploy to the same app if they are working on the same app?
      • Create a certificate for the app (a self-signed certificate works).
      • Give it to the developers.
      • They deploy using the certificate. You can use PowerShell scripts with the Azure cmdlets to deploy (details on Channel 9).
  • When you're testing apps for the cloud, you can use the real cloud. You don't have to build a virtual data center on a single machine in your office.
  • Q: Can you delete a subscription?
    • A: Call the helpdesk to delete a subscription - they'll do it.
    • A: You can write a script that enumerates all the services you have running under a certificate and removes them, to avoid paying for unneeded cloud instances.
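The cleanup script from the second answer can be sketched as "enumerate, filter, delete." The `ServiceManager` interface, its `list_services` and `delete_service` methods, and `FakeManager` are all hypothetical stand-ins, not a real Azure API. In practice you would back them with whatever management API or cmdlets you deploy with. The sketch defaults to a dry run, so it only reports what it would remove.

```python
from typing import Iterable, List, Protocol


class ServiceManager(Protocol):
    """Hypothetical interface over whatever management API you deploy with."""

    def list_services(self) -> List[str]: ...

    def delete_service(self, name: str) -> None: ...


def remove_all_services(manager: ServiceManager, keep: Iterable[str] = (),
                        dry_run: bool = True) -> List[str]:
    """Enumerate services and delete every one not listed in `keep`.

    Returns the names that were (or, in a dry run, would be) removed.
    """
    keep_set = set(keep)
    targets = [name for name in manager.list_services() if name not in keep_set]
    if not dry_run:
        for name in targets:
            manager.delete_service(name)
    return targets


class FakeManager:
    """In-memory stand-in used to exercise the script without touching the cloud."""

    def __init__(self, names: Iterable[str]):
        self.services = list(names)

    def list_services(self) -> List[str]:
        return list(self.services)  # copy, so deleting doesn't disturb iteration

    def delete_service(self, name: str) -> None:
        self.services.remove(name)
```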
  • Remember that the data centers all run on UTC.
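Because the data centers run on UTC, a schedule written in local time needs converting. This is a minimal sketch. It assumes users are on Pacific Standard Time, modeled as a fixed UTC-8 offset. A fixed offset ignores daylight saving time; `zoneinfo.ZoneInfo("America/Los_Angeles")` handles DST when the tz database is available.

```python
from datetime import datetime, timedelta, timezone

# Assumption: users are on Pacific Standard Time, modeled as a fixed UTC-8 offset.
PST = timezone(timedelta(hours=-8), "PST")


def local_hour_to_utc_hour(hour: int, tz: timezone = PST) -> int:
    """Convert an hour of the day in `tz` to the corresponding UTC hour."""
    local = datetime(2010, 1, 4, hour, tzinfo=tz)  # any date works for a fixed offset
    return local.astimezone(timezone.utc).hour


def is_business_hours(now_utc: datetime, start: int = 7, end: int = 19,
                      tz: timezone = PST) -> bool:
    """True if a UTC timestamp falls within [start, end) local time."""
    return start <= now_utc.astimezone(tz).hour < end


if __name__ == "__main__":
    print(local_hour_to_utc_hour(7), local_hour_to_utc_hour(19))  # 15 3
    print(is_business_hours(datetime.now(timezone.utc)))
```

A 7 AM to 7 PM Pacific window becomes 15:00 to 03:00 UTC, which wraps past midnight. That is why the sketch converts each timestamp to local time rather than comparing raw UTC hours.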
  • Simple Azure app: Archivist, which builds an archive of Twitter feeds.
    • Used SQL Azure as the store because it's very easy to write a WHERE clause against a DateTime; that would be harder against Table storage.
    • For each search term, query Twitter and convert the results to JSON for ease of use in a Silverlight client.
    • Do some aggregation.
    • Repeat about 200,000 times per hour.
    • Key: Decouple the pieces. Assume failure at any point.
      • "Made a guarantee that you will read every message on a queue at least once - and we literally mean at least once - good chance you will read it more than that."
      • Put each search term to query Twitter for into a queue.
      • The Twitter-query piece takes each search term, queries Twitter, puts the results in a blob, and puts a reference to that blob into another queue.
      • The next piece pulls the blob references off that queue and aggregates the results.
      • Each piece can scale independently:
        • Piece that pulls search terms and puts them into the queue - probably one instance.
        • Piece that queries Twitter - might want 2-3 instances.
        • Piece that aggregates - maybe 10 or so instances.
        • Dividing up the work this granularly is what makes this possible.
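The decoupled Archivist pipeline can be sketched in a single process. The sketch uses these stand-ins:

- `queue.Queue` in place of Azure queues;
- a dict in place of blob storage;
- a hypothetical `fetch` function in place of the real Twitter query.

Because a message may be read more than once, the aggregating stage records which blobs it has already processed and skips repeats. A redelivered message therefore does not double-count.

```python
import queue
from collections import Counter
from typing import Callable, Dict, List, Set


def run_pipeline(terms: List[str], fetch: Callable[[str], List[str]],
                 redeliver: bool = False) -> Counter:
    """Run the three Archivist stages one after another in a single process.

    fetch is a hypothetical stand-in for the Twitter query: it returns the
    authors of the tweets matching a term. redeliver=True puts every blob
    message on the queue twice, imitating at-least-once delivery.
    """
    term_queue: queue.Queue = queue.Queue()
    blob_queue: queue.Queue = queue.Queue()
    blobs: Dict[str, List[str]] = {}  # stand-in for blob storage

    # Stage 1: put each search term on a queue.
    for term in terms:
        term_queue.put(term)

    # Stage 2: query for each term, store the results in a blob, enqueue the blob name.
    while not term_queue.empty():
        term = term_queue.get()
        blob_name = f"results/{term}"
        blobs[blob_name] = fetch(term)  # rewriting the same blob on a retry is harmless
        blob_queue.put(blob_name)
        if redeliver:
            blob_queue.put(blob_name)

    # Stage 3: aggregate, skipping blobs already counted so duplicates don't double-count.
    processed: Set[str] = set()
    totals: Counter = Counter()
    while not blob_queue.empty():
        blob_name = blob_queue.get()
        if blob_name in processed:
            continue
        processed.add(blob_name)
        totals.update(blobs[blob_name])
    return totals
```

In the real service, the record of processed blobs would need to live in shared storage (a table, for example) rather than in one instance's memory, because the aggregator runs as several instances.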
      • Look at queue length to determine how many instances you need, and turn instances on and off as needed.
        • If you increase the number of instances, you might also want to increase the number of queues. Nothing fancy, just round robin.
        • You can do all of this in the cloud, so the app becomes almost self-monitoring and self-adjusting.
        • Another approach is to combine 5-10 search terms into a single queue entry, which might reduce the overhead of pulling each message off the queue.
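The queue-length scaling rule, round-robin queues, and batching each take only a few lines. This sketch uses hypothetical thresholds:

- 100 queued messages per instance;
- between 1 and 20 instances;
- batches of 5 terms.

The queue names are made up. Reading the real queue length and changing the instance count are not shown.

```python
import itertools
import math
from typing import Dict, Iterator, List, Sequence


def instances_for_queue_length(length: int, messages_per_instance: int = 100,
                               min_instances: int = 1, max_instances: int = 20) -> int:
    """Pick an instance count proportional to the backlog, clamped to a range."""
    wanted = math.ceil(length / messages_per_instance)
    return max(min_instances, min(max_instances, wanted))


def round_robin(queue_names: Sequence[str]) -> Iterator[str]:
    """Endlessly cycle through the queue names: q0, q1, ..., q0, q1, ..."""
    return itertools.cycle(queue_names)


def batch(terms: Sequence[str], size: int = 5) -> List[List[str]]:
    """Group search terms so one queue message carries several of them."""
    return [list(terms[i:i + size]) for i in range(0, len(terms), size)]


def distribute(terms: Sequence[str], queue_names: Sequence[str],
               size: int = 5) -> Dict[str, List[List[str]]]:
    """Assign each batch of terms to the next queue in round-robin order."""
    plan: Dict[str, List[List[str]]] = {name: [] for name in queue_names}
    for name, chunk in zip(round_robin(queue_names), batch(terms, size)):
        plan[name].append(chunk)
    return plan
```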
      • It might make sense to do some pre-processing for common queries. To show the top 10 contributors on Twitter for a search term, read the whole feed into memory, work out the top 10 in memory, and write out a blob pre-formatted as JSON or HTML with the answer. The display then doesn't run a query; it just shows that content.
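The top-10 pre-processing step can be sketched as follows: count contributors in memory, keep the top N, and serialize the answer once. The `from_user` field name is an assumption about the shape of the tweet records. Writing the resulting string to a blob is not shown.

```python
import json
from collections import Counter
from typing import Iterable, Mapping


def top_contributors_json(tweets: Iterable[Mapping[str, str]], n: int = 10) -> str:
    """Count tweets per author in memory and return the top n as a JSON document.

    The "from_user" key is an assumed field name for the tweet's author.
    Ties keep the order in which authors were first seen (Counter.most_common).
    """
    counts = Counter(tweet["from_user"] for tweet in tweets)
    top = [{"user": user, "tweets": count} for user, count in counts.most_common(n)]
    return json.dumps({"top": top})
```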
    • Could use the Command pattern to decouple the work from the instances that do it: run the commands on threads in a single instance during testing, but in separate role instances in production.
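Here is a minimal sketch of the Command pattern idea. Each unit of work is an object with an `execute` method, and interchangeable runners decide where it runs. `QueryTwitterCommand`, `ThreadRunner` and `QueueRunner` are hypothetical names. `QueueRunner` uses an in-process `queue.Queue` as a stand-in for a cloud queue that worker role instances would read from.

```python
import queue
from abc import ABC, abstractmethod
from concurrent.futures import ThreadPoolExecutor
from typing import Iterable, List


class Command(ABC):
    """A unit of work that doesn't know where it will run."""

    @abstractmethod
    def execute(self) -> str: ...


class QueryTwitterCommand(Command):
    """Hypothetical command; a real one would query Twitter and write a blob."""

    def __init__(self, term: str):
        self.term = term

    def execute(self) -> str:
        return f"queried:{self.term}"


class ThreadRunner:
    """Testing: run commands on a thread pool inside a single instance."""

    def __init__(self, workers: int = 4):
        self.workers = workers

    def run(self, commands: Iterable[Command]) -> List[str]:
        with ThreadPoolExecutor(max_workers=self.workers) as pool:
            return list(pool.map(lambda command: command.execute(), commands))


class QueueRunner:
    """Production (sketch): enqueue commands for worker role instances to execute.

    queue.Queue stands in for a cloud queue; a real one would need the command
    serialized (e.g. as a type name plus arguments).
    """

    def __init__(self, work_queue: "queue.Queue[Command]"):
        self.work_queue = work_queue

    def run(self, commands: Iterable[Command]) -> List[str]:
        for command in commands:
            self.work_queue.put(command)
        return []  # results arrive later, from whichever instance ran the command
```

Because both runners accept the same commands, switching from local testing to production is a change of runner, not of the work itself.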
  • Cloudy Tips
    • Remember you are 64-bit in the cloud. Make sure your third-party DLLs are 64-bit. Leave your platform target at AnyCPU.
    • Ship everything to the Cloud for testing.
      • If you see the app cycling through "Initializing… Busy… Initializing… Busy", you're probably missing a reference that's present on the Dev Fabric but not in the cloud.
      • There's no harm in setting "Copy Local" on all dependencies and references, even those in the GAC.
      • If you're using PHP - include the PHP engine in the deployment.
    • What does stateless mean anyway?
      • Don't keep state in your compute instance; it belongs somewhere else.
      • It could be in Table storage, in SQL, in blob storage, whatever.
      • You can use the ASP.NET Membership and Profile providers to persist it in Table storage.
      • If you have state in your compute - as soon as that compute instance dies, the state is gone.
    • No Azure SLA if you don't have at least 2 instances.
      • If you have just one instance, you can go down with the hardware.
      • With more than one instance, the instances are guaranteed to be on separate scale units.
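Here is what "stateless" means in code: the role instance holds a reference to a store and keeps no per-user data of its own. The sketch uses a hypothetical `InMemoryStore` as a stand-in for Table storage, SQL Azure or blob storage. Two `WebRoleInstance` objects sharing the store model two instances behind the load balancer.

```python
from typing import Dict, Optional, Protocol


class StateStore(Protocol):
    def get(self, key: str) -> Optional[dict]: ...

    def put(self, key: str, value: dict) -> None: ...


class InMemoryStore:
    """Stand-in for Table storage, SQL Azure, or blob storage in this sketch."""

    def __init__(self) -> None:
        self._data: Dict[str, dict] = {}

    def get(self, key: str) -> Optional[dict]:
        value = self._data.get(key)
        return dict(value) if value is not None else None  # hand out a copy

    def put(self, key: str, value: dict) -> None:
        self._data[key] = dict(value)


class WebRoleInstance:
    """Hypothetical handler that keeps no per-user data in its own fields."""

    def __init__(self, store: StateStore) -> None:
        self.store = store

    def add_to_cart(self, user: str, item: str) -> list:
        state = self.store.get(user) or {"cart": []}
        state["cart"] = state["cart"] + [item]  # build a new list, then write back
        self.store.put(user, state)
        return state["cart"]
```

Suppose a request is served by one instance and the next request goes to a different instance. The second instance still sees the cart, which is exactly what lets the state survive an instance dying.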
  • SQL Azure vs. Tables
    • How do I choose between them?
    • SQL is the easiest, but what if I need 60 GB? On the other hand, Table storage sounds really limited for queries.
    • How do I find the top 10 items in a Table store?
    • There's a sample app, BidNow, out on Code Gallery.
      • Online auction site.
      • Shows bids ending soon, most viewed items, and hot items (most bids).
      • You can imagine the SQL queries to do this, but BidNow doesn't use SQL Azure.
      • There is a separate Azure Table for each UI component - Hot Items, Most Viewed, Bids Ending Soon.
      • How do you make a SQL query fast? You create an index on the columns you're querying. When you add or update a row, SQL updates the index as well.
        • But Table storage has no indexes except on PartitionKey and RowKey.
        • We can use that: create each new entry with the right partition key, so that if we just read the first 10 items from ItemsMostViewed, the order in which they come out (based on partition key) is the right one.
        • So when an item is viewed, add a message to a queue and have a worker role process those and update the Most Viewed Items table.
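The BidNow trick can be sketched by inverting and zero-padding the view count. Sorting the partition keys as strings then puts the most viewed items first. This sketches the idea and is not BidNow's actual code. `MAX_VIEWS`, the key format and the in-memory `MostViewedTable` are assumptions; the table mimics Table storage's PartitionKey-then-RowKey ordering with `sorted`. `record_view` is what the worker role would do for each "item viewed" message. A partition key can't be changed in place, so it deletes the old entity and inserts a new one.

```python
from typing import Dict, List, Tuple

MAX_VIEWS = 10**12           # assumption: an upper bound on any item's view count
KEY_WIDTH = len(str(MAX_VIEWS))


def most_viewed_partition_key(views: int) -> str:
    """Inverted, zero-padded count, so higher view counts sort first as strings."""
    if not 0 <= views <= MAX_VIEWS:
        raise ValueError("views out of range")
    return str(MAX_VIEWS - views).zfill(KEY_WIDTH)


class MostViewedTable:
    """In-memory stand-in for an ItemsMostViewed table.

    Table storage returns entities ordered by PartitionKey, then RowKey;
    sorted() over (PartitionKey, RowKey) tuples mimics that ordering here.
    """

    def __init__(self) -> None:
        self.entities: Dict[Tuple[str, str], str] = {}  # (PartitionKey, RowKey) -> item id
        self.views: Dict[str, int] = {}                  # current count per item

    def record_view(self, item_id: str) -> None:
        """What a worker role might do for each "item viewed" queue message."""
        old = self.views.get(item_id)
        if old is not None:
            # A partition key can't be changed in place: delete, then re-insert.
            del self.entities[(most_viewed_partition_key(old), item_id)]
        new = (old or 0) + 1
        self.views[item_id] = new
        self.entities[(most_viewed_partition_key(new), item_id)] = item_id

    def top(self, n: int = 10) -> List[str]:
        """Like reading the first n entities: key order is already most-viewed order."""
        return [self.entities[key] for key in sorted(self.entities)[:n]]
```

In real Table storage, the old and new entities sit in different partitions, so the delete and the insert cannot share one entity group transaction. A reader may briefly see an item twice or miss it.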
      • When appropriate - use both.
        • If your database is really big because you have binary data in there - pull it out and put it in blob storage.
        • Don't limit yourself by "I have to use SQL Server."
        • Table storage is a lot cheaper.
        • It's flexible. A profile can store arbitrary data:
          • You can have n rows for each user. The first is generic user info (email, name, etc.).
          • The next set of rows has the items you've bid on.
          • The next set of rows has the items you've listed.
          • All share the same partition key (the user name), but the row key is different.
        • Denormalize and partition.
          • Don't think I have a 32-way box running my database - think I've got 32 1-way boxes running my DB.
        • Think about access patterns - you're charged for both accesses and storage. You might end up spending more on access charges than on storage charges.
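The profile layout can be sketched as rows that share the user name as partition key, with a row-key prefix marking the row type. The `info`/`bid_`/`listing_` prefixes and the in-memory `ProfileTable` are hypothetical. Against real Table storage, `query_prefix` corresponds to a filter on PartitionKey equal to the user plus a RowKey range, which stays within a single partition.

```python
from typing import Dict, List, Tuple


class ProfileTable:
    """In-memory stand-in for a denormalized profile table.

    PartitionKey is the user name; a RowKey prefix marks the row type
    ("info", "bid_<item>", "listing_<item>"), a hypothetical scheme.
    """

    def __init__(self) -> None:
        self.rows: Dict[Tuple[str, str], dict] = {}

    def upsert(self, user: str, row_key: str, properties: dict) -> None:
        self.rows[(user, row_key)] = dict(properties)

    def query_prefix(self, user: str, prefix: str) -> List[Tuple[str, dict]]:
        """All of one user's rows whose RowKey starts with prefix, in key order."""
        return [(row_key, props) for (pk, row_key), props in sorted(self.rows.items())
                if pk == user and row_key.startswith(prefix)]
```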
  • Failure
    • Failure is 100% guaranteed. Even with 99.99% uptime, if you run long enough, eventually you'll be down.
    • At the least, it will happen when the machine your instance is running on needs to be patched.
    • You'll get notified of this and get a bit of time to run shutdown code.
    • It doesn't matter how many times we do the same task:
      • If it fails at the beginning…
      • If it fails at the end…
      • If it fails in the middle…
      • Or we do it 300 times in a row
    • You get the same answer!
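The "same answer" property is idempotency. In this sketch, each result is written under a deterministic key (an upsert rather than an append). Re-running the batch after a simulated failure at any point, or 300 times in a row, therefore leaves the same final state. `process_batch`, `run_until_done` and the `len(item)` "work" are placeholders.

```python
from typing import Dict, List, Optional, Sequence


def process_batch(items: List[str], results: Dict[str, int],
                  fail_after: Optional[int] = None) -> None:
    """Do the work for each item, writing each result under a deterministic key.

    fail_after simulates the instance dying just before processing item
    number fail_after. Because each write is an upsert keyed by item, a re-run
    overwrites partial results instead of duplicating them.
    """
    for index, item in enumerate(items):
        if fail_after is not None and index == fail_after:
            raise RuntimeError("simulated instance failure")
        results[item] = len(item)  # placeholder for the real work


def run_until_done(items: List[str], results: Dict[str, int],
                   failure_points: Sequence[int]) -> int:
    """Retry after each simulated failure; return the number of attempts made."""
    attempts = 0
    for point in [*failure_points, None]:
        attempts += 1
        try:
            process_batch(items, results, fail_after=point)
            break
        except RuntimeError:
            pass  # the fabric controller would restart the instance; we just retry
    return attempts
```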
  • If your service is on fire, could you figure that out?
    • Use Diagnostics
    • You could have an XML file that specifies which logs and performance counters you want.
      • Read it and adjust the diagnostics settings accordingly.
    • In the AppFabric SDK there is a real-time tracing tool that uses the Service Bus.
      • You can have a trace listener that just puts its messages on the Service Bus.
      • You can then look at the trace on any machine.
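The XML-driven diagnostics idea can be sketched as "parse the file into settings, then apply them." The element and attribute names below are invented for this sketch and are not the real Azure diagnostics schema. Applying the parsed settings to the diagnostics configuration is not shown.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical config format: element and attribute names are made up for this sketch.
SAMPLE_CONFIG = r"""<diagnostics transferPeriodMinutes="5">
  <log name="Application" level="Warning"/>
  <counter name="\Processor(_Total)\% Processor Time" sampleSeconds="30"/>
</diagnostics>"""


@dataclass
class DiagnosticsSettings:
    transfer_period_minutes: int
    logs: List[Tuple[str, str]] = field(default_factory=list)      # (log name, minimum level)
    counters: List[Tuple[str, int]] = field(default_factory=list)  # (counter path, sample seconds)


def parse_settings(xml_text: str) -> DiagnosticsSettings:
    """Turn the XML file into settings; missing attributes fall back to defaults."""
    root = ET.fromstring(xml_text)
    settings = DiagnosticsSettings(int(root.get("transferPeriodMinutes", "5")))
    for log in root.findall("log"):
        settings.logs.append((log.get("name", ""), log.get("level", "Error")))
    for counter in root.findall("counter"):
        settings.counters.append((counter.get("name", ""), int(counter.get("sampleSeconds", "60"))))
    return settings
```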
  • Collecting log data also helps with spotting patterns of usage that would affect your capacity planning.
  • Q: If your application fails over and over - how long will the fabric controller keep starting it?
    • A: I've never seen it stop. It will keep recycling the instance.
  • Q: If I want to run 10 instances during the day and no instances at night, will that happen automatically or do I have to script it?
    • A: You have to script that or write the code.