Azure CDN: overview and dynamic content

Article
03/13/2017

What is the Azure Content Delivery Network (CDN)?

The Azure CDN offers a global solution for delivering content to web browsers with minimal latency by caching publicly available assets such as image, CSS, or script files at geo-dispersed edge nodes. Web pages refer to assets (such as image files) by their CDN URLs.

Referring to a relative URL (for example, <img src="/images/logo.png" />) means the web browser has to download the asset from the source web site. This may not be optimal if the web site is hosted far away from the browser, or if the web server is heavily loaded.

Instead, a CDN lets browsers download asset files from edge nodes (dispersed web servers that store copies of the asset files). This both offloads the source web site and shortens download times, because an edge node is likely to be closer to the web browser's location than the source web server. Page load times drop, and the user experience becomes more responsive.

Getting content to the CDN

How does the CDN get the assets that it distributes to its edge nodes? When setting up a CDN, an endpoint is defined, and an origin must be specified for the endpoint. The endpoint defines the hostname that will be used to refer to assets, and the origin defines the asset source.

For example: let's say I define an endpoint called mycdnendpoint. The hostname for asset retrieval would then be mycdnendpoint.azureedge.net (the azureedge.net domain suffix is fixed for Azure CDN).

Now, I need to define an origin from which the CDN will get assets for the endpoint. In Azure CDN, I can choose from these origin types: Azure blob storage, an Azure cloud service, an Azure web app (this lets me use Azure services exposed as web apps, such as API apps or Azure Functions), or any HTTP/HTTPS URL for maximum flexibility.

Choosing Azure blob storage as an origin means it's very simple to expose assets over the CDN: simply load all the assets into the chosen blob storage account (folder structures are fully supported), append the path to the endpoint hostname, and construct a URL.

For example: I have an image called microsoft.png in an assets container (root folder) of the Azure storage account that I set as my CDN origin. My CDN URL for this image would be https://mycdnendpoint.azureedge.net/assets/microsoft.png. When any web browser loads a page that includes this image URL, the browser will contact the Azure CDN to download that file, and the CDN will route the request to the edge node that is closest to the browser.
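The URL construction described above can be sketched in a few lines of Python. This is only an illustration: the function name `cdn_url` is made up for this example, and it assumes the origin points at the root of the storage account, so the container name becomes the first path segment.

```python
from urllib.parse import quote

# Fixed hostname suffix for Azure CDN endpoints, as described in the text.
AZURE_CDN_SUFFIX = "azureedge.net"


def cdn_url(endpoint_name, container, blob_path, scheme="https"):
    """Build the CDN URL for a blob (hypothetical helper, not an Azure API)."""
    # Percent-encode each path segment but keep "/" so folder structures survive.
    segments = f"{container}/{blob_path}".split("/")
    path = "/".join(quote(segment) for segment in segments if segment)
    return f"{scheme}://{endpoint_name}.{AZURE_CDN_SUFFIX}/{path}"


print(cdn_url("mycdnendpoint", "assets", "microsoft.png"))
# prints https://mycdnendpoint.azureedge.net/assets/microsoft.png
```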

Here's a conceptual diagram showing CDN elements, including content sources, CDN origins and endpoints, and public-facing web sites that include CDN-served content.

Figure: Conceptual diagram showing CDN origins, endpoints, and consumption.

I'm leaving out a few details here, such as CDN support for custom domains (that is, using your own domain name instead of azureedge.net), choosing either or both HTTP and HTTPS, using only a subset of a storage account for CDN assets, using host headers, caching on query string combinations, and more. These beyond-basic capabilities will likely be part of any production-class CDN rollout. Please refer to the Azure CDN documentation for more details.

Cache Misses and Hits

Let's say I just updated my web site with some new pages. These refer to new image files that I just uploaded to my origin blob storage. How do these images get out to all the edge nodes so that visitors to my site see them?

CDNs typically do not pre-load asset files to all the edge nodes. This would require a prohibitive amount of network traffic and edge node storage (after all, each edge node would have to cache every single file in every origin).

Instead, whenever an asset file is requested from an edge node, the edge node first checks its own file cache. If the asset file isn't there (a "cache miss"), the edge node requests the file from the origin, caches it locally, and then returns it to the web browser. This means that the first person to request a file incurs a delay similar to requesting it from the source. However, the edge node now has the file and keeps it cached locally, so later visitors get the asset directly from the edge node without a trip back to the origin (a "cache hit").
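To make the miss-then-hit sequence concrete, here is a toy Python model of a single edge node. It is a sketch only: the `EdgeNode` class and the in-memory dictionaries are stand-ins invented for this example and say nothing about how Azure CDN is implemented internally.

```python
class EdgeNode:
    """Toy model of one edge node: a dict stands in for its file cache."""

    def __init__(self, origin):
        self.origin = origin        # mapping of path -> bytes, standing in for the origin
        self.cache = {}
        self.origin_fetches = 0     # how many times we had to go back to the origin

    def get(self, path):
        if path in self.cache:
            return self.cache[path], "HIT"
        # Cache miss: fetch from the origin, keep a local copy, then serve it.
        content = self.origin[path]
        self.origin_fetches += 1
        self.cache[path] = content
        return content, "MISS"


origin = {"/assets/microsoft.png": b"...image bytes..."}
edge = EdgeNode(origin)
print(edge.get("/assets/microsoft.png")[1])  # first visitor: prints MISS
print(edge.get("/assets/microsoft.png")[1])  # later visitor: prints HIT
```

Only the first request reaches the origin; every later request for the same path is answered from the edge node's own copy.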

Time To Live (TTL)

How long does the edge node keep the file cached? Typically, a file has a time-to-live (TTL), which is how long an edge node treats its cached copy as current. Once the TTL has elapsed, the edge node considers the copy stale and goes back to the origin for it on the next request (the file at the origin is untouched, of course). This limits how long outdated content can be served; edge nodes may also evict rarely requested files to manage their resources. TTL is customizable by several methods; if none is set on an asset, Azure CDN defaults to a TTL of seven days.
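The TTL check itself boils down to comparing the age of a cached copy against the TTL. A minimal Python sketch, assuming timestamps expressed in seconds and using the seven-day default mentioned above (the helper name `is_fresh` is hypothetical):

```python
DEFAULT_TTL_SECONDS = 7 * 24 * 60 * 60  # seven days, the default cited above


def is_fresh(cached_at, now, ttl_seconds=None):
    """Return True while a cached copy is still within its TTL.

    cached_at and now are timestamps in seconds (for example, from time.time()).
    """
    ttl = DEFAULT_TTL_SECONDS if ttl_seconds is None else ttl_seconds
    return (now - cached_at) < ttl


print(DEFAULT_TTL_SECONDS)  # prints 604800
```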

The appropriate TTL will depend on factors specific to your scenario. For example, your organization's logo presumably does not change very often, but gets downloaded very often as it's on every page of your web site; therefore, consider a longer TTL so that edge nodes don't retrieve this file from the origin more often than is needed.

By contrast, files that change periodically should get shorter TTLs so that your users don't see outdated files (arguably, deploying updated asset files with new file names is a surer way of ensuring your site visitors see the freshest content, but this may not always be practical). There are further considerations to setting the "right" TTL, including balancing network traffic between edge nodes and the origin vs. content freshness; the Azure CDN includes a number of reports (such as cache hit ratio) that can help you tune your TTLs.

Figure: Azure CDN Cache Hit Ratio metric
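For reference, a cache hit ratio is simply the share of requests answered from the edge cache. The Python sketch below uses made-up request counts purely for illustration; the exact definition Azure's report uses may differ in detail.

```python
def cache_hit_ratio(hits, misses):
    """Fraction of requests served from the edge cache (0.0 when there is no traffic)."""
    total = hits + misses
    if total == 0:
        return 0.0
    return hits / total


# Illustrative numbers only: 950 hits and 50 misses out of 1,000 requests.
print(f"{cache_hit_ratio(hits=950, misses=50):.0%}")  # prints 95%
```

A low ratio on assets that rarely change can be a hint that their TTL is shorter than it needs to be.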

Timeline

What does this look like in reality? The following screenshot shows repeated loads of an image from an Azure CDN endpoint in my web browser's developer tools network timeline.

The first download, labeled as "First load (CDN cache miss)", shows a total time of 650ms. This is because the edge node has to retrieve the file from the origin, which takes longer.

Later loads, labeled as "Later loads (CDN cache hits)", have much shorter total times, between 16 and 49ms. This indicates that the CDN edge node now has the file in its cache and doesn't need to go back to the origin to get it. (Note on the screenshot that I have disabled my browser's local cache, so that the CDN's cache effect is obvious.)

Figure: Asset first and later loads from CDN edge
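The same comparison can be made outside the browser by timing repeated downloads. The following Python sketch is an assumption-laden stand-in for the developer tools timeline: `time_loads` is a hypothetical helper, and the caller supplies the function that actually fetches the URL.

```python
import time


def time_loads(fetch, url, count=5):
    """Call fetch(url) `count` times and return each duration in milliseconds."""
    durations_ms = []
    for _ in range(count):
        start = time.perf_counter()
        fetch(url)
        durations_ms.append((time.perf_counter() - start) * 1000)
    return durations_ms


# Example use against a real endpoint (needs network access, so it is left commented out):
# import urllib.request
# url = "https://mycdnendpoint.azureedge.net/assets/microsoft.png"
# print(time_loads(lambda u: urllib.request.urlopen(u).read(), url))
```

With a cold edge cache, the first entry in the returned list would be expected to be noticeably larger than the rest.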

New content to CDN immediately

So far, we have discussed straightforward CDN provisioning from an origin with a relatively stable set of asset files, presumably deployed when a site updates or on another relatively predictable interval - possibly even a continuous deployment process in an ALM solution like Visual Studio Team Services.

But what about scenarios when content is generated dynamically, for example in a browser-based image editing tool, and that content needs to be immediately available through a CDN?

Clearly, waiting for a manual upload or publish process, or a periodic continuous deployment process, would not be suitable due to the delay between asset generation and CDN origin availability. In such a scenario, we need to get the newly created asset file into the CDN origin right away, ideally as soon as it is created.

File-based content

Azure blob storage is ideal as an origin for static files. When a file is generated by some sort of process and needs to be CDN-available right away, we can write the file to our Azure storage programmatically as soon as it's created. There are Azure SDKs in various languages and stacks.

Uploading a file to Azure blob storage and setting its TTL is straightforward using the Azure storage SDK. The following screenshot shows the pertinent code; I'm using C# but again, there are SDKs in various languages. Please see the Resources section at the end of this post for a link to the full code base.

On lines 142-143 I'm creating a blob reference for the file I'll upload and setting its content type (this is a MIME type designation; for a .png image, the content type would be image/png, and for a .jpg it would be image/jpeg). Setting the content type here ensures that it propagates to the edge nodes and lets them serve the content up correctly to web browsers.

On line 146, I'm setting a retry policy. This is optional, but good practice; it instructs the Azure blob storage client I'm using to upload my file what to do in case the upload is interrupted. Here, I'm telling it to wait five seconds, then retry up to a max of five times. Guarding against transient failures is a highly recommended best practice in cloud application development.
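The Azure storage client handles retries internally once a policy is set, so the following is not the SDK's implementation. It is a small Python sketch of what a fixed-interval policy like the one described (wait five seconds, retry up to five times) amounts to. The name `with_retries` and the injectable `sleep` parameter are assumptions made for this example.

```python
import time


def with_retries(operation, max_retries=5, wait_seconds=5.0, sleep=time.sleep):
    """Run operation(); on failure wait a fixed interval and retry.

    Makes at most 1 + max_retries attempts, then re-raises the last error.
    """
    attempt = 0
    while True:
        try:
            return operation()
        except Exception:
            # Real code should catch only errors known to be transient.
            if attempt >= max_retries:
                raise
            attempt += 1
            sleep(wait_seconds)
```

Passing `sleep` as a parameter keeps the helper easy to test without actually waiting.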

On line 149, I am asynchronously uploading the file to my blob storage account.

On line 152, I set the blob's CacheControl property; this is what CDN edge nodes will use for the file's TTL in their cache. The TTL value is an integer and is measured in seconds.
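The exact string assigned in the screenshot isn't reproduced here, but a TTL in seconds is conventionally expressed through the `max-age` directive of the standard HTTP Cache-Control header. A Python sketch of building such a value follows; the `public` directive and the helper name are assumptions, not taken from the original code.

```python
def cache_control_value(ttl_seconds):
    """Build a Cache-Control header value from a TTL given in whole seconds."""
    # Reject booleans explicitly, since bool is a subclass of int in Python.
    if isinstance(ttl_seconds, bool) or not isinstance(ttl_seconds, int) or ttl_seconds < 0:
        raise ValueError("TTL must be a non-negative integer number of seconds")
    return f"public, max-age={ttl_seconds}"


print(cache_control_value(3600))  # prints public, max-age=3600
```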

On line 155, I'm updating the file's properties with the TTL. Then, on line 157 I get its URI in Azure blob storage - this is not a CDN URI yet, but I'm retrieving it so I can first check that it uploaded successfully to blob storage.

Figure: C#: upload file to blob storage, get URI.

The blob storage location to which I uploaded this file is the origin for my CDN endpoint, so at this point the file should be available not only through its direct storage URI (which I retrieved on line 157, above) but also through CDN URLs. To confirm this, I use an IsBrowsable() method to check that the file I uploaded is available directly from its Azure storage URI as well as through my CDN. Note that I confirm both HTTP and HTTPS CDN URLs - having both available is useful to avoid browser warning messages and lets web developers avoid hard-coding the URL scheme.

Figure: C#: check blob storage and CDN URIs for uploaded file.

Figure: C#: IsBrowsable() method to check whether a URL is available
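The C# IsBrowsable() method appears only in the screenshot, so its exact logic isn't reproduced here. As a rough Python equivalent of the idea, the sketch below sends a HEAD request and treats an HTTP 200 as "available". The function names, the `opener` parameter, and the storage account name in the usage comment are all hypothetical.

```python
import urllib.error
import urllib.request


def is_browsable(url, timeout=10.0, opener=urllib.request.urlopen):
    """Return True if a HEAD request to url answers with HTTP 200."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener(request, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # URLError and HTTPError are OSError subclasses: unreachable host, 404, and so on.
        return False


def urls_to_check(storage_uri, cdn_host, blob_path):
    """The direct storage URI plus the HTTP and HTTPS CDN URLs for the same blob."""
    return [storage_uri, f"http://{cdn_host}/{blob_path}", f"https://{cdn_host}/{blob_path}"]


# Example use (needs network access; the storage account name is made up):
# for url in urls_to_check(
#     "https://mystorageaccount.blob.core.windows.net/assets/microsoft.png",
#     "mycdnendpoint.azureedge.net",
#     "assets/microsoft.png",
# ):
#     print(url, is_browsable(url))
```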

The preceding code fragment (see Resources at the bottom of this post for a link to the complete code) shows a file being uploaded to a blob storage account that is used as a CDN origin, with nothing else that needs to be done to make the file available through the CDN. So this is a very straightforward way to generate files and immediately expose them through the Azure CDN.

Stream-based content

Sometimes we may need to serve content that isn't easily persistable as a file in a storage repository; for example, we may need to return a content stream. In the following code fragment, I have a REST API method that instantiates a content stream and writes it to the HTTP response's content property. The method also sets the TTL on the HTTP response using the CacheControl header property; here, I'm setting the TTL to 10 minutes (600 seconds).

Figure: C#: API method HTTP response with stream content
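To show the shape of such an endpoint without the original C# code, here is a minimal sketch using only Python's standard library. It is not the author's API: the handler, the helper names, the generic content type, and the placeholder content are assumptions. Only the /api/content/ path and the 600-second TTL come from this post.

```python
import io
import shutil
from http.server import BaseHTTPRequestHandler

TTL_SECONDS = 600  # 10 minutes, as in the text above


def build_content_stream(content_id):
    """Stand-in for dynamic generation: returns an in-memory byte stream."""
    return io.BytesIO(f"generated content for {content_id}".encode("utf-8"))


def response_headers(ttl_seconds=TTL_SECONDS):
    """Headers for the streamed response; Cache-Control carries the TTL for the CDN."""
    return {
        "Content-Type": "application/octet-stream",
        "Cache-Control": f"public, max-age={ttl_seconds}",
    }


class ContentHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        prefix = "/api/content/"
        if not self.path.startswith(prefix):
            self.send_error(404)
            return
        stream = build_content_stream(self.path[len(prefix):])
        self.send_response(200)
        for name, value in response_headers().items():
            self.send_header(name, value)
        self.end_headers()
        # Copy the generated stream into the HTTP response body.
        shutil.copyfileobj(stream, self.wfile)


# To try it locally (this call blocks until interrupted):
# from http.server import HTTPServer
# HTTPServer(("127.0.0.1", 8080), ContentHandler).serve_forever()
```

A CDN endpoint whose origin is this service would then cache each /api/content/ response for the duration given in the Cache-Control header.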

A REST API that returns content streams lets us use an Azure CDN endpoint that points at the API, rather than at a static storage location. Using an API lets us accommodate a very wide variety of dynamic content-generation scenarios, including varying by parameters or query string, geography, time of day, and more.

Using an API URL like https://mycdnapiendpoint.azureedge.net/api/content/i3 (see Resources at the bottom of this post for a link to the full code for this API app), we can again see the first load being slower than subsequent loads: the CDN edge node first has a cache miss and retrieves the content from the origin, and then has a cache hit on subsequent requests.

Figure: CDN API URL first and later loads

Conclusion

The Azure CDN is an excellent way to offload the serving of static content from your web application. It offers many advanced features and a global footprint, and it supports multiple origin types to accommodate static and dynamic content scenarios.

Resources

These are, respectively, links to apps that upload files to Azure blob storage and check immediate CDN availability (Storage Origin App), and a REST API app that returns a content stream based on an ID parameter. Please note the disclaimer here, and feel free to reach out to me with any questions.