F# on Windows Azure

Windows Azure was announced yesterday, along with the first CTP of the SDK and Visual Studio tools.  If you haven’t tried it yet, go take a look.  On top of serving as a hosting service for web applications, Azure also provides a really simple way to do distributed computation and storage in the cloud.

Azure supports running .NET applications, which means you can build Azure worker roles using F#! The tools released with Azure don’t have F# support out of the box though, so I’ve posted a few simple templates and samples up on Code Gallery.

Download

F# Templates and Samples for Windows Azure

Templates


Cloud WebCrawl Sample

namespace SearchEngine_WorkerRole

open System
open System.Threading
open Microsoft.ServiceHosting.ServiceRuntime
open System.Net
open System.IO
open System.Text.RegularExpressions
open Microsoft.Samples.ServiceHosting.StorageClient
open System.Web
open System.Runtime.Serialization.Formatters.Binary

type WorkerRole() =
    inherit RoleEntryPoint()

    // The page to start crawling from
    let startpage = @"https://blogs.msdn.com/lukeh"
    // The filter to apply to links while crawling
    let pageFilter = fun (url:string) -> url.StartsWith("https://blogs.msdn.com/")

    /// Get the contents of a given url
    let http(url: string) = 
        let req    = WebRequest.Create(url) 
        use resp   = req.GetResponse()
        use stream = resp.GetResponseStream() 
        use reader = new StreamReader(stream) 
        let html   = reader.ReadToEnd()
        html

    /// Get the links from a page of HTML
    let linkPat = "href=\s*\"[^\"h]*(https://[^&\"]*)\""
    let getLinks text =  [ for m in Regex.Matches(text,linkPat)  -> m.Groups.Item(1).Value ]
    
    /// Handle the message msg using the given queue and blob container
    let HandleMessage (msg : Message) (queue : MessageQueue, container: BlobContainer) =
        // There was a new item, get the contents
        let url = msg.ContentAsString();
        let urlBlobName = HttpUtility.UrlEncode(url)
        // Don't get the page if we've already seen it
        if not(container.DoesBlobExist(urlBlobName)) 
        then
            do RoleManager.WriteToLog("Information", String.Format("Handling new url: '{0}'", url));
            try
                // Get the contents of the page
                let content = http url
                // Store the page into the blob store
                let props = new BlobProperties(urlBlobName)
                // Encode as UTF-8 (UTF8Encoding.Default is the system ANSI code page, not UTF-8)
                let _ = container.CreateBlob(props, new BlobContents(System.Text.Encoding.UTF8.GetBytes(content)), true);
                
                // Get the links from the page
                let links = getLinks content
                
                // Filter down the links and then create a new work item for each
                links
                |> Seq.filter pageFilter
                |> Seq.distinct
                |> Seq.filter (fun link -> not(container.DoesBlobExist(HttpUtility.UrlEncode(link))))
                |> Seq.iter (fun link -> queue.PutMessage(new Message(link)) |> ignore)
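                // The page is stored and its links are queued, so remove the work item
                queue.DeleteMessage(msg) |> ignore
            with
            // Log the failure; the message stays in the queue and is retried after its visibility timeout
            | ex -> RoleManager.WriteToLog("Error", String.Format("Failed to handle url '{0}': {1}", url, ex.Message))
        else
            // Already crawled: remove the work item so it is not redelivered
            queue.DeleteMessage(msg) |> ignore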
    
    /// Main loop of worker process
    let rec Loop (queue : MessageQueue, container: BlobContainer) = 
        // Get the next page to crawl from the queue
        let msg = queue.GetMessage(240);
        if msg = null
        then Thread.Sleep(1000)
        else HandleMessage msg (queue, container)
        Loop(queue,container)
    
    override wp.Start() =
        // Initialize the Blob storage
        let blobStorage = BlobStorage.Create(StorageAccountInfo.GetDefaultBlobStorageAccountFromConfiguration());
        let container = blobStorage.GetBlobContainer("searchengine");
        container.CreateContainer(null, ContainerAccessControl.Public) |> ignore

        // Initialize the Queue storage
        let queueStorage = QueueStorage.Create(StorageAccountInfo.GetDefaultQueueStorageAccountFromConfiguration());
        let queue = queueStorage.GetQueue("searchworker");
        queue.CreateQueue() |> ignore
        
        // Put an initial message in the queue, using the start page
        queue.PutMessage(new Message(startpage)) |> ignore
        
        // Begin the main loop, processing messages in the queue
        Loop(queue, container)
        
    override wp.GetHealthStatus() = RoleStatus.Healthy 

Worker Roles

The code above implements a Worker Role: a process that runs in the background, waits for work requests, and processes them as they arrive.  The worker role is configured to run 4 instances simultaneously, so 4 copies of this worker pull work items from the queue at the same time.  This gives implicit parallelism; in fact, the initial release of Azure will run one process per core, so you really are getting effective parallelism this way.  Notice also that this requires the worker processes to be inherently stateless.  Both aspects make the functional design approaches common in F# a natural fit for developing worker roles.
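
To make the stateless point concrete, here is a minimal sketch with no Azure dependencies.  It is not part of the posted sample: nextWorkItems is a hypothetical helper name, and the alreadySeen predicate is an assumed stand-in for the blob-store existence check.  Because the function depends only on its arguments, any worker instance could run it on any message.

```fsharp
open System.Text.RegularExpressions

// Same link pattern as the sample above
let linkPat = "href=\s*\"[^\"h]*(https://[^&\"]*)\""

/// Extract the absolute https links from a page of HTML
let getLinks (html: string) =
    Regex.Matches(html, linkPat)
    |> Seq.cast<Match>
    |> Seq.map (fun m -> m.Groups.[1].Value)
    |> List.ofSeq

/// Hypothetical helper: given a page's HTML, return the links that should
/// become new work items. Nothing is remembered between calls.
let nextWorkItems (pageFilter: string -> bool) (alreadySeen: string -> bool) (html: string) =
    getLinks html
    |> Seq.filter pageFilter
    |> Seq.distinct
    |> Seq.filter (fun link -> not (alreadySeen link))
    |> List.ofSeq

// Usage: two links to the same blog page and one link to another site
let sampleHtml =
    "<a href=\"https://blogs.msdn.com/a\">A</a> " +
    "<a href=\"https://blogs.msdn.com/a\">A again</a> " +
    "<a href=\"https://example.com/b\">B</a>"

let items =
    nextWorkItems
        (fun (url: string) -> url.StartsWith("https://blogs.msdn.com/"))
        (fun _ -> false)   // assume nothing has been crawled yet
        sampleHtml
```

Here items ends up holding the single blog link: the duplicate is dropped by Seq.distinct and the other site is rejected by the filter.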

Queues and Blobs

This sample uses two of the three storage types supported by Windows Azure: queues and blobs (tables are not used).  The queue storage holds the work items. The blob storage holds the pages visited during the web crawl.  When an instance of the worker role starts, it connects to the blob store and the queue store, puts an initial work item in the queue, and then loops, processing work items from the queue.
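
The queue-and-store flow can be tried out locally with in-memory stand-ins.  The sketch below is an illustration under stated assumptions, not the Azure storage API: a Queue<string> plays the role of queue storage, a HashSet<string> plays the role of the blob container (it only records which pages exist), and crawl and fetch are hypothetical names.  It runs on a single thread, whereas the real sample runs several worker instances against shared storage.

```fsharp
open System.Collections.Generic

/// Process work items until the queue is empty and return the stored page names.
/// `fetch` is an assumed stand-in for downloading a page and extracting its links.
let crawl (fetch: string -> string list) (startPage: string) =
    let queue = Queue<string>()      // stand-in for queue storage
    let stored = HashSet<string>()   // stand-in for the blob container
    // Put an initial work item in the queue
    queue.Enqueue startPage
    while queue.Count > 0 do
        let url = queue.Dequeue()
        // Don't fetch the page if it has already been stored
        if not (stored.Contains url) then
            let links = fetch url
            stored.Add url |> ignore
            // Create a new work item for each link not yet stored
            for link in List.distinct links do
                if not (stored.Contains link) then queue.Enqueue link
    stored |> Seq.sort |> List.ofSeq

// Usage: a tiny fake web of three pages that link to each other
let fakeWeb =
    Map.ofList [ ("a", ["b"; "c"]); ("b", ["a"; "c"]); ("c", []) ]

let fetch url = defaultArg (Map.tryFind url fakeWeb) []

let visited = crawl fetch "a"
```

Page "c" is queued twice (once from "a" and once from "b") but handled only once, because the second work item finds it already stored.  This is the same check the worker makes with DoesBlobExist.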

Conclusion

Ideas for any other interesting F# applications on Windows Azure?  Download the templates and samples.