An Async Html cache – part II – Testing the cache

Let’s try out our little cache. First I want to write a synchronous version of it as a baseline.

    Private Shared Sub TestSync(ByVal sites() As String, ByVal sitesToDownload As Integer, ByVal howLong As Integer)
        ' The synchronous cache: url -> html
        Dim syncCache As New Dictionary(Of String, String)
        Dim count = sites.Count()
        ' Base url; each ticker is appended to it
        Dim url1 = ""

        For i = 0 To sitesToDownload - 1
            Dim html As String = ""
            Dim url = url1 & sites(i Mod count)
            ' Download only on a cache miss, then remember the page
            If Not syncCache.TryGetValue(url, html) Then
                html = LoadWebPage(url)
                syncCache(url) = html
            End If
            DoWork(html, howLong)
        Next
    End Sub

This is a loop that loads web pages into the cache if they are not already there. sites is a list of tickers used to compose the urls; sitesToDownload is the total number of pages to request, so a single url can be requested multiple times; howLong represents the work to be done on each loaded page.

In this version the cache is simply a Dictionary and there is no parallelism. The cache is managed in two lines: the TryGetValue check and the syncCache(url) = html assignment.

DoWork is this.

    Public Shared Sub DoWork(ByVal html As String, ByVal howLong As Integer)
    End Sub
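
DoWork is shown here with an empty body. A minimal sketch, assuming it only has to simulate howLong milliseconds of work on the page, could block the calling thread with Thread.Sleep. The sleep is an assumption and not necessarily what the attached source does; the module name and Main are hypothetical and only make the sketch runnable on its own.

```vb
Imports System
Imports System.Diagnostics
Imports System.Threading

Module DoWorkSketch
    ' Assumed body: pretend to process the page by blocking for howLong milliseconds.
    Public Sub DoWork(ByVal html As String, ByVal howLong As Integer)
        Thread.Sleep(howLong)
    End Sub

    Sub Main()
        ' Time a single call to confirm it takes roughly howLong milliseconds.
        Dim clock = Stopwatch.StartNew()
        DoWork("<html></html>", 20)
        clock.Stop()
        Console.WriteLine("DoWork took about {0} ms", clock.ElapsedMilliseconds)
    End Sub
End Module
```

A busy loop would work as well if you want the simulated work to keep a core occupied instead of just waiting.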

Let’s take a look at the asynchronous version.

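    Private Shared Sub TestAsync(ByVal sites() As String, ByVal sitesToDownload As Integer, ByVal howLong As Integer)
        Dim htmlCache As New HtmlCache
        Dim count = sites.Count()
        ' Base url; each ticker is appended to it
        Dim url = ""
        ' One signal is expected per requested page
        Using ce = New CountdownEvent(sitesToDownload)
            For i = 1 To sitesToDownload
                htmlCache.GetHtmlAsync(
                    url & sites(i Mod count),
                    Sub(s)
                        DoWork(s, howLong)
                        ce.Signal()
                    End Sub)
            Next
            ' Block the main thread until every callback has signaled
            ce.Wait()
        End Using
    End Sub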

There are several points worth making on this:

  • The lambda used as the second parameter for GetHtmlAsync is invoked on a different thread whenever the html has been retrieved (which could be immediately if the cache has downloaded the url before)

  • CountdownEvent allows a thread to wait for a certain number of signals to be sent. The waiting happens on the main thread in the ce.Wait() instruction. The triggering of the event happens in the lambda described in the point above (the ce.Signal() instruction)
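
The wait-and-signal pattern from the second point can be tried in isolation. This is a minimal sketch that uses plain thread pool work items in place of the cache callbacks; the module name, RunCallbacks and the counter are hypothetical and exist only for illustration.

```vb
Imports System
Imports System.Threading

Module CountdownSketch
    ' Queues pending callbacks on the thread pool, waits for all of them,
    ' and returns how many actually ran.
    Function RunCallbacks(ByVal pending As Integer) As Integer
        Dim completed = 0

        Using ce = New CountdownEvent(pending)
            For i = 1 To pending
                ThreadPool.QueueUserWorkItem(
                    Sub(state)
                        ' Stand-in for the real work done in the callback
                        Interlocked.Increment(completed)
                        ' Tell the waiting thread that one more callback is done
                        ce.Signal()
                    End Sub)
            Next
            ' Returns only after Signal has been called pending times
            ce.Wait()
        End Using

        Return completed
    End Function

    Sub Main()
        Console.WriteLine("Callbacks completed: {0}", RunCallbacks(5))
    End Sub
End Module
```

Keeping ce.Wait() inside the Using block matters: the event must not be disposed while callbacks that still need to signal it are running.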

This is the driver for the overall testing.

    Private Shared Sub TestPerf(ByVal s As String, ByVal a As Action, ByVal iterations As Integer)
        Dim clock As New Stopwatch

        clock.Start()
        For i = 1 To iterations
            a()
        Next
        clock.Stop()

        Dim ts = clock.Elapsed
        ' \ is integer division, so the last field is always two digits (hundredths of a second)
        Dim elapsedTime = String.Format(s & ": {0:00}:{1:00}:{2:00}.{3:00}", ts.Hours, ts.Minutes, ts.Seconds, ts.Milliseconds \ 10)
        Console.WriteLine(elapsedTime)
    End Sub

There is not much to say about it. Start the clock, perform a bunch of iterations of the passed lambda, stop the clock, print out performance.

And finally the main method. Note that all the adjustable parameters are factored out before the calls to TestPerf.

    Public Shared Sub Main()
        Dim tickers = New String() {"mmm", "aos", "shlm", "cas", "abt", "anf", "abm", "akr", "acet", "afl", "agl", "adc", "apd",
                                    "ayr", "alsk", "ain", "axb", "are", "ale", "ab", "all"}

        Dim sitesToDownload = 50
        Dim workToDoOnEachUrlInMilliSec = 20
        Dim perfIterations = 5

        TestPerf("Async", Sub() TestAsync(tickers, sitesToDownload, workToDoOnEachUrlInMilliSec), perfIterations)
        TestPerf("Sync", Sub() TestSync(tickers, sitesToDownload, workToDoOnEachUrlInMilliSec), perfIterations)
    End Sub

Feel free to change tickers, sitesToDownload, workToDoOnEachUrlInMilliSec and perfIterations. Depending on the ratios between these parameters and the number of cores on your machine, you’re going to see different results, which highlights the fact that parallelizing your algorithms may or may not yield performance gains, depending on both software and hardware considerations. I get a ~3X improvement on my box. I attached the full source file for your amusement.
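
To get a feel for how the parameters interact, here is a back-of-the-envelope model of a single TestSync pass: every distinct url is downloaded once, and every request then does howLong milliseconds of work. It is only a rough sketch. EstimateSyncMs and the module are hypothetical names, the 200 ms download time is a made-up figure and not a measurement, and the model assumes the tickers are all distinct.

```vb
Imports System

Module EstimateSketch
    ' Rough cost in milliseconds of one synchronous pass:
    ' cache misses * download time + requests * work per page.
    Function EstimateSyncMs(ByVal tickerCount As Integer, ByVal sitesToDownload As Integer,
                            ByVal howLong As Integer, ByVal downloadMs As Integer) As Integer
        ' Each distinct url misses the cache exactly once
        Dim misses = Math.Min(tickerCount, sitesToDownload)
        Return misses * downloadMs + sitesToDownload * howLong
    End Function

    Sub Main()
        ' 21 tickers, 50 requests, 20 ms of work per page, assumed 200 ms per download
        Console.WriteLine(EstimateSyncMs(21, 50, 20, 200))
    End Sub
End Module
```

With the values from Main, that is 21 downloads plus 50 × 20 ms of work per pass. The asynchronous version can overlap part of that time, and how much it overlaps is what the ratios and the core count decide.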