I caught some flak this weekend at the Charlotte Code Camp when Justin realized my recent Scale Down with Windows Azure post was principally a screencast (aside from the code sample). So Justin, I’m documenting the screencast just for you! 🙂
First, a good place to start with this concept is Neil Kidd’s blog post. Go ahead and read that now … I’ll wait. Most of this code is based on his original sample; I’ve modified a few things and brought it forward to work with the latest SDK.
So, in a nutshell, a typical worker role template contains a Run() method in which we’d implement the logic to run our worker role. In many cases, there are multiple tasks and multiple workers. Unless the majority of the work you are doing is CPU bound (which is entirely possible, as is the case with our Azure-based distributed Folding project), the resources of the VM can be better utilized by multithreading the tasks and workers.
The trick is to do this correctly, as writing multithreaded code is challenging. In general, the Parallel Extensions (PFx) are likely not the right approach in this situation. There are some exceptions – for example, if you are using a 4-core (large) VM and require lots of parallel processing, PFx might be the best fit. But that’s not often the case in the cloud. Instead, we need a lightweight framework that allows us to create a number of “processors” (quotes here to avoid confusion with a CPU) that are responsible for doing their work independently of any other “processors” in the current instance. Each “processor” runs on its own thread, while the worker role itself, instead of doing the work, simply monitors the health of all of the threads and restarts them as necessary.
The implementation is not terribly complex – but if you aren’t comfortable with threading or just don’t want to reinvent the wheel, check out the base project. Feel free to add to or modify the project as necessary. Let’s step through some of the concepts.
Download the sample project (Visual Studio 2010) here.
First, it doesn’t matter whether you implement this in a web role or a worker role. A web role exposes the same Run() method that a worker role does, and using it doesn’t interfere with hosting a website – aside from the fact that the two now share the VM’s limited resources, of course.
First up is the IProcessMessages interface. It’s simple: each processor declares how long it needs per work unit and exposes a Process() method to call. Our health monitor keeps tabs on the processor, so it needs to know how long to wait before assuming the processor is hung:
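Here’s a sketch of what that interface looks like. The member names beyond Process() (ProcessorName, TimePerProcessRequest) are my assumptions based on the description above, not necessarily the exact names in the sample project:

```csharp
using System;

// Sketch of the IProcessMessages contract: a processor declares its
// per-work-unit time budget and exposes a Process() method.
public interface IProcessMessages
{
    // A friendly name, useful in log entries. (Assumed member.)
    string ProcessorName { get; }

    // The maximum time the health monitor should allow per work unit
    // before assuming the processor is hung. (Assumed member name.)
    TimeSpan TimePerProcessRequest { get; }

    // Does one unit of work; returns true if work was found and
    // processed, false if there was nothing to do.
    bool Process();
}
```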
A simple processor is then easy to create. We just implement the IProcessMessages interface and put whatever logic our worker needs inside the Process() method. We’re specifying that this processor needs only 20 seconds per work unit, so the health monitor will restart the worker if it doesn’t see progress within 20 seconds. SyncRoot isn’t needed unless you need to do some locking:
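A minimal sketch of such a processor follows. The class name is hypothetical and the queue-reading logic is a placeholder; the 20-second budget is the value described above. (The interface is condensed and repeated at the top so the snippet compiles on its own.)

```csharp
using System;

// Condensed repeat of the interface so this snippet stands alone.
public interface IProcessMessages
{
    string ProcessorName { get; }
    TimeSpan TimePerProcessRequest { get; }
    bool Process();
}

// A hypothetical processor: real work would go inside Process().
public class SampleProcessor : IProcessMessages
{
    // Only needed if Process() must lock around shared state.
    private readonly object syncRoot = new object();

    public string ProcessorName
    {
        get { return "SampleProcessor"; }
    }

    // Tell the health monitor to allow 20 seconds per work unit.
    public TimeSpan TimePerProcessRequest
    {
        get { return TimeSpan.FromSeconds(20); }
    }

    public bool Process()
    {
        // Real logic would pull a message from a queue, handle it, and
        // delete it. Returning false tells the caller there was no work,
        // so it can sleep instead of spinning.
        return false;
    }
}
```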
So far, pretty simple. Our processor doesn’t need to be aware of threading, or of handling/restarting itself. The ProcessorRecord class does this for us. It doesn’t do the actual monitoring; rather, it implements the nuts and bolts of starting the thread for the current processor:
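A sketch of what ProcessorRecord might look like, assuming the IProcessMessages interface described earlier (condensed and repeated here so the snippet compiles on its own). The member names and constructor shape are assumptions based on how the class is used later in the post:

```csharp
using System;
using System.Threading;

// Condensed repeat of the interface so this snippet stands alone.
public interface IProcessMessages
{
    string ProcessorName { get; }
    TimeSpan TimePerProcessRequest { get; }
    bool Process();
}

// Wraps a processor plus the thread that runs it.
public class ProcessorRecord
{
    private Thread thread;

    public IProcessMessages Processor { get; private set; }
    public string Name { get; private set; }
    public ThreadPriority Priority { get; private set; }
    public TimeSpan SleepTime { get; private set; }

    // Heartbeat: updated by the worker thread, read by the health monitor.
    public DateTime LastThreadTest { get; set; }

    public ProcessorRecord(IProcessMessages processor, string name,
                           ThreadPriority priority, TimeSpan sleepTime)
    {
        Processor = processor;
        Name = name;
        Priority = priority;
        SleepTime = sleepTime;
    }

    // Spin up a named background thread for this processor.
    public void Start()
    {
        LastThreadTest = DateTime.UtcNow;
        thread = new Thread(() => Run(Processor))
        {
            Name = Name,
            IsBackground = true,
            Priority = Priority
        };
        thread.Start();
    }

    // Called by the monitor when the thread appears hung: kill and restart.
    public void Reset()
    {
        try { if (thread != null && thread.IsAlive) thread.Abort(); }
        catch (ThreadStateException) { }
        Start();
    }

    private void Run(IProcessMessages processor)
    {
        // The polling loop, described next in the post.
    }
}
```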
When the ProcessorRecord class is told to start the thread, it calls a single Run() method passing in the processor. This method will essentially run forever, calling Process() each iteration. Since we’re not getting notified of work, each processor is essentially polling for work. Because of this, a traditional implementation is to say if there is work to do, keep calling Process() as frequently as possible, but if there’s no work to do, sleep for some amount of time:
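A sketch of what that loop could look like inside ProcessorRecord (field and member names are assumptions carried over from the class description above):

```csharp
// Inside ProcessorRecord: poll for work, sleep when idle, and stamp a
// heartbeat each iteration so the health monitor can see progress.
private void Run(IProcessMessages processor)
{
    while (true)
    {
        // Record that this thread is alive before doing work.
        this.LastThreadTest = DateTime.UtcNow;

        bool foundWork = processor.Process();
        if (!foundWork)
        {
            // Nothing to do: sleep a fixed interval before polling again.
            Thread.Sleep(this.SleepTime);
        }
        // If work was found, loop immediately and call Process() again.
    }
}
```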
The current implementation is simple – it doesn’t do exponential back off if there’s no work to do, it just sleeps for the amount of time specified in the ProcessorRecord. That leaves us with one more task, and that’s defining our processors in the web/worker role Run() method. The nice thing about this approach is that it’s quite easy to add multiple instances to scale up or down as needed:
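Wiring that up in the role’s Run() (or OnStart()) might look like this. SampleProcessor stands in for whatever IProcessMessages implementation you’ve written; the names and the 5-second sleep mirror the description below:

```csharp
// Two instances of the same processor type, with distinct names for the
// logs, the same thread priority, and a 5-second idle sleep. Scaling up
// is just a matter of adding entries to this list.
var processors = new List<ProcessorRecord>
{
    new ProcessorRecord(new SampleProcessor(), "Worker 1",
        ThreadPriority.Normal, TimeSpan.FromSeconds(5)),
    new ProcessorRecord(new SampleProcessor(), "Worker 2",
        ThreadPriority.Normal, TimeSpan.FromSeconds(5))
};

foreach (var record in processors)
{
    record.Start();
}
```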
In the case above, we’re creating 2 processors of the same type, giving them different names (helpful for the log files), the same thread priority, and a sleep time of 5 seconds per iteration if there’s no work to do. In the Run() method, instead of doing any work, we’ll just monitor the health of all the processors. Remember, the Run() method shouldn’t exit under normal conditions because it will cause the role to recycle:
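The monitoring loop might be sketched like this. The timeout arithmetic follows the description in the next paragraph; the 10-second check interval is an assumption:

```csharp
// The role's Run() becomes a watchdog: it never exits under normal
// conditions, it just checks each processor's heartbeat.
while (true)
{
    foreach (var record in processors)
    {
        // Last known heartbeat, plus the processor's per-work-unit
        // budget, plus its idle sleep time.
        DateTime timeout = record.LastThreadTest
                         + record.Processor.TimePerProcessRequest
                         + record.SleepTime;

        if (DateTime.UtcNow > timeout)
        {
            Trace.TraceWarning("{0} appears hung; restarting thread.",
                record.Name);
            record.Reset();
        }
    }

    Thread.Sleep(TimeSpan.FromSeconds(10));
}
```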
It may look complicated, but it’s pretty simple. Each iteration, we look at each processor. The timeout is calculated from the last known “thread test” (when the thread was last known to be alive and well), plus any process-time or sleep-time adjustments. If that time is exceeded, a warning is written to the log file and the processor is reset. Worldmaps has been using this approach for about six months now, and it’s been flawless.
Is this the most robust and complete framework for multithreading worker roles? No. It’s a prototype – a good starting place for a more robust solution. But the pattern you see here is the right starting point: the role instance knows what processors it wants, but doesn’t concern itself with their implementation or threading details. Each ProcessorRecord executes its processor and implements the threading logic, without regard to the other processors or the host. Each processor doesn’t care about threading, the other processors, or the host; it just does its work. This separation of concerns makes it easy to expand or modify this concept as the application changes.
If you’re trying to get more performance out of your workers, try this approach and let me know if you have any comments.