Time to Stop Typing and Start Thinking

It's amazing how, sometimes, things get simpler the more you fiddle with them. Or, to be more precise, something that seems to be evolving into an increasingly complicated problem turns out to be easy to resolve when you step back and look at it from another direction. I guess it's what they call "lateral thinking"; the archetypal example being letting air out of the tyres of a truck that's just a bit too tall to pass under a low bridge.

It actually happened to me this month (the lateral thinking bit, not the deflating truck tyres bit) as I've been updating some of my own server-monitoring-kludge software. I'm still skirting the decision on installing Microsoft System Center to manage my servers. It's made more complicated by the fact that they are on different networks, public and private, and I really don't fancy tackling the complexity of such a product just to get monitoring information.

Over the years I've been using a selection of home-built Windows UI-based utilities to do things such as collect Event Log warnings and errors, monitor websites for connectivity, check for changes to firewall rules initiated by software updates (or by other, nastier causes), and monitor IIS logs for attacks or unusual activity. The trouble is that none of these work unless you are logged in, and collecting the information from each one is more complicated than it should be.

So I finally decided to put together a Windows Service that can run at startup and collate all the required information in one place. It shouldn't be hard; I already have all the code in the other utilities, so it's just a matter of combining it into one lump of executable stuff. Except that's where it started to get complicated, because everything depends on a timer that coordinates the activities.

In the separate utilities the "timer tick" routines are optimized for the specific activity, and trying to combine them all into one resulted in a huge and unmanageable routine that attempted to reset the timer interval in multiple places. It needs to adjust the interval for requirements such as testing for recovery of failed websites at varying intervals, and concurrently manage different intervals for all of the other functions. At first I wondered whether to just include multiple System.Threading.Timer instances, one for each function. But to minimize server load and avoid threading problems when writing to log files I'd need to synchronize them so that only one function would be running at any time.

I tried several different ways of updating the timer repeat interval in each monitoring routine, and then found myself debugging problems with the various bits of code that changed it depending on the current status. Then I tried having the timer fire every minute and getting the code to check for functions that should execute at that specific time. But it all seemed hugely over-complicated. There was always an edge case I'd missed, or some sequence of events that broke the cycle. Lack of proper design and planning up front (mainly because I was trying to reuse existing code) meant that the source file just kept growing, and each new bit raised another problem. Time to sit back, stop typing, and start thinking.

And that's when the solution became obvious. I don't actually need the timer to fire at specific intervals; I only need it to fire once after the required delay, at which point I can reset it. The interval depends on the status of each monitoring function (for example, if a website has failed and is waiting for recovery, or when the next specific monitoring check is due). When the timer fires I can simply disable it, execute any pending functions, figure out how long it is until the next monitoring action is due, and start the timer again with the appropriate delay.
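
To make that concrete, here's a minimal sketch of the one-shot pattern in C#, assuming a hypothetical MonitorScheduler class and a placeholder CheckMonitorsAndGetNextDelay method (neither is the actual code in my service). The System.Threading.Timer is created with an infinite repeat period so it fires only once, and the callback re-arms it with whatever delay gets calculated:

```csharp
using System;
using System.Threading;

// A minimal sketch, not the real service code. The names MonitorScheduler
// and CheckMonitorsAndGetNextDelay are invented for illustration.
public class MonitorScheduler
{
    private Timer _timer;

    public void Start()
    {
        // Fire once after one minute; Timeout.InfiniteTimeSpan means
        // the timer does not repeat on its own.
        _timer = new Timer(OnTick, null,
            TimeSpan.FromMinutes(1), Timeout.InfiniteTimeSpan);
    }

    private void OnTick(object state)
    {
        // No repeat period is set, so no further ticks arrive while
        // the monitoring functions are running.
        TimeSpan nextDelay = CheckMonitorsAndGetNextDelay();

        // Re-arm the timer for a single shot at the newly calculated delay.
        _timer.Change(nextDelay, Timeout.InfiniteTimeSpan);
    }

    private TimeSpan CheckMonitorsAndGetNextDelay()
    {
        // Placeholder: execute whatever checks are due, then return
        // the time until the next one is needed.
        return TimeSpan.FromMinutes(1);
    }
}
```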

In other words, in each "tick" event I can test whether it's time to execute each of the monitoring functions, carry out the ones that are due, and then - after all that's done - calculate the new required delay and set the timer to fire at that time. Each monitoring function knows how many minutes should elapse before it executes again. So a simple routine called at the end of the "tick" event handler code just iterates over all of the functions to return the lowest "number of minutes until due" value, sets the timer to this interval, and starts it running. And the interesting point is that, if I'd been designing the program from scratch, this is probably how I'd have decided to do it in the first place!
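
As a rough illustration of that scheduling step, here's a sketch that assumes each monitoring function is wrapped in a hypothetical IMonitorTask interface (again, invented for this example rather than lifted from my service). Anything that is due gets executed, and the smallest remaining "minutes until due" value becomes the delay for the next one-shot tick:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical wrapper for one monitoring function: it reports how many
// minutes remain before it should run again, and resets that value itself
// when it executes.
public interface IMonitorTask
{
    int MinutesUntilDue { get; }
    void Execute();
}

public static class Scheduler
{
    // Run everything that is due, then return the shortest wait among all
    // tasks so the one-shot timer can be re-armed with that delay.
    public static TimeSpan RunDueTasksAndGetNextDelay(IList<IMonitorTask> tasks)
    {
        foreach (var task in tasks.Where(t => t.MinutesUntilDue <= 0).ToList())
        {
            task.Execute();
        }

        // Assumes the list is non-empty; clamp to at least one minute so a
        // zero or negative value can never stall or flood the timer.
        int minutes = tasks.Min(t => t.MinutesUntilDue);
        return TimeSpan.FromMinutes(Math.Max(1, minutes));
    }
}
```

The clamp at the end is just a defensive choice in this sketch: it stops a task that is already overdue from re-arming the timer with a zero delay and spinning the loop.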

If you are feeling brave you can try out the server monitor service yourself (it's free). Get it from here.

Thankfully the IT world is reasonably well protected from the results of my vague approach to program design, because my day job is writing about code rather than creating it. However, it struck me how similar this "evolution to simplicity" is to my own world of writing guidance and documentation. When I worked on the Enterprise Library 5 project some time ago, we had a big guidance-management problem: trying to reuse nearly 1,000 pages of documentation that had accumulated over the previous four versions. Attempting to massage it into shape and add new information turned out to be a nightmare job, in particular for the features that were new or had changed significantly.

In the end, it was only by scrapping large chunks and writing more targeted guidance for these features that we managed to scramble out of the mire. We did reuse small blocks of the original documentation, though for some areas (particularly the Unity DI mechanism) we ended up writing mostly new content. It would be nice to say that we'd planned this approach at the start of the project, but sadly that's not true. We knew we'd need to rework the content, but trying to do that without a proper up-front plan just made the whole thing over-complicated and less useful.

Stopping typing and starting thinking allowed us to define what we felt was the ideal documentation structure, into which we could drop the appropriate blocks of existing content and then build around them following the plan. In development terms, we'd refactored the code and reused the existing functions, but written a new control loop that fired the specific functions at the appropriate times in the execution cycle.

I wonder if I can persuade the Office team to add the Visual Studio refactoring functionality into Microsoft Word...