Rock, Paper, Azure Deep Dive: Part 2

In part 1, I detailed some of the specifics of getting the Rock, Paper, Azure (RPA) game up and running in Windows Azure. In this post, I'll start detailing some of the other considerations in the project – in many ways, this was a very real migration scenario of a reasonably complex application. (This post doesn't contain any helpful info on playing the game, but if you're interested in scalability or migration, read on!)

The first issue we had with the application was scalability. Every time players are added to the game, the scalability requirements of course increase. The original purpose of the engine wasn't to be some big, open-ended game played on the internet; I imagine the idea was to host small matches (10 or fewer players). While the game worked fine for fewer than 10 players, we started to hit some brick walls as we climbed to 15, and then some dead ends around 20 or so.

This is not a failing of the original app design; it was doing what it was intended to do. In my past presentations on scalability and performance, the golden rule I always discuss is: you have to be able to benchmark and measure your performance. Whether it's 10 concurrent users or a million, there should always be some baseline metric for the application (requests/sec, load, etc.). In this case, we wanted to be able to quickly run (within a few minutes) a 100-player round, with capacity to handle 500 players.

The problem with reaching these numbers is that as the number of players goes up, the number of matches played goes up quadratically (N * (N - 1) / 2). Even for just 50 players, the curve looks like this:

[Figure: number of matches played vs. number of players, up to 50 players]

Now imagine 100 or 500 players! The first step in increasing the scale was to pinpoint the two main problem areas we identified in the app. The primary one was the threading model around making a move. In an even match against another player, roughly 2,000 games will be played. The original code would spin up a thread for each move, for each game in the match. That means that for a single match, a total of 4,000 threads are created, and in a 100-player round, 4,950 matches means 19,800,000 threads! For 500 players, that number swells to 499,000,000.
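As a quick sanity check on those numbers, something like the following (illustrative C#, not the actual engine code) reproduces them:

```csharp
using System;

// Rough arithmetic check: matches = n * (n - 1) / 2, and the original engine
// created one thread per move, i.e. 2 threads per game * ~2,000 games = ~4,000
// threads per match.
class ThreadCountEstimate
{
    static void Main()
    {
        foreach (long players in new long[] { 50, 100, 500 })
        {
            long matches = players * (players - 1) / 2;
            long threadsOldModel = matches * 4000;   // one thread per move
            Console.WriteLine("{0} players: {1:N0} matches, {2:N0} threads in the old model",
                              players, matches, threadsOldModel);
        }
    }
}
```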

The advantage of that model, though, is that should a player go off into the weeds, the system can abort the thread and spin up a new one for the next game.

What we decided to do was create a single thread per player (instead of a thread per move). By implementing two wait handles in the class (specifically, a ManualResetEvent and an AutoResetEvent), we can accomplish the same thing as the previous method. (You can see this implementation in the Player.cs file, in the DecisionClock class.)
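A stripped-down sketch of the idea looks something like the code below. This is not the actual DecisionClock implementation – the MakeMove delegate, the class name, and the timeout handling are placeholders – but it shows how a single long-lived thread per player can be woken for each move with two wait handles instead of creating a thread per move:

```csharp
using System;
using System.Threading;

// One long-lived thread per player: the game loop signals when a move is
// needed, the worker computes it, and the loop waits with a timeout instead
// of spinning up (and possibly aborting) a brand-new thread per move.
public class PlayerWorker : IDisposable
{
    private readonly Thread _thread;
    private readonly AutoResetEvent _moveRequested = new AutoResetEvent(false);
    private readonly ManualResetEvent _moveReady = new ManualResetEvent(false);
    private readonly Func<string> _makeMove;          // the bot's decision logic (placeholder)
    private volatile string _lastMove;
    private volatile bool _running = true;

    public PlayerWorker(Func<string> makeMove)
    {
        _makeMove = makeMove;
        _thread = new Thread(Run) { IsBackground = true };
        _thread.Start();
    }

    private void Run()
    {
        while (_running)
        {
            _moveRequested.WaitOne();                 // sleep until a move is needed
            if (!_running) break;
            _lastMove = _makeMove();                  // run the bot's decision code
            _moveReady.Set();                         // tell the game loop we're done
        }
    }

    // Called by the game loop once per move; returns null if the bot times out.
    public string GetMove(TimeSpan timeout)
    {
        _moveReady.Reset();
        _moveRequested.Set();
        return _moveReady.WaitOne(timeout) ? _lastMove : null;
    }

    public void Dispose()
    {
        _running = false;
        _moveRequested.Set();                         // wake the thread so it can exit
    }
}
```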

The obvious advantage here is that we go from nearly 20 million threads in a 100-player round to around 9,900 – still a lot, but a massive reduction. In the first tests, 5- to 10-player rounds would take upwards of 5 minutes to complete. Extrapolated out (we didn't want to wait), a 100-player round would take well over a day. In this model, it's significantly faster – a 100-player round typically completes within a few minutes.

The next issue was multithreading the game loop itself. In the original implementation, games were played in a loop that matched all players against each other, blocking on each iteration. Our first thought was to use the Parallel Extensions (PFX) libraries built into .NET 4 and kick off each game as a Task. This did indeed work, but the problem is that the games are so CPU intensive that creating more than one thread per processor is a bad idea. If the system decided to context switch while it was your move, it could throw off the timing, and we saw occasional timeouts. Since modifying the thread count of the underlying thread pool is generally a bad idea, we decided to implement a smart thread pool like the one here on The Code Project. With this, we can scale the number of threads dynamically based on a number of conditions.
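As a simplified illustration of the "roughly one game per core" idea – the real server uses the smart thread pool rather than Parallel.ForEach, and the Match type below is just a placeholder – the cap can be expressed like this:

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Play matches in parallel, but cap concurrency at about one game per core so
// a CPU-bound game isn't context-switched away while a bot is timing its move.
public static class RoundRunner
{
    public static void PlayRound(IEnumerable<Match> matches)
    {
        var options = new ParallelOptions
        {
            // More than one game per processor oversubscribes the CPU and
            // risks spurious move timeouts.
            MaxDegreeOfParallelism = Environment.ProcessorCount
        };

        Parallel.ForEach(matches, options, match => match.Play());
    }
}

public class Match
{
    public void Play() { /* run ~2,000 games between the two players */ }
}
```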

The final issue was memory management. This was solved by design: the issue was that the original engine (and the Bot Lab) don't store any results until the round is over. That means all of the log files really start to eat up RAM – again, not a problem for 10 or 20 players, but at 100-200+ players the RAM usage bogs everything down. The number of players in the Bot Lab is small enough that this isn't a concern there, and the game server handles it by design, using SQL Azure to record results as the games are played.
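A minimal sketch of that "write as you go" pattern – the GameResults table, its columns, and the connection string here are made up for illustration and are not the game server's actual schema – might look like:

```csharp
using System.Data.SqlClient;

// Record each game result as it is played instead of buffering logs in memory.
public class ResultWriter
{
    private readonly string _connectionString;

    public ResultWriter(string sqlAzureConnectionString)
    {
        _connectionString = sqlAzureConnectionString;
    }

    public void Save(int matchId, int gameNumber, string winner)
    {
        using (var conn = new SqlConnection(_connectionString))
        using (var cmd = new SqlCommand(
            "INSERT INTO GameResults (MatchId, GameNumber, Winner) " +
            "VALUES (@matchId, @gameNumber, @winner)", conn))
        {
            cmd.Parameters.AddWithValue("@matchId", matchId);
            cmd.Parameters.AddWithValue("@gameNumber", gameNumber);
            cmd.Parameters.AddWithValue("@winner", winner);

            conn.Open();
            cmd.ExecuteNonQuery();                    // flush this game's result immediately
        }
    }
}
```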

Next time in the deep dive series, we’ll look at a few other segments of the game.  Until next time!