Roslyn performance (Matt Gertz)

(For the next few posts, I’m going to introduce readers to the different feature teams in the Managed Languages org. Today, I’m starting this with a focus on the performance team.)

Back in 2000, I found myself assigned to be the performance lead of the Visual Basic team, and my first goal was to drive the performance marks of the (then) forthcoming Visual Basic .NET to be in line with the numbers for Visual Basic 6.0.  The primary focus was on the VB runtime APIs at first.  That was a relatively simple task; APIs are nicely discrete bits of code, easy to measure and easy to evaluate, and so within a couple of months I’d worked with the team to get the APIs to parity (or better).  “Ah,” said I, “this performance work is so simple that I wonder why everyone is challenged by it.  I must have a special gift for performance or something!”

This peculiar form of self-delusion lasted about a week, until the next challenge arose: improving the shutdown speed of Visual Basic .NET itself, which was taking over a minute for large solutions.  That functionality was not in code that was nicely constrained like the APIs, and so it took me a long time (probably longer than it should have, in retrospect) to realize that the process was blocking on background compilation even when shutting down, instead of just abandoning the compilation altogether.  And, having fixed that, I then moved on to tuning F5 build times, which involved several threads all needing to get their tasks done quickly and yet in a semi-serial fashion.  None of them was operating incorrectly on its own; it was the combination of them that was causing slowdowns.  That took days and days of investigation (and a lot of collaboration among several teams) to lock down.  In that investigation, I encountered the blunt truths about performance: there is no perfect solution to a general performance problem, and you are never truly done tuning your product, because other dependent code can and will change around you.
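
To make the shutdown lesson concrete, here’s a minimal sketch of the general pattern (written in modern C# purely for illustration; it is not the actual Visual Basic .NET code, and the class and timings are made up): the background compilation observes a cancellation signal, and shutdown cancels and abandons that work instead of blocking on it.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical host that runs background compilation and, on shutdown,
// cancels it rather than waiting for it to finish.
class BackgroundCompilationHost
{
    private readonly CancellationTokenSource _shutdownCts = new CancellationTokenSource();
    private Task _backgroundWork = Task.CompletedTask;

    public void StartBackgroundCompilation()
    {
        _backgroundWork = Task.Run(() =>
        {
            // Simulated incremental compilation: check for cancellation
            // between units of work so shutdown is never blocked for long.
            for (int i = 0; i < 1000; i++)
            {
                _shutdownCts.Token.ThrowIfCancellationRequested();
                Thread.Sleep(50); // stand-in for compiling one file/project
            }
        }, _shutdownCts.Token);
    }

    public void Shutdown()
    {
        // Abandon the in-flight compilation instead of blocking on it.
        _shutdownCts.Cancel();
        try
        {
            // Give it a brief moment to unwind, but don't wait a minute.
            _backgroundWork.Wait(TimeSpan.FromMilliseconds(200));
        }
        catch (AggregateException) { /* cancellation is expected here */ }
    }
}

class Program
{
    static void Main()
    {
        var host = new BackgroundCompilationHost();
        host.StartBackgroundCompilation();
        Thread.Sleep(300);   // the user works for a bit...
        host.Shutdown();     // ...then closes the IDE
        Console.WriteLine("Shut down without waiting for background compilation.");
    }
}
```

The point of the sketch is simply that shutdown signals the background work to stop and then moves on after a short grace period, rather than waiting for it to complete.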

Which brings us to 2014…

Now, in the intervening 14 years, the tools to evaluate performance have of course become more powerful, and we can find and address issues far faster than in days of yore, when code inspection & stopwatch timing was roughly 75% of the job.  At the same time, however, the applications themselves have become so complex (either internally or with respect to the environment in which they run) that solving problems after the fact still creates a big challenge.  In fact, it’s become even more imperative to design for performance up front, because there are more ways than ever to get whammied.  During my recent stint in XBOX, for example, my team over there worked hard to generate performant code for the back end of SmartGlass, only to discover near the end that we hadn’t accounted for the inherent latency of using SSL between us and the data store – it was not a code issue per se, but a limitation of the environment that we hadn’t properly designed for.  (Fortunately, our design was modular enough that we were able to put in some caching at relatively low cost to the project and still meet our performance goals on schedule!)

As we all know and as I allude to above, you’ll save a lot of time and effort if you design for performance in the first place.  That’s always been a Microsoft tenet (indeed, our interview questions often touch upon generating performant code), and we take it very seriously on the Managed Languages team.  But, since some performance issues will slip through just due to human nature, and since designs which seemed good at first may prove to be problematic afterwards, ongoing vigilance is paramount – constant monitoring is the key to success. 

Performance and Roslyn

With Roslyn, therefore, we treat performance exactly as if it were a feature area: one that plans for specific work and presents its progress to the team at each end-of-sprint showcase.  It was designed for performance up front, and during development we’ve constantly re-assessed & re-tuned the architecture to make it adhere to the goals that we’ve set for it.  We have a performance lead (Paul) who runs a performance “v-team” (virtual team) drawn from the ranks of Managed Languages engineers as needed, and who works with a “performance champ” (Murad), telemetry champ (Kevin), and perf PM (Alex) to oversee the state of our performance on a daily basis.

This performance v-team has goals that it needs to meet and/or maintain, and these goals are drawn from the metrics of the most recently shipped product.  The v-team is directly accountable to me, Manish, and Devindra (the latter two are our test manager and group program manager, respectively), and the three of us meet with the v-team every week to assess the previous week’s performance efforts and to create goals for the upcoming week.  (We, in turn, are accountable to our upper management for meeting those goals – and believe me, they are very serious about it!)  The v-team also works with other teams in Visual Studio to find “wins” that improve both sides, and it has been very successful at this.

As with any other product, performance is assessed with respect to two main categories: speed of operation and usage of memory.  Trading off between the two is sometimes a tough challenge (I have to admit that more than once we’ve all thought “Hmm, can’t we just ship some RAM with our product?” :-)), and so we track a number of key scenarios to help us fine-tune our deliverables.  These include (but are not limited to):

  • Build timing of small, medium, and (very) large solutions
  • Typing speed when working in the above solutions, including “goldilocks” tests where we slow the typing entry to the speed of a human being
  • IDE feature speed (navigation, rename, formatting, pasting, find all references, etc…)
  • Peak memory usage for the above solutions
  • All of the above for multiple configurations of CPU cores and available memory

These are all assessed & reported daily, so that we can identify & repair any check-in that introduced a regression as soon as possible, before it becomes entrenched.  Additionally, we don’t just check for the average time elapsed on a given metric; we also assess the 98th & 99.9th percentiles, because we want good performance all of the time, not just some of the time.
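
For readers curious what percentile-based tracking looks like in practice, here is a small, self-contained C# sketch (not our actual test harness; the scenario, iteration count, and output format are invented for illustration) that times a scenario repeatedly and reports the average alongside the 98th and 99.9th percentiles:

```csharp
using System;
using System.Diagnostics;
using System.Linq;

// Illustrative only: measure a scenario many times and report the average
// alongside high percentiles, since the tail matters as much as the mean.
class PercentileReport
{
    static double Percentile(double[] sortedMs, double p)
    {
        // Nearest-rank percentile over an ascending-sorted sample.
        int rank = (int)Math.Ceiling(p / 100.0 * sortedMs.Length);
        return sortedMs[Math.Max(rank - 1, 0)];
    }

    static void Main()
    {
        const int iterations = 2000;
        var samplesMs = new double[iterations];
        var sw = new Stopwatch();

        for (int i = 0; i < iterations; i++)
        {
            sw.Restart();
            ScenarioUnderTest();              // e.g., format a document, paste a block
            sw.Stop();
            samplesMs[i] = sw.Elapsed.TotalMilliseconds;
        }

        Array.Sort(samplesMs);
        Console.WriteLine($"avg:   {samplesMs.Average():F2} ms");
        Console.WriteLine($"p98:   {Percentile(samplesMs, 98):F2} ms");
        Console.WriteLine($"p99.9: {Percentile(samplesMs, 99.9):F2} ms");
    }

    // Hypothetical stand-in for a real editor scenario.
    static void ScenarioUnderTest()
    {
        var text = string.Concat(Enumerable.Repeat("var x = 1;\n", 200));
        _ = text.Replace("x", "y");
    }
}
```

A report like this makes it obvious when a change leaves the average untouched but pushes the tail out – exactly the kind of regression that average-only tracking would hide.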

We also use real-world telemetry to check our performance, both from internal Microsoft users as well as from customers.  While automated metrics are all well and good, and very necessary for getting a day-to-day check on the performance of the project, “real world use” data is very useful for understanding how the product is actually running for folks.  When the telemetry and our automated metrics disagree (for example, on typing), we improve the automated tests, which in turn makes it easier for us to reliably reproduce any problems and fix them.  So, whenever you check that box that allows Visual Studio to send data to us, you are directly helping us to improve the product!

So, hopefully this gives you a bit of an insight into the performance aspects of our work.  In the next post in a couple of weeks, I’ll talk a bit about how language planning works.

‘Til next time,

  --Matt--