(For the next few posts, I’m going to introduce readers to the different feature teams in the Managed Languages org. Today, I’m starting with a focus on the performance team.)
Back in 2000, I found myself assigned to be the performance lead of the Visual Basic team, and my first assigned goal was to drive the performance marks of the (then) forthcoming Visual Basic .NET to be in line with the numbers for Visual Basic 6.0. The primary focus was on the VB runtime APIs at first. That was a relatively simple task; APIs are nicely discrete bits of code, easy to measure and easy to evaluate and so, within a couple of months, I’d worked with the team to get the APIs at parity (or better). “Ah,” said I, “this performance work is so simple that I wonder why everyone is challenged by it. I must have a special gift for performance or something!”
This peculiar form of self-delusion lasted about a week, until the next challenge arose: improving the shutdown speed of Visual Basic .NET itself, which was taking over a minute for large solutions. That functionality was not in code that was nicely constrained like APIs, and so it took me a long time (probably longer than it should have, in retrospect) to realize that the process was blocking on background compilation even when shutting down, instead of just abandoning the compilation altogether. And, having fixed that, I then moved on to tuning F5 build times, which involved several threads all needing to get their tasks done quickly and yet in a semi-serial fashion. None of them were operating incorrectly with respect to themselves; it was the combination of them that was causing slowdowns. That took days and days of investigation (and a lot of collaboration among several teams) to lock down. In that investigation, I encountered the blunt truths about performance: there is no perfect solution to a general performance problem, and you are never truly done tuning your product, because other dependent code can and will change around you.
Which brings us to 2014…
Now, in the intervening 14 years, the tools to evaluate performance have of course become more powerful, and we can find and address issues far faster than in days of yore, when code inspection & stopwatch timing was roughly 75% of the job. At the same time, however, the applications themselves have become so complex (either internally or with respect to the environment in which they run) that solving problems after the fact still creates a big challenge. In fact, it’s become even more imperative to design for performance up front, because there are more ways than ever to get whammied. During my recent stint in XBOX, for example, my team over there worked hard to generate performant code for the back end of SmartGlass, only to discover near the end that we hadn’t accounted for the inherent latency of using SSL between us and the data store – it was not a code issue per se, but a limitation of the environment that we hadn’t properly designed for. (Fortunately, our design was modular enough that we were able to put in some caching at relatively low cost to the project and still meet our performance goals on schedule!)
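To make that idea concrete, here’s a minimal sketch (not the actual SmartGlass code; the store interface and names are hypothetical) of the kind of caching layer I’m describing, where each key is fetched over the wire at most once:

```csharp
// Hypothetical sketch only: an in-memory cache in front of a remote fetch, so repeated
// requests don't each pay the SSL round-trip latency to the backing data store.
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

public interface IMetadataStore
{
    // Hypothetical remote call that goes over SSL to the data store.
    Task<string> FetchAsync(string key);
}

public sealed class CachingMetadataStore : IMetadataStore
{
    private readonly IMetadataStore _inner;
    private readonly ConcurrentDictionary<string, Lazy<Task<string>>> _cache =
        new ConcurrentDictionary<string, Lazy<Task<string>>>();

    public CachingMetadataStore(IMetadataStore inner)
    {
        _inner = inner;
    }

    public Task<string> FetchAsync(string key)
    {
        // Each key triggers at most one remote fetch; later callers await the same task.
        return _cache.GetOrAdd(key, k => new Lazy<Task<string>>(() => _inner.FetchAsync(k))).Value;
    }
}
```

A real system would of course need expiration and invalidation policies; the point is simply that, with a modular design, the cache can be added as a wrapper without disturbing the rest of the code.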
As we all know and as I allude to above, you’ll save a lot of time and effort if you design for performance in the first place. That’s always been a Microsoft tenet (indeed, our interview questions often touch upon generating performant code), and we take it very seriously on the Managed Languages team. But, since some performance issues will slip through just due to human nature, and since designs which seemed good at first may prove to be problematic afterwards, ongoing vigilance is paramount – constant monitoring is the key to success.
Performance and Roslyn
With Roslyn, therefore, we treat performance exactly as if it were a feature area, one that plans for specific work and presents its progress to the team at each end-of-sprint showcase. Roslyn was designed for performance up front, and during development we’ve constantly re-assessed & re-tuned the architecture to make it adhere to the goals that we’ve set for it. We have a performance lead (Paul) who runs a performance “v-team” (virtual team) drawn from the ranks of Managed Languages engineers as needed, and who works with a “performance champ” (Murad), telemetry champ (Kevin), and perf PM (Alex) to oversee the state of our performance on a daily basis.
This performance v-team has goals that it needs to meet and/or maintain, and these goals are drawn from the metrics of the most recently shipped product. The v-team is directly accountable to me, Manish, and Devindra (the latter two are our test manager and group program manager, respectively), and the three of us meet with the v-team every week to assess the previous week’s performance efforts and to create goals for the upcoming week. (We are furthermore accountable to our upper management for meeting those goals – and believe me, they are very serious about it!) The v-team also works with other teams in Visual Studio to find “wins” that improve both sides, and it has been very successful at this.
As with any other product, performance is assessed with respect to two main categories: speed of operation and usage of memory. Trading off between the two is sometimes a tough challenge (I have to admit that more than once we’ve all thought “Hmm, can’t we just ship some RAM with our product?” :-)), and so we track a number of key scenarios to help us fine-tune our deliverables. These include (but are not limited to):
- Build timing of small, medium, and (very) large solutions
- Typing speed when working in the above solutions, including “goldilocks” tests where we slow the typing entry to the speed of a human being (see the sketch after this list)
- IDE feature speed (navigation, rename, formatting, pasting, find all references, etc…)
- Peak memory usage for the above solutions
- All of the above for multiple configurations of CPU cores and available memory
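As a concrete illustration of the “goldilocks” idea, here’s a minimal, hypothetical sketch of pacing input at human speed while measuring how long each keystroke takes to be processed. The editor interface here is a placeholder, not a real Visual Studio API:

```csharp
// Hypothetical sketch only: feed text into an editor buffer at roughly human typing
// speed and record the per-keystroke processing latency.
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading;

public interface IEditorBuffer
{
    // Placeholder: applies one keystroke and returns once the editor has processed it.
    void InsertChar(char c);
}

public static class GoldilocksTypingTest
{
    public static List<double> Run(IEditorBuffer buffer, string text, int charsPerMinute = 300)
    {
        int delayMs = 60000 / charsPerMinute;   // pause between keystrokes, like a human typist
        var latenciesMs = new List<double>();
        var stopwatch = new Stopwatch();

        foreach (char c in text)
        {
            stopwatch.Restart();
            buffer.InsertChar(c);                                  // measure how long the editor takes
            latenciesMs.Add(stopwatch.Elapsed.TotalMilliseconds);
            Thread.Sleep(delayMs);                                 // wait before the next "keystroke"
        }

        return latenciesMs;
    }
}
```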
All of these scenarios are assessed & reported daily, so that we can identify & repair any check-in that introduces a regression as soon as possible, before it becomes entrenched. Additionally, we don’t just check the average time elapsed for a given metric; we also assess the 98th & 99.9th percentiles, because we want good performance all of the time, not just some of the time.
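To see why the percentiles matter, here’s a small illustrative example (not our actual reporting infrastructure): a handful of pathological runs barely moves the average, but shows up immediately at the 99.9th percentile.

```csharp
// Illustrative sketch: averages hide tail latency; high percentiles expose it.
using System;
using System.Linq;

static class PerfReport
{
    // Nearest-rank percentile over a set of timing samples (milliseconds).
    public static double Percentile(double[] samples, double percentile)
    {
        var sorted = samples.OrderBy(s => s).ToArray();
        int rank = (int)Math.Ceiling(percentile / 100.0 * sorted.Length);
        return sorted[Math.Max(0, rank - 1)];
    }

    public static void Main()
    {
        // 990 fast runs and 10 pathological ones.
        double[] timings = Enumerable.Repeat(50.0, 990)
                                     .Concat(Enumerable.Repeat(2000.0, 10))
                                     .ToArray();

        Console.WriteLine("Mean:  {0:F1} ms", timings.Average());          // ~69.5 ms: looks fine
        Console.WriteLine("P98:   {0:F1} ms", Percentile(timings, 98));    // 50 ms: still looks fine
        Console.WriteLine("P99.9: {0:F1} ms", Percentile(timings, 99.9));  // 2000 ms: catches the outliers
    }
}
```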
We also use real-world telemetry to check our performance, both from internal Microsoft users as well as from customers. While automated metrics are all well and good, and very necessary for getting a day-to-day check on the performance of the project, “real world use” data is very useful for understanding how the product is actually running for folks. When the metric values conflict (for example, on typing), this leads us to improve the automated tests, which in turn makes it easier for us to reliably reproduce any problems and fix them. So, whenever you check that box that allows Visual Studio to send data to us, you are directly helping us to improve the product!
So, hopefully this gives you a bit of an insight into the performance aspects of our work. In the next post in a couple of weeks, I’ll talk a bit about how language planning works.
‘Til next time,
--Matt--*
Anonymous
January 15, 2014
Any chance you can expand on the work you did in XBOX on the SmartGlass backend?! Was it using Roslyn, or were you doing pure perf tuning/debugging for them due to timeline constraints? :)
Anonymous
January 15, 2014
Nope, it had nothing to do with Roslyn at all. I was actually in the XBOX org at the time, and I was the dev lead of the team in charge of writing the backend to SmartGlass. Although I've been on Visual Studio or its antecedents for most of my 19-year career at Microsoft, I took some time off starting in 2011 to go work in XBOX for 20 months -- I wanted to learn more about cloud programming and also refresh my coding skills which, as a dev manager who was spending most of his time in meetings, were in danger of getting out of date. It was a really excellent experience, and I took a lot of what I learned back to this job. (I even have some small amount of code in Roslyn now -- I have more pride in that than it possibly warrants given the size of what I personally did, but coding features is certainly fun whenever I get the chance.)

Since you're curious: my team over there -- really smart folks! -- created the system by which existing video manifests from 3rd parties (Netflix, Hulu, etc.) could be augmented with synched scene metadata, as well as other related video metadata, which in turn comes from another internal Microsoft team (who get to watch an awful lot of movies... :-)). The work is performed by leveraging Azure storage (both containers and tables) for the interim stages of processing before it ultimately gets cached away on a CDN for your viewing pleasure whenever you select and view the movie through the XBOX Live front end. (We also wrote the code for acquiring the video manifests themselves from the 3rd parties, and that was a fascinating experience for me as well -- I learned tons about data security in cloud scenarios.) --Matt--*
Anonymous
January 15, 2014
Performance is cool, but can you make a post about actual Roslyn features? I have tried the CTP and it felt like a build-yourself-a-ReSharper toolkit - in a good sense. However, I have seen some mentions that it will simplify things like meta-programming and DSLs. Can you clarify that? Thank you!
Anonymous
January 15, 2014
That all sounds nice, but I was hoping for some more substantial info: concrete perf comparisons with the current native compiler, both first compile vs. incremental compilation. Parallel linking, etc. Perf improvements in the "infoless time" pre-2013, as hinted at by Dustin Campbell. The internal switch to Immutable Collections, what's going on there with ImmutableArray, what the target goal is with mem usage, what the actual perf "hitches" are now, and what the goal is for the shipping Roslyn. Is there a "good enough" barrier for GC pressure? How is the story going with regards to collaboration with the tooling vendors? We will temporarily pay a double price as long as R# et al. haven't switched, even though in the long run it will all be a much more efficient (shared) tooling infrastructure.
Anonymous
January 16, 2014
The comment has been removed
Anonymous
January 16, 2014
So what are the actual performance characteristics of Roslyn?
Anonymous
January 16, 2014
All of the comments seem to be asking more or less the same thing, so I'll tackle them all at once (admittedly, in the annoyingly vague way that I must do for anything that is still under development). First off, I can't/won't go into precise timings for benchmarks, because frankly they would be meaningless, given that whatever we show off in any future previews (which I alluded to in my first post late last year) will have different numbers. (Similarly, I won't discuss feature characteristics for the same reasons.) Our numbers are good; we want them to be better. We will always want them to be better, even if they are already "good," and we'll continually work towards making them better right up until the end when they pry the bits out of our hands. As a result, performance work is going on all the time, numbers change daily, and I therefore have no wish to enshrine today's numbers (as opposed to yesterday's, or tomorrow's) as "Internet Truth."

In my first Roslyn post, I mentioned that VS QA had signed off on the performance before we did the Big Switch, and therefore you can safely conclude that our metrics were at least as good as what the division was experiencing in their day-to-day coding using the native compilers. And looking at today's perf scorecard, we seem to be pretty much where we were then (good!). Beyond that, "deponent saith not."

The biggest "hitch" that we (or other software makers) have is how to add value (i.e. new features) without adding time, and it's an issue that plays a heavy role in the decisions that we make around perf. For example (and purely hypothetically): let's consider a set of perf metrics at a point-in-time, and call it "A." Now, assume that we conceive of a groovy new feature that leverages some of the AST information that Roslyn caches, and that we furthermore believe that most users will want it on by default. By itself, the new feature of course involves some additional coding, and the perf characteristics of it are going to be non-zero (perhaps very small, but still non-zero) -- we'll call that "n". If implemented, what are the perf characteristics now? Are they A + n? That seems logical, but perhaps we can find some economies of scale when the two pieces come together, so maybe it's A + n - Intersection(A, n). Or, if we're smart, it will be even less than A -- because we made improvements elsewhere (in "B," perhaps, or in "A" itself) at the same time. The latter approach is the best of all worlds and also just the right thing to do -- "never assume old code is sacred" is our motto. But this sort of thing happens all the time, for each new feature under consideration, which also means that perf numbers are constantly in motion (ideally in a good way) for both old and new code. (All hypothetically, of course.)

So, apologies for not being more specific (or, in fact, for being maddeningly vague). I'll leave you with the reminder that we are working on a preview plan so you can experience it all yourselves firsthand... --Matt--*
Anonymous
January 17, 2014
Forget performance, tell us more about features! What are the language extension capabilities?
Anonymous
January 17, 2014
The comment has been removed
Anonymous
January 18, 2014
The comment has been removed
Anonymous
January 20, 2014
The comment has been removed
Anonymous
January 21, 2014
Interesting post, thanks. How do you classify small, medium, and large solutions? Is it the number of projects, and if so, what are the boundaries?
Anonymous
January 31, 2014
In other words, it runs like a slug.
Anonymous
February 09, 2014
A test case to add if you don’t have it: Open a very large solution, and then open lots of files in it. Then minimise VS and run something else that uses lots of RAM so VS gets paged out. Then exit VS and see how many of its pages have to be reloaded as part of exiting.
Anonymous
February 21, 2014
"In the next post in a couple of weeks..." What happened to that post? It's been over a month already. Please don't let this blog die again.Anonymous
March 08, 2014
Hello. Will Roslyn or C# apps compiled in the future get benefit from HLE (Hardware Lock Elision)? I read that, as is, TSX/HLE actually performs more poorly than software locks in many cases, so I suppose that to take advantage of those, there would be a need to profile. Perhaps this could work by running a profiled C# build, and then that profiling run creates rules which can be applied on later compiles to selectively apply HLE. Now, it might be possible using the JIT at runtime to "promote" the lock to HLE if available and then fall back if the performance worsens, but I wonder if this can be implemented without massive overhead. It's probably one of the "try and see" ones.
Anonymous
March 10, 2014
The comment has been removed
Anonymous
March 28, 2014
Microsoft power Roslyn: those interested in seeing the power of Roslyn, send mail to wilmer1104@yahoo.com for a great example.
Anonymous
April 03, 2014
In case anyone is coming back here for an update and hasn't heard the news, Roslyn is now open source: https://roslyn.codeplex.com
Anonymous
July 28, 2015
Good changes to provide better performance.