This post announces an updated preview of the .NET team’s new 64-bit Just-In-Time (JIT) compiler. It was written by Mani Ramaswamy, Program Manager for the .NET Dynamic Code Execution Team.
Note: RyuJIT CTP3 is available here: http://blogs.msdn.com/b/dotnet/archive/2014/04/03/the-next-generation-of-net.aspx.
The developer preview of RyuJIT, CTP1, received a thunderous response (so much so that we had to post a FAQ soon after). Two commonly asked questions were when there would be an update, and when RyuJIT would support feature X or Y from the existing 64-bit .NET JIT compiler. CTP2 answers both questions. This release of RyuJIT is functionally equivalent to the existing JIT64: there aren’t any feature differences between the two at this point. RyuJIT generates code that’s on average better than JIT64’s, while maintaining its 2X throughput win over JIT64.
Improvements: Features, Reliability, Performance
The two main features which weren’t supported in CTP1 were “opportunistic” tail calls and Edit & Continue. With CTP2, both of these features are supported. Additionally, a host of other features have been added to achieve functional parity with JIT64. Along the way, we’ve (the .NET Code Generation team) also added a number of performance tweaks and optimizations so that code generated using RyuJIT is generated fast (the throughput metric) and runs fast (the code quality metric).
But why stop there? We have thrown every test at our disposal at RyuJIT and it has come out with flying colors – whether it be running common server software using IKVM.NET (a Java Virtual Machine implemented in .NET), or complex ASP.NET workloads, or even simple Windows Store apps. Thanks to everyone who tried out the first CTP of RyuJIT and filed bug reports – we’ve fixed every single one of them, and at this point, RyuJIT doesn’t have any known bugs.
We continue to look at ways to improve the overall quality of RyuJIT, and will likely discover a few more bugs along the way. From the enthusiastic response we got from the first CTP, we’ll surely hear back with a few more bugs from our early adopters, i.e. you.
When it comes to performance, CTP1 demonstrated that RyuJIT handily beats JIT64 on throughput (how fast the compiler generates code) by a factor of 2X. We’ve been careful to maintain our throughput wins, and this CTP should yield similar throughput numbers. With CTP1, the focus was on throughput and to get some early feedback, and not so much on code quality (how fast the generated code executes).
While with CTP1, we were in the same ball park as JIT64, we were still 10-20% slower on code quality, with some outliers. With CTP2, we’ve addressed that – at this point, on average we should be at par or beating JIT64 on code quality. If during your evaluation, you find a benchmark where RyuJIT is trailing JIT64 performance significantly, please reach out to us – by the time we’re done, RyuJIT should be producing code that’s better than what JIT64 produced. This is not to say that there couldn’t be a few micro-benchmarks where JIT64 produces more optimal code, but rather to say that on average RyuJIT should be on par or better, and in the few (rare) cases it does trail JIT64 performance, it trails by only a few percentage points. We tried out many common code quality benchmark suites internally, and found that RyuJIT code quality on average is better than the existing .NET JIT64 compiler – thus if you do find an outlier, we’re most interested.
The chart below shows our performance, relative to JIT64’s across a number of benchmarks, some very small, others fairly large. Positive numbers indicate RyuJIT performing better than JIT64. Negative numbers indicate the opposite. The gray section is the limit of “statistical noise” for each benchmark, so any bar that is within the gray area indicates effectively identical performance. Check the CodeGen blog within a day or two for a detailed description of the methodology and specifics about the benchmarks we’re running. Overall, we’re doing quite well, with only a handful of losses, and some very nice wins!
While we needed to first get all the functionality and quality metrics lined up and achieve parity on performance (code quality) with JIT64 (we’re already 2X faster on throughput, in case you forgot), our re-architecture puts us in a great place for optimizing .NET dynamic code execution scenarios. Over the next few months, you will continu

If NGen is not impacted, what happens when RyuJIT is enabled for the entire machine and the user runs an NGen-ed program? Is the NGen-ed image ignored, or is RyuJIT bypassed?
A micro-benchmark question: how much is gained for the startup time of PowerShell ISE?
Do you have advice on optimizing code for RyuJIT (as in: code constructs that should be avoided)?
@Lionel: If NGen is not impacted…: RyuJIT is bypassed completely for NGen. The "AltJit" mechanism is only observed when the JIT is loaded in a normal context. When the JIT is loaded for NGen, it's ignored and the default JIT (JIT64 for a 64-bit .NET runtime) is used.
I haven't looked at the PowerShell ISE, but it would be easy enough to check: all you'd need to do is set the COMPLUS_AltJit variable to * and launch it, then launch it normally and compare. You can check to see if RyuJIT is in use by looking at loaded modules to see if 'protojit.dll' is in the process.
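The loaded-module check can also be done from inside a managed process. Here is a minimal sketch, assuming the only signal needed is whether a module named 'protojit.dll' appears in the current process's module list. The class and method names are made up for illustration and are not part of RyuJIT or the CTP.

```csharp
using System;
using System.Diagnostics;
using System.Linq;

static class JitCheck
{
    // Returns true if a module with the given file name is loaded
    // in the current process (comparison ignores case).
    public static bool IsModuleLoaded(string moduleName)
    {
        using (Process current = Process.GetCurrentProcess())
        {
            return current.Modules
                .Cast<ProcessModule>()
                .Any(m => string.Equals(m.ModuleName, moduleName,
                                        StringComparison.OrdinalIgnoreCase));
        }
    }

    static void Main()
    {
        // By the time Main runs, the JIT has already compiled at least this method.
        Console.WriteLine("protojit.dll loaded: " + IsModuleLoaded("protojit.dll"));
    }
}
```

Run it once with COMPLUS_AltJit set to * and once without to see the difference.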
As far as advice on optimizing for RyuJIT, currently, it's still in flux. We're not done with optimizations, so any advice would be premature. I'd much rather have that question flipped around: what code constructs would you like us to optimize for?
Is there any chance we'll see improved performance for rectangular arrays? I am quite annoyed that they are actually slower than jagged arrays although they should be significantly faster. I know that this is low priority and unimportant, but somehow it bugs me, although I never had to use rectangular arrays in a high-performance scenario.
"I'd much rather have that question flipped around: what code constructs would you like us to optimize for?"
Well… the list example I posted a while ago still generates horrible code 🙂
static int Sum(List<int> list) {
    int sum = 0;
    foreach (int x in list)
        sum += x;
    return sum;
}
I sure would like to see some SIMD options soon.
When I wrapped a volatile variable in a struct everything was perfectly inlined, and even though JIT generated more instructions (instead of just single MOV) perf was better. Probably because CPU had to do less work translating MOV to microcode or because of chunks in which it reads from instructions cache, don't know.
Do you take into account such effects on modern processors?
bool sameType = obj1.GetType() == obj2.GetType();
IntPtr typeHandleValue = obj.GetType().TypeHandle.Value;
That's it.. for now )
It would be nice to be able to tell the JIT to prioritize code quality over throughput. We don't always care that much about startup time, but we do care very much about performance once the application is up.
Is RyuJIT going to replace current JIT in the next framework version?
How can the environment variable COMPLUS_AltJit=* affect a single application? Replace * with the application name / path?
@Wolfgang: Well, only set the environment variable in the context you're running the application in, not for the whole system. So skip Control Panel, use cmd.
@Stilgar: We've made some headway here, but I'm not sure if it's quite there yet. I'll poke around a bit more and post back.
@Mike Danes: I know. And any excuse I might try and make will just sound whiny, so I'll just leave it at that 🙂
@LKeene: Continue to vote it up on UserVoice!
@OmariO:
1- unsafe code is just not that high a priority for us right now. That said, it shouldn't be terrible: we probably won't do much worse than JIT64. If you've got some code that we're not doing too well on, send it along!
2- To be honest, modeling what 'modern processors' do is a pretty difficult thing, because they all behave quite differently. At a high level, we tend to favor smaller code over "faster" code because in the real world, if your hot path fits in one fewer page, most cycles saved by using larger code sequences get eaten by the page fault.
3 & 4- I'll poke around in what we generate today and post back.
@André Slupik: We've gotten the RyuJIT compiler to a point where we actually have some optimizations that we can "turn up to 11". Knowing when & where to do so is the hard part (see previous comment about code size), but we're definitely looking in this direction.
@Pop Catalin: Without making any promises, we're doing our best to be able to do that. Changing a JIT compiler is kind of scary. Back when I was working on C++ we'd ship a new C++ compiler, people picked up the new compiler, then they recompiled their code, then they tested their code, then they shipped the updated application. In a JIT world, we ship a new JIT and everyone's existing application is magically compiled with the new JIT. It's like trying to change a flat while the car is driving. It can definitely be done, it's just hard and requires a lot of coordinated effort.
en.wikipedia.org/…/In-flight_refuelling 🙂
a)
What about a way to tag a method or class as "hot". This is often not statically known, so the dev can help out.
In a hot method, inlining and loop unrolling heuristics would be turned up greatly. We no longer need to be conservative here because the developer told the JIT not to be.
This seems like quite a small engineering investment to make. So the only question is whether you want engineers to make that decision.
Given that the JIT will likely never get this completely right, and your resources are limited, I think this feature should be implemented. Devs will thank you for a reliable way to force this.
b)
Did you consider dynamic range check elimination yet? wikis.oracle.com/…/RangeCheckElimination Loops are split into 3 loops where the middle loop has no range checks. 99% of the time is spent in the middle loop.
If this was implemented, statically removing range checks becomes less important.
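To make the three-loop idea concrete, here is a hand-written sketch of what such a split could look like at the source level. It is only an illustration of the shape: CheckedGet is a hypothetical stand-in for an access that keeps its range check, the array is assumed to be non-null, and in real C# the runtime still decides for itself whether the plain a[i] in the middle loop needs a check.

```csharp
using System;

static class RangeCheckSplit
{
    // Hypothetical stand-in for an array access that keeps its range check.
    static int CheckedGet(int[] a, int i)
    {
        if ((uint)i >= (uint)a.Length) throw new IndexOutOfRangeException();
        return a[i];
    }

    // Sums a[lo..hi) using a pre-loop, a main loop, and a post-loop.
    // Behaves like a single checked loop: out-of-range indices still throw.
    public static int Sum(int[] a, int lo, int hi)
    {
        int sum = 0;
        int i = lo;
        int safeStart = Math.Max(lo, 0);        // first index known to be >= 0
        int safeEnd = Math.Min(hi, a.Length);   // one past the last index known to be < Length

        // Pre-loop: indices below the safe range keep their checks.
        for (; i < Math.Min(safeStart, hi); i++) sum += CheckedGet(a, i);

        // Main loop: every i here satisfies 0 <= i < a.Length,
        // so a range check would never fail.
        for (; i < safeEnd; i++) sum += a[i];

        // Post-loop: indices past the safe range keep their checks.
        for (; i < hi; i++) sum += CheckedGet(a, i);

        return sum;
    }

    static void Main()
    {
        Console.WriteLine(Sum(new[] { 1, 2, 3, 4, 5 }, 1, 4)); // 2 + 3 + 4 = 9
    }
}
```

For well-behaved callers the pre-loop and post-loop run zero iterations, which is why nearly all the time ends up in the middle loop.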
As André Slupik said, an option for enabling rather expensive optimizations, maybe as an assembly- and method-level attribute or based on a profiling step which marks methods that can profit from further optimization, would be nice.
@xor88 & Suchiman: a) Prior to RyuJIT, we really didn't have any opts we could add and have any certainty that things would be better. We do have "MethodImplOptions.AggressiveInlining". We'll definitely keep this one in mind. b) We added Loop Cloning, where the dynamic bounds check occurs, and if it passes, then we go down a bounds-check-free path, otherwise we do the slow version. I'll have to dig out a small example and dump it on the codegen blog. It's there in CTP2. It winds up bloating code, so we're rather conservative about it, and it's not particularly well tuned yet, but it's definitely there, and shows some real promise.
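Until that codegen blog example shows up, here is a rough source-level picture of what loop cloning amounts to. This is a hand-written sketch, not RyuJIT output: the JIT does this transformation on its own internal representation, and the helper and class names are invented for illustration. The helper also shows how the existing AggressiveInlining hint (available since .NET 4.5) is applied.

```csharp
using System;
using System.Runtime.CompilerServices;

static class LoopCloningSketch
{
    // Hypothetical stand-in for an array access that keeps its bounds check.
    // The attribute asks the JIT to inline this method where it can.
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    static int CheckedGet(int[] a, int i)
    {
        if ((uint)i >= (uint)a.Length) throw new IndexOutOfRangeException();
        return a[i];
    }

    // Sums the first n elements of a.
    public static int SumFirst(int[] a, int n)
    {
        int sum = 0;
        if (a != null && n <= a.Length)
        {
            // Fast clone: one up-front test proves every index is in range,
            // so per-iteration checks are unnecessary here.
            for (int i = 0; i < n; i++) sum += a[i];
        }
        else
        {
            // Slow clone: the original loop, checking on every access.
            for (int i = 0; i < n; i++) sum += CheckedGet(a, i);
        }
        return sum;
    }

    static void Main()
    {
        Console.WriteLine(SumFirst(new[] { 1, 2, 3 }, 2)); // 1 + 2 = 3
    }
}
```

The loop body exists twice, which is the code bloat mentioned above.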
Surprising that you require 2012 R2, when straight 2012 is just now widely available in the wild, where I really want to test it!
Would like to see some SIMD(!) Yes, voted it up already. Please everyone do so too(!)
I don't understand how to enable the compiler. Can someone post step-by-step instructions? That would be much appreciated!
@Kevin Frei, any plans on branch prediction (en.wikipedia.org/…/Branch_prediction) and loop-invariant-hoisting (en.wikipedia.org/…/Loop-invariant_code_motion) optimizations?
In case you guys didn't notice, it's the most viewed question on StackOverflow: //stackoverflow.com/q/11227809
@LKeene: If you have an application that can be invoked from a command line window,
1) Open a commandline window (windows key->q, type cmd on the search text box, click on "Command Prompt")
2) Change directory to the path where your application is from your command prompt
3) Type the following on the command prompt "SET COMPLUS_AltJit=*" and enter
4) Run your application from the command line
This will ensure that your application will use RyuJIT.
If you want to go back to the default 64 bit JIT for other managed applications, close the application and close the Command Prompt or enter "SET COMPLUS_AltJit=" in the Command Prompt.
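The same per-process effect can be had from code by starting the application as a child process with the variable set only in its environment. A minimal sketch, assuming the target is an ordinary executable; the AltJitLauncher class is hypothetical and not something that ships with the CTP:

```csharp
using System;
using System.Diagnostics;

static class AltJitLauncher
{
    // Builds start info for a child process that sees COMPLUS_AltJit=*,
    // without touching the environment of this process or the machine.
    public static ProcessStartInfo CreateStartInfo(string exePath)
    {
        var startInfo = new ProcessStartInfo(exePath)
        {
            // Per-process environment variables require UseShellExecute = false.
            UseShellExecute = false
        };
        startInfo.EnvironmentVariables["COMPLUS_AltJit"] = "*";
        return startInfo;
    }

    static void Main(string[] args)
    {
        if (args.Length == 0)
        {
            Console.WriteLine("usage: AltJitLauncher <path-to-app.exe>");
            return;
        }
        using (Process child = Process.Start(CreateStartInfo(args[0])))
        {
            child.WaitForExit();
        }
    }
}
```

Other managed applications started outside this launcher keep using the default JIT, just as with the command prompt approach.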
Let me know if you want step by step instructions on how to set the registry key which will cause all managed applications to run using RyuJIT (not recommended for a production machine). Or if you want to check via a debugger that the RyuJIT is getting loaded.
Hope this helps.
Lakshan Fernando
@Kevin Frei: "I know. And any excuse I might try and make will just sound whiny, so I'll just leave it at that :-)"
No problem, I hope I didn't sound too insistent about the issue. It's just that it seems a bit surprising that such a small (and not uncommon) piece of code can generate code with so many issues (I think I count 4 distinct ones).
@AzureSky: "Would like to see some SIMD(!) Yes, voted it up already. Please everyone do so too(!)"
I think I up-voted on the Connect bug about this but I'm seriously considering down-voting the bug. It seems to me that the rather high count of up-votes is the result of a "SIMD is cool" style of thinking rather than the result of more down-to-earth thinking: "SIMD actually works given the optimization constraints of a JIT compiler". Otherwise you can end up in situations like one I've recently seen in the VC++ compiler – doing 2 additions with a single SIMD instruction and then (unnecessarily) spilling the registers to memory and killing the perf.
@Lakshan: Thank you!
And yes, if you could, please post step-by-step instructions on setting the registry so that the RyuJIT option is set globally (as well as the debug info to verify that it's running). Much appreciated!
v8-richards, isn't this JavaScript. What is the relationship with RyuJIT?
@mattias – It could have been compiled using IronJS, or translated to c#
@Jerry: We're intimately aware of the branch predictor hardware. Being aware of it doesn't really make it something that we can optimize for. Intel's branch predictor is incredibly complex (and incredibly capable). There are a few places where we try to take branch prediction into account, but doing it for general purpose code is just not particularly useful, because the hardware handles it quite well. Loop Invariant Code Motion is already in place in RyuJIT. We hoist loop invariant expressions out of loops. While we don't always detect what's loop invariant, the core algorithm is in place. To make it work better, we just have to make the core analysis better, which makes everyone happy, not just folks who want LICM to work :-).
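For readers who haven't run into loop-invariant code motion before, here is the transformation done by hand at the source level. It's only a sketch with made-up method names; it says nothing about which expressions RyuJIT actually recognizes as invariant.

```csharp
using System;

static class LicmSketch
{
    // Before: the scale factor is recomputed on every iteration,
    // even though a and b never change inside the loop.
    public static void ScaleNaive(double[] data, double a, double b)
    {
        for (int i = 0; i < data.Length; i++)
            data[i] *= Math.Sqrt(a * a + b * b);
    }

    // After: the invariant expression is hoisted out and computed once.
    public static void ScaleHoisted(double[] data, double a, double b)
    {
        double factor = Math.Sqrt(a * a + b * b);
        for (int i = 0; i < data.Length; i++)
            data[i] *= factor;
    }

    static void Main()
    {
        var data = new[] { 1.0, 2.0, 3.0 };
        ScaleHoisted(data, 3.0, 4.0);             // factor is sqrt(9 + 16) = 5
        Console.WriteLine(string.Join(", ", data));
    }
}
```

Both methods compute the same result; a compiler that performs LICM can turn the first form into the second automatically once it proves the expression doesn't change inside the loop.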
@mattias, @Adam: I'm writing up a codegen blog post right now that goes into more detail about the benchmarks we're running. The v8-* benchmarks are all manual translations from JavaScript to C#. The originals all have pretty open licensing bologna, so I'll probably dump the source code somewhere public when I finish the write-up.
@LKeene: To enable RyuJIT on the entire machine (we don't recommend that you do this if you have any business-critical managed applications on the machine):
1) Install RyuJIT CTP2 in the machine
2) Open an elevated command prompt Window (windows key->q, type cmd on the search text box, right click on "Command Prompt" and select "Run as administrator")
3) Turn on the registry key to enable RyuJIT on the entire machine from the command prompt as follows (reg add HKLM\Software\Microsoft\.NETFramework /reg:64 /v AltJit /t REG_SZ /d * /f)
4) After evaluating managed applications running under RyuJIT, you can turn it off from the registry setting as follows (reg delete HKLM\Software\Microsoft\.NETFramework /reg:64 /v AltJit /f)
Kevin Frei has uploaded a script that does this for you at kscdg.codeplex.com/…/latest
Download the script and from your elevated command prompt run "protojit on" to turn on the registry setting, "protojit off" to turn it off, "protojit" to see if it's currently turned on or not, "protojit proc" to see the processes that have protojit.dll currently loaded.
If you want to use a debugger, WinDBG, to break when RyuJIT gets loaded for a managed app that can be run from the commandline,
1) Install WinDBG from msdn.microsoft.com/…/ff551063(v=vs.85).aspx
2) Enable RyuJIT using one of the options above
3) Run the application from the command prompt using the debugger (ex. c:\Debuggers\windbg.exe test.exe)
4) On the windbg command window, type "sxe ld:protojit" and enter
5) On the windbg command window, type "g" and enter and you should see a break when protojit.dll gets loaded to compile a method
Kevin's script will allow you to see this much more easily with "protojit proc"
Hope this helps.
Lakshan Fernando
Final note: If you want details about the benchmarks, check out the codegen blog: blogs.msdn.com/…/lies-damn-lies-and-benchmarks.aspx
When trying to run our application with RyuJIT I get stack overflow crashes, but only in Release builds without the debugger attached (i.e. release mode JIT). It is a big mixed native/managed app. Any ideas on how to debug this?
This is what I get from the debugger:
Unhandled exception at 0x00007FF9420B5684 (clr.dll) in MyApp.exe: Stack cookie instrumentation code detected a stack-based buffer overrun.
MANAGED_STACK:
(TransitionMU)
000000A6AA04D420 00007FF8E4FE3396 mscorlib!System.Security.CodeAccessSecurityEngine.Assert(System.Security.CodeAccessPermission, System.Threading.StackCrawlMark ByRef)+0x46
000000A6AA04D460 00007FF8E4FE3333 mscorlib!System.Security.CodeAccessPermission.Assert()+0x23
000000A6AA04D4A0 00007FF8E5CE9CAF Infragistics4_Win_UltraWinDock_v13_1!Infragistics.Win.UltraWinDock.Utilities.GetParent(System.Windows.Forms.Control)+0x6f
Child-SP RetAddr Call Site
000000a6`aa04b910 00007ff9`448835de clr!_report_gsfailure+0x1c
000000a6`aa04b950 00007ff9`4455994c clr!StackFrameIterator::NextRaw+0xdb7
000000a6`aa04ba90 00007ff9`4455967e clr!Thread::StackWalkFramesEx+0x174
000000a6`aa04be60 00007ff9`4462cb73 clr!Thread::StackWalkFrames+0xbe
000000a6`aa04cf70 00007ff8`e4fe3396 clr!SecurityStackWalk::CheckNReturnSO+0x293
000000a6`aa04d420 00007ff8`e4fe3333 0x00007ff8`e4fe3396
@Ståle L. Hansen:
It looks like there's some stack corruption going on, which makes it particularly nasty to debug. I'd give even odds that it's a problem with your code, or a problem with RyuJIT. We lay out the stack quite a bit differently than JIT64 did, so you may have had minor stack corruption occurring, but the contents of the stack that were getting overwritten were no longer used. The first step to tracking the issue down is to try to get the debugger attached. From inside Visual Studio, you can turn on 'retail' debugging by unchecking the "Suppress JIT optimization on module load (Managed only)" box in the Tools=>Options=>Debugging dialog. Once you're there, make sure you can reproduce the issue. Feel free to hit me up directly. My e-mail address is firstname.lastname (@microsoft.com).
Will RyuJIT support plugins for code generation in the llvm sense? //llvm.org/docs/Passes.html
I have sent a bug report to 'ryujit@microsoft.com' but I have got no answer back. Is the mail box still in use?
Is it possible to open source this, like the rest of the related products which were open sourced?
Will there be plans to expand support to Windows 7?