RyuJIT CTP3: How to use SIMD

SIMD details will be forthcoming from the .NET blog, but Soma's already showed some sample code, and everything's available, so until those details are available here are directions about how to kick the SIMD tires:

  1. Go get RyuJIT CTP3 and install it (requires x64 Windows 8.1 or Window Server 2012R2, same as before)
  2. Set the "use RyuJIT" environment variable: set COMPLUS_AltJit=*
    • Update: make sure you un-set this when you are done testing RyuJIT! See here for more details.
  3. Now pay attention, because things diverge here a bit:
  4. Set a new (and temporary) “enable SIMD stuff” environment variable: set COMPLUS_FeatureSIMD=1
  5. Add the Microsoft.Bcl.Simd NuGet package to your project (you must select “include Prerelease” or use the –Pre option)
  6. Tricky thing necessary until RyuJIT is final: Add a reference to Microsoft.Numerics.Vectors.Vector<T> to a class constructor that will be invoked BEFORE your methods that use the new Vector types. I’d suggest just putting it in your program’s entry class’s constructor. It must occur in the class constructor, not the instance constructor.
  7. Make sure your application is actually running on x64. If you don't see protojit.dll loaded in your process (tasklist /M protojit.dll) then you've missed something here.

You can take a gander at some sample code sitting over here. It's pretty well commented, at least the VectorFloat.cs file is in the Mandelbrot demo. I spent almost as much time making sure comments were good as writing those 24 slightly different implementations of the Mandelbrot calculation. Take a look at the Microsoft.Numerics.Vectors namespace: there some stuff in Vector<T>, some stuff in VectorMath, and some stuff in plain Vector. And there are also "concrete types" for 2, 3, and 4 element floats.

One quick detail, for those of you that are trying this stuff out immediately: our plan (and we've already prototyped this) is to have Vector<T> automatically use AVX SIMD types on hardware where the performance of AVX code should be better than SSE2. Doing that properly, however, requires some changes to the .NET runtime. We're only able to support SSE2 for the CTP release because of that restriction (and time wasn't really on our side, either).

I’ll try to answer questions down below. I know this doesn’t have much detail. We’ve been really busy trying to get CTP3 out the door (it was published about 45 seconds before Jay’s slide went up). Many more details are coming. One final piece of info: SIMD isn’t the only thing we’ve been doing over the past few weeks. We also improved our benchmarks a little bit:

The gray bars are standard deviation, so my (really horrible) understanding of statistics means that delta's that are within the gray bars are pretty much meaningless. I'm not sure if we really did regress mono-binarytrees or not, but I'm guessing probably not. I do know that we improved a couple things that should help mono-fasta and mono-knucleotide, but the mono-chameneos-redux delta was unexpected.

Anyway, I hope this helps a little. I'm headed home for the evening, as my adrenaline buzz from trying to get everything done in time for Jay's talk is fading fast, and I want to watch Habib from the comfort of my couch, rather than be in traffic. Okay, I'm exaggerating: my commute is generally about 10 minutes, but I'm tired, and whiny, and there are wolves after me.

-Kev