CLR Team at the Atlanta C# User Group

The CLR team is speaking at the Atlanta C# User Group.  Tonight just simply rocked.

Kit George opened the show tonight by demonstrating the new GC memory pressure APIs (GC.AddMemoryPressure and GC.RemoveMemoryPressure) as well as the newly added TryParse method on value types.  And one of my favorite demos... Space Invaders written in managed code, in a console app... then he connected a device to a serial port and used it to play the game in front of everyone.  Who knew that serial I/O was the #1 requested feature for the .NET FCL?
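For anyone who has not tried TryParse yet, here is a minimal sketch of the pattern.  This is not Kit's demo code; the TryParseDemo class and the ParseOrDefault helper are names I made up for illustration.  The point is that a bad input comes back as a false return value instead of a thrown FormatException.

```csharp
using System;

public static class TryParseDemo
{
    // Hypothetical helper: returns the parsed value, or the fallback
    // when the text is not a valid integer. No exception is thrown.
    public static int ParseOrDefault(string text, int fallback)
    {
        int value;
        return int.TryParse(text, out value) ? value : fallback;
    }

    public static void Main()
    {
        Console.WriteLine(ParseOrDefault("42", -1));        // valid input
        Console.WriteLine(ParseOrDefault("forty-two", -1)); // invalid input, falls back
    }
}
```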

Brad Abrams then talked about CLR Internals.  You have got to love a talk that starts by opening notepad.exe and compiling from the command line using csc.exe.  The next part of his talk was about when verification takes place, how the CLR loads an assembly, and how it fixes up the method's address during JIT compilation so that further calls jump straight to the x86 code.  Another interesting point...

The JIT will optimize out the bounds check in this routine because the CLR knows you are not going to walk off the end of the array...

for (int i = 0; i < myArray.Length; i++) { Console.WriteLine(myArray[i].ToString()); }

This one will not, because the loop limit is no longer the plain myArray.Length pattern:

for (int i = 0; i < myArray.Length - 6; i++) { Console.WriteLine(myArray[i].ToString()); }

My personal favorite (largely because it was info I have not seen before in a presentation or haven't presented myself...) was Claudio Caldato's presentation on performance.

Claudio Caldato wrapped up the evening with a discussion on performance.  Perf is not something you address once; perf is something you address continually.  This should be a mantra for developers: "I will measure my code... I will measure my code..."  There was a great discussion on knowing what the GC does, what causes an object to end up in gen 1 or gen 2, and why you should avoid gen 2.  Use perfmon counters as well as CLRProfiler to monitor this.

  • Null out object references to dead object graphs
  • Avoid implicit boxing
  • Avoid pinning young objects
  • Avoid GC.Collect... the GC is self-tuning.
  • Finalization... use the Dispose pattern. 
    • Implement IDisposable
    • Call GC.SuppressFinalize
    • Hold few object fields in finalizable types
    • Dispose early; when possible use "using" in C#
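To make the finalization bullets concrete, here is a minimal sketch of the Dispose pattern.  NativeBuffer is a hypothetical type I made up for illustration (it was not shown in the talk); it wraps a block of unmanaged memory, holds a single field, and calls GC.SuppressFinalize so that a disposed instance never has to go through finalization.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical wrapper around an unmanaged buffer (illustration only).
public sealed class NativeBuffer : IDisposable
{
    // The only field: finalizable objects should hold as little as possible.
    private IntPtr _handle;

    public NativeBuffer(int bytes)
    {
        _handle = Marshal.AllocHGlobal(bytes);
    }

    public bool IsDisposed
    {
        get { return _handle == IntPtr.Zero; }
    }

    public void Dispose()
    {
        Release();
        // The resource is already freed, so the finalizer has nothing to do.
        GC.SuppressFinalize(this);
    }

    // Safety net in case the caller forgets to call Dispose.
    ~NativeBuffer()
    {
        Release();
    }

    private void Release()
    {
        if (_handle != IntPtr.Zero)
        {
            Marshal.FreeHGlobal(_handle);
            _handle = IntPtr.Zero;
        }
    }
}

public static class DisposeDemo
{
    // Returns true when the buffer was released at the end of the using block.
    public static bool UseAndDispose()
    {
        NativeBuffer buffer = new NativeBuffer(1024);
        using (buffer)
        {
            // Work with the buffer here.
        }
        return buffer.IsDisposed;
    }

    public static void Main()
    {
        Console.WriteLine(UseAndDispose());
    }
}
```

The "using" block is what gives you the "dispose early" behavior: Dispose runs as soon as the block exits, even if an exception is thrown inside it.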

And my favorite... the CLRProfiler!

The demo had a loop that appends an int to a string 10,000 times, and Claudio ran it under CLRProfiler.

It allocated 1.7 Gigs over its lifecycle.  It was moving 15 megs of memory, due to the GC compacting the heap.  There were more than 3,000 gen 0 collections.

Then used a StringBuilder...

21 Megs allocated, 213 K relocated, and only 40 gen 0 collections.  Comparing the two timelines, the StringBuilder version showed much less thrash and memory went to nearly zero.
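If you want to try the comparison without firing up CLRProfiler, here is a rough sketch.  This is my own reconstruction, not the code from the demo, and the ConcatDemo class and its method names are made up.  It uses GC.CollectionCount(0) to count gen 0 collections around each loop; the numbers you see will depend on your machine and runtime, so measure for yourself.

```csharp
using System;
using System.Text;

public static class ConcatDemo
{
    // Builds the result with string concatenation: every += allocates a new string.
    public static string WithConcat(int iterations)
    {
        string result = "";
        for (int i = 0; i < iterations; i++)
        {
            result += i.ToString();
        }
        return result;
    }

    // Builds the same result with a StringBuilder, which grows one internal buffer.
    public static string WithBuilder(int iterations)
    {
        StringBuilder builder = new StringBuilder();
        for (int i = 0; i < iterations; i++)
        {
            builder.Append(i);
        }
        return builder.ToString();
    }

    public static void Main()
    {
        int start = GC.CollectionCount(0);
        WithConcat(10000);
        int afterConcat = GC.CollectionCount(0);
        WithBuilder(10000);
        int afterBuilder = GC.CollectionCount(0);

        Console.WriteLine("gen 0 collections, concat:        " + (afterConcat - start));
        Console.WriteLine("gen 0 collections, StringBuilder: " + (afterBuilder - afterConcat));
    }
}
```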

He then ran the profiler on 100,000 iterations with a SolidBrush (which has a finalizer).  Allocated: 8 megs, relocated: 3 megs.  The timeline shows that finalizable objects are moved into the finalization queue and memory increases until collection occurs.  Changing this to use a C# using statement gave 8 megs allocated and 10K relocated, and the timeline shows the triangular shape that is desirable rather than the "step" shape that you see without the "using" statement.

Reflection:

  • Fast and light:
    • typeof, object.GetType, get_Module, get_MemberType, new Token/Handle resolution APIs
  • Costly:
    • MemberInfo, MethodInfo, FieldInfo, GetCustomAttribute, InvokeMember, Invoke, get_Name
    • Avoid using case insensitive member lookups
  • Only request what you need
  • Consider using the new Token/Handle resolution APIs.  For every method, you can get a unique token to resolve that member.  Every access to that member via that token is 3-4 times cheaper than using MemberInfo or MethodInfo directly.
  • FxCop will check for potential performance issues
  • VB.NET developers should use Option Explicit On and Option Strict On, avoid late binding
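Here is a small sketch of what the Token/Handle resolution APIs look like in code.  This is my own example, not one from the talk, and the HandleDemo class and its method names are made up.  The idea is to cache the lightweight RuntimeMethodHandle (or the integer metadata token) instead of holding on to MethodInfo objects, and to resolve back to a method only when you need it.  I have not measured this sample, so treat the 3-4x figure above as the speaker's number, not mine.

```csharp
using System;
using System.Reflection;

public static class HandleDemo
{
    // Cache the small handle rather than the MethodInfo itself.
    private static readonly RuntimeMethodHandle ToUpperHandle =
        typeof(string).GetMethod("ToUpper", Type.EmptyTypes).MethodHandle;

    // Cache the metadata token for the same method.
    private static readonly int ToUpperToken =
        typeof(string).GetMethod("ToUpper", Type.EmptyTypes).MetadataToken;

    // Resolves the handle back to a method and invokes it.
    public static string Shout(string text)
    {
        MethodBase method = MethodBase.GetMethodFromHandle(ToUpperHandle);
        return (string)method.Invoke(text, null);
    }

    // Resolves the token through the module that defines System.String.
    public static string ResolveNameByToken()
    {
        MethodBase method = typeof(string).Module.ResolveMethod(ToUpperToken);
        return method.Name;
    }

    public static void Main()
    {
        Console.WriteLine(Shout("abc"));
        Console.WriteLine(ResolveNameByToken());
    }
}
```

Note that Invoke is still on the costly list, so this only helps with the lookup and caching side of things.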

P/Invoke, COM interop cost

  • Efficient, but frequent calls add up
  • Costs depend on marshaling
  • Diagnosis
    • perfmon: .NET CLR interop counters
    • time based profilers (VSTS, NuMega)
  • Mitigate interop call costs by batching calls or moving the boundary

Deployment

  • Assemblies
    • If you are severely focused on perf, the fewer the better, because of the security work done for each assembly.  This comes at the cost of the security boundary that multiple assemblies give you.
    • The perf hit is only at load time for the assembly
  • Use GAC
    • Avoids repetitive SN signature verification
  • Use NGEN
    • Caches a pre-JIT'd DLL (but use with caution: your code might run slower because code locality may be worse, and NGEN does not do some optimizations that the JIT is able to do)
    • Must measure this for yourself
    • This goes against most of the advice that you will hear about from folks in the field, but the consistent message is that you should measure this for yourself to see if you are in one of the cases where NGEN can provide performance enhancements.

XML

  • Avoid System.Xml.dll (it is 2 MB) unless you need it
    • Remember that it may be deployed as a referenced assembly
    • If you never use it, it will not be in the manifest for your assembly; either way it is not loaded into memory until you call a method that uses a type in that assembly, which triggers the load when that method is JIT-compiled
  • Don't use XML classes for trivial tasks, such as configuration files that hold simple values. 
  • The CLR does parsing of machine.config and other config files in mscorlib.dll and not System.Xml.dll for its configuration tasks to optimize this. 
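As an illustration of the "don't use XML for trivial tasks" advice, here is a minimal sketch of a key=value settings reader that never touches System.Xml.dll.  To be clear, the file format, the SimpleConfig class, and the Parse method are all my own assumptions for the example; this is not how the CLR parses machine.config.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class SimpleConfig
{
    // Reads "key = value" lines; blank lines and lines starting with '#' are skipped.
    public static Dictionary<string, string> Parse(TextReader reader)
    {
        Dictionary<string, string> settings =
            new Dictionary<string, string>(StringComparer.Ordinal);

        string line;
        while ((line = reader.ReadLine()) != null)
        {
            line = line.Trim();
            if (line.Length == 0 || line[0] == '#')
            {
                continue;
            }

            int separator = line.IndexOf('=');
            if (separator <= 0)
            {
                continue; // no key before '=', or no '=' at all
            }

            string key = line.Substring(0, separator).Trim();
            string value = line.Substring(separator + 1).Trim();
            settings[key] = value;
        }
        return settings;
    }

    public static void Main()
    {
        string text = "# sample settings\nname = demo\nretries=3\n";
        Dictionary<string, string> settings = Parse(new StringReader(text));
        Console.WriteLine(settings["name"]);
        Console.WriteLine(settings["retries"]);
    }
}
```

Whether this is actually faster for your app than loading System.Xml.dll is, as always, something to measure.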

And then the icing on the cake...

In Visual Studio 2005, use the Performance Tools wizard to set up a performance session using instrumentation or sampling.  It collects data, lets you see the same type of information that CLRProfiler shows, and lets you drill down into individual methods to investigate further.