How much power is too much?

In the second section of the Channel 9 tour videos, Rico Mariani talks about performance and I once again butcher his last name (I once made the same mistake at a BillG review, sorry man <sigh>; it’s pronounced “Mary-annie”, got it now <g>).

One thing we’ve seen a lot is that it is really easy to take one or two lines of managed code and really blow up your working set and startup time.  It’s a double-edged sword: on the one hand, you get a really easy-to-use class library; on the other, you don’t feel the pain that would otherwise make you think twice about using an expensive feature.

Two concrete examples are the XML serializer and compiled Regex expressions.  I’ve seen requests for Assembly.Unload() that are really due to having these extra dynamically generated assemblies around.  This performance report on the internal Headtrax application explains how that application invoked the C# compiler 16 times at startup!  (This is discussed in the interview section with Scoble, coming up in a new segment.)  If your application is slow to start, check whether you are making this very easy mistake.
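
To make the mistake concrete, here is a minimal sketch of the kind of innocent-looking lines in question (the Order type is hypothetical):

```csharp
using System.Text.RegularExpressions;
using System.Xml.Serialization;

// Hypothetical data type used only for this sketch.
public class Order
{
    public int Id;
    public string Customer;
}

class StartupCosts
{
    static void Main()
    {
        // One line of code, but the first construction generates, compiles, and
        // loads a temporary serialization assembly for Order (on the .NET
        // Framework of this era it literally invoked the C# compiler).
        XmlSerializer serializer = new XmlSerializer(typeof(Order));

        // One line of code, but RegexOptions.Compiled emits the matcher as IL
        // into a dynamically generated assembly that cannot be unloaded.
        Regex orderId = new Regex(@"^\d{4}-\d{6}$", RegexOptions.Compiled);
    }
}
```

Neither line is wrong on its own; the point is that nothing in the code hints at the startup and working set cost behind it.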

We’ve kicked around a bunch of ideas on this in the past.  Obviously our first goal is that you always get the fastest solution by default (you shouldn’t have to precompile your XML serializer code, for example).  We’ve toyed with giving you a “red/yellow/green” gradient in the docs or IntelliSense that would hint at where you are using an expensive feature.  Profilers are also useful for figuring this kind of thing out, but they do require work and planning.
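
For what it’s worth, here is a sketch of the practical mitigations in the meantime, reusing the hypothetical Order type from the sketch above: construct the serializer once and reuse it, or pre-generate the serialization assembly at build time (the sgen pre-generation tool shipped with later .NET Framework SDKs).

```csharp
using System.IO;
using System.Xml.Serialization;

static class OrderSerialization
{
    // Create the serializer once and reuse it, so the code generation and
    // compilation cost is paid a single time rather than on every call.
    // (Order is the hypothetical type from the previous sketch.)
    private static readonly XmlSerializer serializer =
        new XmlSerializer(typeof(Order));

    public static void Save(Order order, Stream output)
    {
        serializer.Serialize(output, order);
    }
}

// Alternatively, the serialization assembly can be pre-generated at build time
// with the SDK's XML Serializer Generator, so nothing is compiled at startup:
//
//   sgen.exe /assembly:MyApp.exe
//
// which produces MyApp.XmlSerializers.dll next to the application.
```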

So my question for you: how much power is too much?  Should we be making the library much more “in your face” or harder to use for things that are really going to cost you?  What are your favorite “feature rich” APIs that might be causing this unexpected/unwanted overhead?

Comments (9)

  1. I think "how much power is too much" is the wrong question to ask. Brad Abrams asked a similar question. The framing of this question implies that the problem is simply exposing powerful functionality to the programmer.

    Power is more than being able to compile a regex or serialize to XML. Power is being able to optimize a regex, or serialize to a human-readable file, without worrying about performance. Making things harder to use will not fix the fundamental usability problem: poor performance is itself an API usability problem, because it forces additional work and tuning. Making things harder to use only adds to the usability problem.

    It is nice to just say "that is too powerful". That pushes the responsibility back on the API user. I think the real solution is twofold:

    – "optimizers" such as regex compilation should not cause a different performance problem to be tuned — fix the compilation so memory leakage is not a problem.

    – the API user should not be guided toward unfixable problems when performing common operations such as serialization. For example, the XML spec is huge and XML is text based, so there is no way XML will ever have great performance. Reflection is also very slow. Put those together, and one can conclude that XML serialization via reflection should not be promoted in the first place. Perhaps a subset of XML and a much faster implementation of reflection would work, but that is not what we are given, so it should not be promoted.

    If you provide a fast-performing API and promote efficient techniques, the user will have a hard time doing the wrong thing, at least in a way that is hard to fix. It is still possible to abuse the tools, but it is less likely. I push the "power" responsibility back on the API developer.

  2. David Douglass says:

    First off, the developer has to understand what they’re doing to do a good job. No amount of cool IDE tricks (like color-coded IntelliSense) can get around that.

    Second, define “expensive”. Like quality, it’s relative. Code that runs continuously in a Windows service is not the same as code that runs once a year in the dead of night.

    Putting something in the docs sounds good, but how would that work if we’re not dealing with something quantifiable? Shall we say that method X is “quick” and method Y is “slow”?

    I think the best solution is an integrated tool that combines static code analysis and run-time profiling. Based on Jason’s example, the tool would report that compiled regex expressions are a known potential performance problem, and that the run-time profile for this application indicates this is an area to look at.

  3. It was the XML serializer that caused the 16x compiler invocation. I would argue it is not the developer’s fault that they want an easy way to serialize data to a human-readable file. The problem is that the XML serializer was released with this intrinsic performance problem in the first place.

    All of the compilation could be done at design/compile time, since at that point the structure of the serialized objects is known. So perhaps a better design would be:

    – an XML serializer that uses a fast form of reflection and a subset of XML, with no compilation; or

    – an XML serializer that does not use reflection, but generates the IL at design/compile time rather than at run time.

    Either way, the full XML spec is overkill for data serialization.
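
To illustrate the second option, here is a minimal hand-written sketch of the kind of code a design/compile-time generator could emit for a hypothetical Order type: because the shape of the type is known up front, it needs no reflection and no run-time compilation.

```csharp
using System.Xml;

// Hypothetical data type used only for this sketch.
public class Order
{
    public int Id;
    public string Customer;
}

public static class OrderXmlWriter
{
    // Straight-line writer code for a known shape: no reflection, no run-time
    // code generation, and only the small slice of XML that serialization needs.
    public static void Write(Order order, XmlWriter writer)
    {
        writer.WriteStartElement("Order");
        writer.WriteElementString("Id", XmlConvert.ToString(order.Id));
        writer.WriteElementString("Customer", order.Customer);
        writer.WriteEndElement();
    }
}
```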

  4. Kevin Dente says:

    > What are your favorite "feature rich" APIs that might be causing this unexpected/unwanted overhead?

    Also on the XmlSerializer front, how about the fact that some constructor overloads generate the serialization assembly and then don’t cache it? If you use one of those constructor forms repeatedly, you essentially leak assemblies (since they never get unloaded). Totally undocumented, of course.
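
For reference, the behavior described here is that only the XmlSerializer(Type) and XmlSerializer(Type, String) constructors cache their generated assembly internally (later documentation spells this out). Below is a minimal sketch of the usual workaround when one of the other overloads is needed; the caller-supplied cache key is just illustrative.

```csharp
using System;
using System.Collections;
using System.Xml.Serialization;

static class SerializerCache
{
    private static readonly Hashtable cache = Hashtable.Synchronized(new Hashtable());

    // Cache serializers built with the non-caching overloads (for example the
    // XmlAttributeOverrides one) so each configuration is generated only once.
    // The caller supplies a key describing the configuration, because
    // XmlAttributeOverrides has no usable equality of its own.
    public static XmlSerializer Get(Type type, XmlAttributeOverrides overrides, string cacheKey)
    {
        XmlSerializer serializer = (XmlSerializer)cache[cacheKey];
        if (serializer == null)
        {
            // Worst case, two threads race and build the same serializer twice;
            // one extra generated assembly is far better than one per call.
            serializer = new XmlSerializer(type, overrides);
            cache[cacheKey] = serializer;
        }
        return serializer;
    }
}
```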

  5. Anonymous says:

    I agree with David that a developer has to know what he/she is doing in order to do a good job. But unfortunately, there are always some developers who are not as aware of what to watch out for (myself included). I’m working on a team where even my team lead isn’t aware of the performance cost of all the boxing/unboxing operations we are making. Unfortunately, my opinion was taken as a challenge to my team lead’s technical ability. So I kinda like the idea of giving hints at the IDE level. Note that they are just hints. At least I hope the hints (coming from Microsoft) will prompt the developer to look deeper into the API that is being used.

    Profilers and tools like FxCop are great too, but only if the developer uses them.
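
For readers who have not measured it, here is a minimal sketch of the boxing/unboxing cost being described, using the pre-generics ArrayList (the loop count is arbitrary):

```csharp
using System.Collections;

class BoxingCost
{
    static void Main()
    {
        ArrayList values = new ArrayList();

        for (int i = 0; i < 1000000; i++)
        {
            // Boxing: each int is copied into a new heap object so it can be
            // stored as System.Object, adding allocations and GC pressure.
            values.Add(i);
        }

        // Unboxing: the cast copies the value back out of the box.
        int first = (int)values[0];
    }
}
```

The generic collections added in .NET 2.0, such as List<int>, store value types directly and avoid this overhead.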

  6. Stephane Rodriguez says:

    In one of your videos, you kind of remind us of the history of the CLR runtime, originating from COM+ and languages like C++ and VB. I think your statements are unfair, and driven by the real agenda, which is to scrub from your public talks anything related to Java, its runtime, JIT, and SDK.

    I find it absolutely odd that the MS Java VM 3.0 was there at that time, doing ****ABSOLUTELY EXACTLY**** what the CLR runtime, JIT, interop, and base classes are doing these days, and you still want us to think that you wrote these things from scratch.

    All while even Hejlsberg was working at that time on Visual J++, the IDE for the MS Java VM.

    A lot of gibberish IMHO.

  7. Stephane Rodriguez says:

    By the way, I have another question, one that may explain why so few people have embraced .NET on the client (I believe .NET applications just cannot be generally deployed, but that’s another story).

    How come only apps developed at Redmond have successfully been able to host the CLR in an unmanaged app and do anything meaningful with it? Examples: ASP.NET, Visual Studio of course, SQL Server Yukon.

    So? Do you think that improving the design of this plumbing and making it really usable from a developer’s perspective would improve anything, or is it good enough? Statistics don’t say much for your camp, yet you might come back with words about how successful .NET has been so far…

  8. Keith says:

    Where there is a particularly complicated API, I like a two-pronged approach. Provide a simple API that covers the 80% set of use cases. Then provide a more flexible, feature-rich, full API that covers all use cases. I don’t mind if the full API is a bit more complicated to use (unless it is arbitrarily complicated). It is also reasonable for the simple API to sit on top of the full API. One of my favorite examples of this in the BCL is System.Net.Sockets.TcpClient/TcpListener (the simple API) versus System.Net.Sockets.Socket (the full API). Another example is the My classes feature of VB 2005.
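
To make the TcpClient versus Socket contrast concrete, here is a minimal sketch (the host name and port are placeholders); the simple API wraps the resolve/create/connect ceremony that the full API exposes directly.

```csharp
using System.Net;
using System.Net.Sockets;

class SimpleVersusFull
{
    static void Main()
    {
        // Simple API: TcpClient resolves the host and sets up the socket for you.
        using (TcpClient client = new TcpClient("example.com", 80))
        using (NetworkStream stream = client.GetStream())
        {
            // read and write via the stream...
        }

        // Full API: Socket exposes every knob, at the cost of more ceremony.
        IPAddress address = Dns.GetHostEntry("example.com").AddressList[0];
        Socket socket = new Socket(address.AddressFamily, SocketType.Stream,
                                   ProtocolType.Tcp);
        socket.Connect(new IPEndPoint(address, 80));
        socket.Close();
    }
}
```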