Collections too slow? When to write your own basic types

As usual, there isn’t really one set of rules that will always guide you to the right decision, but I’d like to offer some approximately correct guidance to help you decide when you should decline the standard menu and instead cook up your own basic types.

Rule #1: Don’t do it

Seriously, who needs the aggravation?  If you create your own basic types (a collection, an event source, an enumerator, a string buffer, etc.) you’ll automatically start at a disadvantage.  The built-in class is bound to be used in your application anyway, either directly or indirectly.  Even if you could substitute your class for the built-in class, it’s likely that the new class doesn’t do everything the standard version does – in fact, that may be the very reason you created your own class.  So now your program will have to load (at least) two classes that do approximately the same job.  Likewise, your developers will have to keep (at least) two classes in their heads and know how to choose between them.  And of course you’ll have to service the extra class yourself – no handy security fixes from your pals at Microsoft in the event of boo-boos; you’ll have to do your own patches.  You do have a site like windowsupdate.com in your release plan, right?  No?  Hmmm… you might be needing one.  Lastly, and perhaps most compellingly, it just doesn’t scale – if everyone creates their own version of some basic class that’s approximately but not quite the same as the one offered by default, the total redundancy that end users (like my mom) get in their applications will be staggering.

So, seriously, just don’t. 

OK, you still really want to?  Or better yet, you really don’t want to, but you’re pretty sure you have to.  Here are some of the more common reasons why it becomes necessary.

Rule #2: Do it if data volume demands specialization

Our base classes are designed to be broadly applicable, so they provide a wide spectrum of services at a high quality of service.  However, this can work against you.  Suppose your application needs, say, a million hash tables…  Yikes!  Do you really want our full-service hash tables for that?  Maybe not.  How many entries in these hash tables?  If there are a million of them, probably not too many.  What’s the overhead per hash table going to be?  The per-table savings from a specialized class could be very big – maybe enough to let you target a lower class of consumer hardware and still be functional.  That kind of flexibility might be enough to justify the extra work.
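To make that concrete, here’s a minimal sketch, assuming integer keys and only a handful of entries per table, of the kind of stripped-down map you might write.  The name SmallMap and everything about its shape are my own invention for illustration, not a framework class:

    using System;

    // Hypothetical sketch: a tiny map for tables that hold only a few entries.
    // Per-instance overhead is just two small arrays and a count, instead of
    // the larger fixed overhead a full-service table carries.
    public sealed class SmallMap
    {
        private readonly int[] keys;
        private readonly object[] values;
        private int count;

        public SmallMap(int capacity)
        {
            keys = new int[capacity];
            values = new object[capacity];
        }

        public void Add(int key, object value)
        {
            // A linear scan beats hashing at these sizes.
            for (int i = 0; i < count; i++)
            {
                if (keys[i] == key) { values[i] = value; return; }
            }
            // Deliberately no growth and no validation: the caller promises
            // never to exceed capacity.  Exceeding it fails with a raw
            // IndexOutOfRangeException -- part of the cost of specializing.
            keys[count] = key;
            values[count] = value;
            count++;
        }

        public bool TryGetValue(int key, out object value)
        {
            for (int i = 0; i < count; i++)
            {
                if (keys[i] == key) { value = values[i]; return true; }
            }
            value = null;
            return false;
        }
    }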

Rule #3: Do it if call volume demands specialization

Again, our base classes are designed to be easy to use in a variety of situations and to provide robust error reporting.  However, if (e.g.) collections are critical to your performance scenario and your call volume (number of calls) is such that every cycle counts, you may need your own specialized collection.  I’m picking on the collections here because everyone knows them, but the same phenomenon can happen in pretty much any low-level class.  Sometimes the cost of the nice parameter checking (and we’d be slammed if we didn’t do good parameter checking) is comparable to the whole job at hand.  Sometimes the cost of polymorphism (the virtual function calls) for pluggability is comparable to the cost of the full job of the API.  Consider poor ArrayList – we need to do argument validation on even the simple methods, or else developers will be faced with exceptions internal to the class that are very difficult to decode.  So we put in extra checks to help developers get rich messages and be more productive.  However, if an ArrayList collection were the cornerstone of your application, creating a specialized class might be just what you need to meet throughput goals on more modest hardware (and hence win valuable contracts).
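Here’s a minimal sketch of what such a specialized class might look like.  FastList is a hypothetical name of my own, and this is a sketch of the trade-off, not a drop-in ArrayList replacement:

    using System;

    // Hypothetical sketch: a stripped-down list for a hot path.  It is sealed,
    // has no virtual dispatch, and skips the argument validation that a
    // general-purpose class must pay for on every call.
    public sealed class FastList
    {
        private object[] items = new object[16];
        private int count;

        public int Count { get { return count; } }

        public void Add(object item)
        {
            if (count == items.Length)
            {
                object[] bigger = new object[items.Length * 2];
                Array.Copy(items, bigger, count);
                items = bigger;
            }
            items[count++] = item;
        }

        // No range check here: a bad index surfaces as the CLR's raw
        // IndexOutOfRangeException rather than a friendly
        // ArgumentOutOfRangeException with a rich message.
        public object this[int index]
        {
            get { return items[index]; }
            set { items[index] = value; }
        }
    }

That missing range check is exactly the trade described above – acceptable only when the class is private to your own carefully tested code.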

Rule #4: Do it if exotic threading disciplines are required for scale

Many of our lowest-level classes offer little in the way of multi-threading guarantees.  We tend to write them so that you provide your own locking mechanism and we stay the heck out of the locking business.  That’s a good position to take, in my opinion, because low-level classes like hashtables don’t really understand the context in which they are being used, so they would tend to take locks at the wrong granularity to be useful if we tried to make them natively multithreaded.  This is all well and good, but if the objects are fundamentally single-threaded (and hence need to be protected by locks) and you can’t afford to take those locks (because it would kill your scale-up on multi-processor machines), then you might need a specialized collection, enumerator, or whatever, that is multi-processor aware and (very carefully!) uses volatile storage and maybe interlocked operations to provide some useful multi-threaded support in a lock-free manner.  This might be what it takes to get good value for the dollar on (e.g.) an 8- or 32-processor box.
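To give a flavor of what that code looks like, here’s a minimal sketch of a lock-free stack built with Interlocked.CompareExchange.  LockFreeStack is my own hypothetical class, and real lock-free code needs far more scrutiny than a blog sample can convey:

    using System.Threading;

    // Hypothetical sketch: a lock-free stack.  Each push and pop retries a
    // compare-and-swap on the head until it wins, so threads on other
    // processors are never blocked waiting on a lock.
    public sealed class LockFreeStack<T>
    {
        private sealed class Node
        {
            public readonly T Value;
            public Node Next;
            public Node(T value) { Value = value; }
        }

        private Node head;

        public void Push(T value)
        {
            Node node = new Node(value);
            Node snapshot;
            do
            {
                snapshot = head;
                node.Next = snapshot;
            }
            while (Interlocked.CompareExchange(ref head, node, snapshot) != snapshot);
        }

        public bool TryPop(out T value)
        {
            Node snapshot;
            do
            {
                snapshot = head;
                if (snapshot == null) { value = default(T); return false; }
            }
            while (Interlocked.CompareExchange(ref head, snapshot.Next, snapshot) != snapshot);
            value = snapshot.Value;
            return true;
        }
    }

This is the “very carefully!” part in action: all the correctness lives in those retry loops, and subtle mistakes here don’t show up until you’re under load on big hardware.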

Rule #5: Do it if strict control over memory or exceptions is critical

The final case is probably the rarest by far.  However, there are cases where users are trying to create classes that will be used in a near-real-time scenario (such as video playback).  In that case you might want to use only a set of classes that carefully control the lifetime of the objects they allocate (e.g. via recycling) so that garbage collections happen in a much more predictable fashion.  This is not an easy undertaking, but it is possible, and it’s even important in some cases where managed code is being used in (e.g.) a kiosk.
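Here’s a minimal sketch of the recycling idea.  BufferPool is a hypothetical name of my own, it assumes single-threaded use, and a real near-real-time system would need much more care:

    using System.Collections.Generic;

    // Hypothetical sketch: recycle buffers through a pool so that steady-state
    // operation allocates nothing and the garbage collector has nothing new
    // to collect during (e.g.) playback.
    public sealed class BufferPool
    {
        private readonly Stack<byte[]> free = new Stack<byte[]>();
        private readonly int bufferSize;

        public BufferPool(int bufferSize, int preallocate)
        {
            this.bufferSize = bufferSize;
            // Pay the whole allocation cost up front, before the
            // time-critical work begins.
            for (int i = 0; i < preallocate; i++)
                free.Push(new byte[bufferSize]);
        }

        public byte[] Rent()
        {
            // Reuse a recycled buffer when one is available; allocating here
            // means the pool was sized too small.
            return free.Count > 0 ? free.Pop() : new byte[bufferSize];
        }

        public void Return(byte[] buffer)
        {
            free.Push(buffer);
        }
    }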

Rule #6: Report performance issues in the regular classes whenever you can

Take advantage of https://lab.msdn.microsoft.com/ProductFeedback/ (aka Ladybug) to give us a wake-up call if some area of the framework is letting you down.  This is a great way to influence the team's priorities and, believe me, we really want to do the most important stuff first.  It's also a great way to share workarounds and warnings when appropriate.

Summary: Have careful measurements and clear verifiable goals for success before proceeding

All of the cases I’ve discussed identified a performance problem that was both acute and relevant to the overall success of the product.  Before taking on the cost of re-implementing a low-level type, you’ll want to quantify the costs and benefits of adopting your own and then verify that you are getting those benefits.  E.g. (fictitious example): “Simulations indicated that a lock-free multi-reader version of the hashtable would yield 25% more throughput on 8-way processors and 50% more on 32-way processors for our product.  These gains should result in an extra $1M in revenue in the next fiscal year, which justifies the expected $100,000 in extra support costs.”

You might start with ideas like “Hmm, I think I could do this with less memory if I roll my own,” but I strongly caution you not to stop there.  Have strong data to back your position or chances are you’ll get nothing but aggravation.
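Getting that data doesn’t have to be elaborate.  Here’s a minimal sketch of the kind of before-and-after timing harness I mean, where RunWorkload is a placeholder of my own for whichever operations you are considering replacing:

    using System;
    using System.Diagnostics;

    static class CollectionBenchmark
    {
        static void Main()
        {
            const int iterations = 1000000;

            // Warm-up pass so JIT compilation costs don't pollute the numbers.
            RunWorkload(iterations);

            Stopwatch watch = Stopwatch.StartNew();
            RunWorkload(iterations);
            watch.Stop();

            Console.WriteLine("{0} iterations in {1} ms",
                iterations, watch.ElapsedMilliseconds);
        }

        static void RunWorkload(int iterations)
        {
            // Placeholder: exercise the collection (standard or candidate
            // replacement) here, with data shaped like your real workload.
            for (int i = 0; i < iterations; i++) { }
        }
    }

Run it against the standard class and against your prototype, on data shaped like your real workload, and let the numbers make the argument for you.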

Remember, if you are delivering a small component into someone else’s larger system, you probably shouldn’t build your own low-level implementations.  But if you are a medium or large team delivering a complete subsystem or application, and you can devote real bodies to the problem, then you might be able to justify the investment.