Hello, World! and a couple of the things I like about C# (2.0)

Who are you?

I'm a developer in the Visual Studio group, currently working on Visual Studio 2005 Team System.  More specifically, stealing Buck's description: I am a developer working on the Burton source control system (Team Foundation), which we code-named Hatteras.

We both work in Brian Harry's group (yes, that Brian Harry) here in lovely North Carolina.  Well, Brian spends a lot of time in Redmond, of course, but that's getting off-topic.

There's a lot to write about source control in the days/weeks/months to come, but I wanted to start with something intentionally off-topic, just because I love coding in C# 2.0.  Oh, and I'm ripping off the “Hello, World” approach from Soma, who heads up the Developer Division that I work in :)

So what are a couple of the things you like about C# 2.0?

My love of C# 2.0 knows no bounds, but for the sake of at least attempted brevity I'm picking a couple of items to focus on in this initial blog post :)  Note that I'm not on the C# team like EricGu and others, just a fan of their work :)

Iterators

The first is what most often gets called “Iterators” in articles about it.  That can be confusing for developers who have experience with java.util.Iterator, which is related in concept but is actually an entirely different beast.  The Java Iterator is an object that enables iteration over an existing collection and as such is a very useful thing.  It's definitely a key part of the Java Collections Framework.

The C# iterator (IEnumerable/IEnumerator with generic versions IEnumerable<T>/IEnumerator<T>) allows you to do away with collections as a sharing mechanism if you want.  That can be a confusing statement, so let me explain: there are many instances where you want to set up pipelines of processing.  Effectively, you want the ability to create/manipulate arbitrary streams of objects.  There are certainly plenty of systems for setting up large scale / heavyweight / cross-machine pipelines (MSMQ, JMS, etc.) and those are incredibly useful and serve their purposes well.  However, in many scenarios you'd like to do pipelines “in the small” - situations where today you're interacting with some data source to get a collection of objects, then passing those to something that mangles them either into the same collection or a new collection, lather, rinse, repeat, until you're getting fully-processed results at the end.

In such scenarios you're usually taking one of two approaches - you're taking one item at a time through the entire pipeline, or you're taking all N items through each stage of the pipeline.  Taking N items through each stage can have some advantages (it should have better instruction cache behavior, for instance), but it falls short if your goals include reducing footprint (N can be large, say in the thousands or even millions) or improving time-to-first-result.

This isn't to suggest that the C# iterator is limited to pipeline scenarios by any means, but it's incredibly useful for that.
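To make the "one item at a time through the entire pipeline" idea concrete, here's a minimal sketch.  The stage names (Source, Square, OnlyEven) are made up for illustration and aren't from any real API; each stage is just an iterator that pulls from the one before it.

```csharp
using System;
using System.Collections.Generic;

public static class PipelineSketch
{
    // Stage 1 (hypothetical source): produces numbers lazily, one per request.
    public static IEnumerable<int> Source(int count)
    {
        for (int i = 1; i <= count; i++)
        {
            Console.WriteLine("source produces " + i);
            yield return i;
        }
    }

    // Stage 2: transforms each item as it flows past.
    public static IEnumerable<int> Square(IEnumerable<int> input)
    {
        foreach (int n in input)
            yield return n * n;
    }

    // Stage 3: passes along only the even items.
    public static IEnumerable<int> OnlyEven(IEnumerable<int> input)
    {
        foreach (int n in input)
            if (n % 2 == 0) yield return n;
    }

    public static void Main()
    {
        // Nothing runs until the foreach starts pulling items.
        foreach (int n in OnlyEven(Square(Source(1000000))))
        {
            Console.WriteLine("first result: " + n);
            break; // stop early; the remaining items are never produced
        }
    }
}
```

Even though the source could produce a million items, only the first two are ever produced here, because the consumer stops after the first result.  No intermediate collection is built at any stage, which is the footprint and time-to-first-result win.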

int m_count = 25;

IEnumerable BufferedResults()
{
    while (true)
    {
        Result[] results = ExpensiveOperation(m_count);
        if (results.Length == 0) yield break;
        MangleResults(results);
        foreach (Result r in results) yield return r;
    }
}

 

Obviously this is contrived, but it gets across a few things: 1) yield break ends the stream when you're done (there's an implicit yield break if the method reaches the end); 2) yield return hands back each object in the “object stream” you're creating; 3) we can easily adjust the buffering dynamically if we want (it's just an instance variable in this case, of course); 4) we can operate on chunks as small or as big as we like, and our consumers will still see a steady (logically, not necessarily temporally) stream of Result objects.  We have fully decoupled our buffering from our consumer.
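Here's a runnable sketch of the same buffering idea, with the caveat that everything in it is a stand-in: Fetch plays the role of ExpensiveOperation, plain ints play the role of Result, BatchSize plays the role of m_count, and I've left out the mangling step to keep it short.

```csharp
using System;
using System.Collections.Generic;

public class BufferingSketch
{
    public int BatchSize = 2;   // plays the role of m_count
    public int Fetches = 0;     // how many times the "expensive" call ran
    private int m_next = 0;
    private readonly int m_total;

    public BufferingSketch(int total) { m_total = total; }

    // Hypothetical stand-in for ExpensiveOperation: returns up to 'count'
    // items, or an empty array once the data is exhausted.
    private int[] Fetch(int count)
    {
        Fetches++;
        int n = Math.Min(count, m_total - m_next);
        int[] batch = new int[n];
        for (int i = 0; i < n; i++) batch[i] = m_next++;
        return batch;
    }

    public IEnumerable<int> BufferedResults()
    {
        while (true)
        {
            int[] batch = Fetch(BatchSize);   // BatchSize is re-read on every fetch
            if (batch.Length == 0) yield break;
            foreach (int r in batch) yield return r;
        }
    }

    public static void Main()
    {
        BufferingSketch s = new BufferingSketch(10);
        foreach (int r in s.BufferedResults())
        {
            if (r == 3) s.BatchSize = 4;   // retune the buffering mid-stream
            Console.WriteLine(r);
        }
        Console.WriteLine("fetches: " + s.Fetches);
    }
}
```

The consumer's loop looks the same no matter what the batch size is or when it changes; it just sees the items in order.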

 

Yes, BufferedStream and the like have given us the same for bytes / chars / etc. but now we have language syntax that allows us to do it easily for arbitrary types.

 

The one big gotcha is the “how is a consumer supposed to know the types of objects in the stream?” question, but that leads into our next section nicely :)

Generics

Much has been (and will be) written about generics.  The CLR in 2.0 treats generics as full first-class run-time types, which makes a lot of things possible that aren't in languages where generics don't exist at run time.  One example is the performance gain from Stack<int>, List<int> and friends, which store the ints themselves (value types in general) rather than boxed versions of them and so don't need to box/unbox all the time.  Still, the most common gain for developers from generics will be type safety of collections.  While this is helpful for one-level collections (List<int>, Dictionary<int, Foo>), it's IMHO even more helpful for nested collections (List<List<int>> or Dictionary<int, List<int>>, for instance).
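As a small illustration of the boxing point, here's a sketch comparing the non-generic ArrayList with List<int>.  The class and method names are mine, chosen just for this example.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

public static class BoxingSketch
{
    // Non-generic: every element is stored as object, so ints are boxed
    // going in and must be unboxed (with a cast) coming out.
    public static int SumUntyped(ArrayList items)
    {
        int sum = 0;
        foreach (object o in items)
            sum += (int)o;   // unbox; throws InvalidCastException if o is not a boxed int
        return sum;
    }

    // Generic: the list stores the ints themselves, so there is no boxing and no cast.
    public static int SumTyped(List<int> items)
    {
        int sum = 0;
        foreach (int n in items)
            sum += n;
        return sum;
    }

    public static void Main()
    {
        ArrayList untyped = new ArrayList();
        untyped.Add(1);   // each Add boxes the int
        untyped.Add(2);
        // untyped.Add("three");   // would compile, then fail inside SumUntyped

        List<int> typed = new List<int>();
        typed.Add(1);
        typed.Add(2);
        // typed.Add("three");     // does not compile

        Console.WriteLine(SumUntyped(untyped));
        Console.WriteLine(SumTyped(typed));
    }
}
```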

 

If you're in the situation of consuming a complex collection from a random method that you didn't write, having that returned collection be generic makes your development life easier - both because it's more obvious what the actual data types are (and you don't have to trust comments which might get out of sync with the code), but also because you know that if you mess up your code consuming the collection, there's a good chance it'll be caught at compile time instead of run-time.
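A quick sketch of what that looks like for a nested collection.  LoadGroups is a hypothetical "method you didn't write" that maps a group id to its member ids; the names and data are invented for the example.

```csharp
using System;
using System.Collections.Generic;

public static class NestedSketch
{
    // Hypothetical method "that you didn't write": group id -> member ids.
    // The return type alone tells the caller what is inside.
    public static Dictionary<int, List<int>> LoadGroups()
    {
        Dictionary<int, List<int>> groups = new Dictionary<int, List<int>>();
        groups[1] = new List<int>(new int[] { 10, 11 });
        groups[2] = new List<int>(new int[] { 20 });
        return groups;
    }

    public static int CountMembers(Dictionary<int, List<int>> groups)
    {
        int total = 0;
        foreach (KeyValuePair<int, List<int>> entry in groups)
            total += entry.Value.Count;   // entry.Value is known to be a List<int>; no cast
        return total;
    }

    public static void Main()
    {
        Dictionary<int, List<int>> groups = LoadGroups();
        // string first = groups[1][0];   // would not compile: an int is not a string
        int first = groups[1][0];
        Console.WriteLine(first + " / " + CountMembers(groups));
    }
}
```

With the non-generic Hashtable/ArrayList equivalents, both the inner list and each element would come back as object, and every one of those casts would be a chance to get it wrong at run time.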

 

I'm a huge fan of catching problems at compile time instead of run time.  I'm a huge fan of TDD (Test-Driven Development) as well, but compile-time checks can be so much nicer IMHO since they're compiler-enforced (of course) so I never have to worry about whether there's a test covering my consumption of the collection.

In any case, a quick example:

static void Main(string[] args)
{
    foreach (int i in Range(3, 5))
        Console.WriteLine("Got an int = " + i);
    foreach (string i in Range(3, 5))
        Console.WriteLine("Got a string = " + i);
}

static IEnumerable Range(int start, int stop)
{
    for (int i = start; i <= stop; i++)
        yield return i;
}

 

The above code compiles just fine.  We've defined a Range() method that lets us have an arbitrary range as a “stream” (my term) of ints - no big collection of integers needed in-memory at all, and I can treat the results as a stream.  However, as you can see from the above (the fact that it compiles), I'm missing a fundamental type safety check.  The non-generic IEnumerable hands back each item as an object, so the second foreach loop quietly casts each one to string, and I'll get a run-time InvalidCastException when it attempts to cast the boxed int to a string.
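If you want to see the failure without crashing the program, here's a small self-contained sketch that wraps the string loop in a try/catch.  The StringLoopThrows helper is just something I made up for the demonstration.

```csharp
using System;
using System.Collections;

public static class UntypedRangeSketch
{
    // Same non-generic Range as above: the items are ints, but the
    // signature only promises "some objects".
    public static IEnumerable Range(int start, int stop)
    {
        for (int i = start; i <= stop; i++)
            yield return i;
    }

    // Returns true if iterating the range as strings fails at run time.
    public static bool StringLoopThrows()
    {
        try
        {
            // The compiler inserts a cast from object to string here.
            foreach (string s in Range(3, 5))
                Console.WriteLine("Got a string = " + s);
            return false;
        }
        catch (InvalidCastException)
        {
            return true;
        }
    }

    public static void Main()
    {
        Console.WriteLine(StringLoopThrows());   // prints True
    }
}
```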

 

Generics to the rescue!

 

I can change my Range method to this...

 

static IEnumerable<int> Range(int start, int stop)
{
    for (int i = start; i <= stop; i++)
        yield return i;
}

 

... and now my second foreach loop gives a compiler error, because the compiler knows every item is an int and that an int can't be converted to a string!
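The two features combine nicely, too.  Here's a sketch of a generic pipeline stage built as an iterator; Filter is a hypothetical helper of my own (not a framework method), and it uses the Predicate<T> delegate plus a C# 2.0 anonymous method.

```csharp
using System;
using System.Collections.Generic;

public static class TypedStreamSketch
{
    public static IEnumerable<int> Range(int start, int stop)
    {
        for (int i = start; i <= stop; i++)
            yield return i;
    }

    // Hypothetical generic stage: works for a stream of any T and
    // preserves the element type for whoever consumes it.
    public static IEnumerable<T> Filter<T>(IEnumerable<T> input, Predicate<T> keep)
    {
        foreach (T item in input)
            if (keep(item)) yield return item;
    }

    public static void Main()
    {
        Predicate<int> isOdd = delegate(int n) { return n % 2 != 0; };

        foreach (int n in Filter<int>(Range(3, 7), isOdd))
            Console.WriteLine("Got an int = " + n);

        // foreach (string s in Filter<int>(Range(3, 7), isOdd)) { }   // compile-time error
    }
}
```

Because the element type flows through every stage, a mismatch anywhere in the chain is caught by the compiler rather than by whoever runs the code.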

 

In any case - “Hello, World!” and yes, I'll probably be speaking much more about source control than my favorite C# features in the future, although I'm not making any promises :)