New book: CLR via C#, Third Edition

Hey, everybody: Jeffrey Richter’s CLR via C#, Third Edition, is indeed now available! You can order it here or here (and lots of other places too, of course).

Today we’d like to share an excerpt from the book. We'll continue excerpting this chapter over the coming weeks and months. Enjoy.

Chapter 16
Arrays

In this chapter:
Initializing Array Elements
Casting Arrays
All Arrays Are Implicitly Derived from System.Array
All Arrays Implicitly Implement IEnumerable, ICollection, and IList
Passing and Returning Arrays
Creating Non-Zero–Lower Bound Arrays
Array Access Performance
Unsafe Array Access and Fixed-Size Array

Arrays are mechanisms that allow you to treat several items as a single collection. The
Microsoft .NET common language runtime (CLR) supports single-dimensional arrays, multidimensional
arrays, and jagged arrays (that is, arrays of arrays). All array types are implicitly
derived from the System.Array abstract class, which itself is derived from System.Object.
This means that arrays are always reference types that are allocated on the managed heap
and that your application’s variable or field contains a reference to the array and not the
elements of the array itself. The following code makes this clearer:

Int32[] myIntegers; // Declares a reference to an array
myIntegers = new Int32[100]; // Creates an array of 100 Int32s

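On the first line, myIntegers is a variable that’s capable of pointing to a single-dimensional
array of Int32s. Initially, myIntegers will be set to null because I haven’t allocated an
array. The second line allocates an array of 100 Int32 values; all of these Int32s are
initialized to 0. The address of this memory block is returned and saved in the variable
myIntegers.

You can create arrays of reference types in the same way. If a variable named myControls is
declared as Control[] and then assigned new Control[50], the allocation creates an array of
50 Control references; all of these references are initialized to null. Because Control is a
reference type, creating the array creates only a bunch of references; the actual objects
aren’t created at this time. The address of this memory block is returned and saved in the
variable myControls.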
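
The difference between the two kinds of arrays can be checked with a minimal sketch. The Widget class and the ArrayDefaultsSketch name are hypothetical stand-ins introduced here (Widget plays the role of a reference type such as Control); they are not part of the original text.

```csharp
using System;

public static class ArrayDefaultsSketch {
    // Hypothetical stand-in for a reference type such as Control.
    public sealed class Widget { }

    // Returns true when every element of a freshly allocated Int32 array is 0.
    public static Boolean AllZero(Int32 length) {
        Int32[] values = new Int32[length];
        foreach (Int32 value in values)
            if (value != 0) return false;
        return true;
    }

    // Returns true when every element of a freshly allocated Widget array is null.
    public static Boolean AllNull(Int32 length) {
        Widget[] widgets = new Widget[length];
        foreach (Widget widget in widgets)
            if (widget != null) return false;
        return true;
    }

    public static void Main() {
        Console.WriteLine(AllZero(100)); // True
        Console.WriteLine(AllNull(50));  // True
    }
}
```

Allocating the Widget array constructs no Widget objects at all; only the 50 null references exist until elements are assigned.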

Figure 16-1 shows how arrays of value types and arrays of reference types look in the
managed heap.

[Figure 16-1: Arrays of value types and arrays of reference types in the managed heap]

In the figure, the Controls array shows the result after the following lines have executed:

myControls[1] = new Button();
myControls[2] = new TextBox();
myControls[3] = myControls[2]; // Two elements refer to the same object.
myControls[46] = new DataGrid();
myControls[48] = new ComboBox();
myControls[49] = new Button();
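
The comment about two elements referring to the same object can be illustrated with a short sketch. The Box class below is a hypothetical stand-in, not the real TextBox type, so that the example compiles without any UI library.

```csharp
using System;

public static class SharedElementSketch {
    // Hypothetical stand-in for a control type; not the real TextBox.
    public sealed class Box { public String Text = ""; }

    // Returns true when two array elements refer to one and the same object.
    public static Boolean ElementsShareObject() {
        Box[] boxes = new Box[50];
        boxes[2] = new Box();
        boxes[3] = boxes[2];        // Copies the reference, not the object
        boxes[3].Text = "changed";  // The change is visible through boxes[2] as well
        return Object.ReferenceEquals(boxes[2], boxes[3])
            && boxes[2].Text == "changed";
    }

    public static void Main() {
        Console.WriteLine(ElementsShareObject()); // True
    }
}
```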

Common Language Specification (CLS) compliance requires all arrays to be zero-based. This
allows a method written in C# to create an array and pass the array’s reference to code written
in another language, such as Microsoft Visual Basic .NET. In addition, because zero-based
arrays are, by far, the most common arrays, Microsoft has spent a lot of time optimizing their
performance. However, the CLR does support non-zero–based arrays even though their use
is discouraged. For those of you who don’t care about a slight performance penalty or cross-
language portability, I’ll demonstrate how to create and use non-zero–based arrays later in
this chapter.
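
As a rough preview only, and not the author’s own example, the sketch below assumes that the Array.CreateInstance overload taking a lengths array and a lower-bounds array is the tool for the job. The LowerBoundSketch and CreateOneBased names are hypothetical.

```csharp
using System;

public static class LowerBoundSketch {
    // Creates a single-dimensional Int32 array whose first valid index is 1.
    public static Array CreateOneBased(Int32 length) {
        return Array.CreateInstance(
            typeof(Int32), new Int32[] { length }, new Int32[] { 1 });
    }

    public static void Main() {
        Array values = CreateOneBased(5);
        values.SetValue(42, 1);                     // Index 1 is the first element
        Console.WriteLine(values.GetLowerBound(0)); // 1
        Console.WriteLine(values.GetUpperBound(0)); // 5
        Console.WriteLine(values.GetValue(1));      // 42
    }
}
```

The result is typed as Array rather than Int32[] because a single-dimensional array with a non-zero lower bound is not a vector, so the elements are reached through GetValue and SetValue here.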

Notice in Figure 16-1 that each array has some additional overhead information associated
with it. This information contains the rank of the array (number of dimensions), the lower
bounds for each dimension of the array (almost always 0), and the length of each dimension.
The overhead also contains the array’s element type. I’ll mention the methods that allow you
to query this overhead information later in this chapter.
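
For orientation, here is a minimal sketch of reading that information back through members of System.Array (Rank, GetLowerBound, GetUpperBound) and through the array’s Type object. The ArrayInfoSketch and Describe names are hypothetical helpers introduced for this illustration.

```csharp
using System;

public static class ArrayInfoSketch {
    // Builds a one-line summary: element type, rank, and the bounds of each dimension.
    public static String Describe(Array array) {
        String text = array.GetType().GetElementType() + " rank=" + array.Rank;
        for (Int32 d = 0; d < array.Rank; d++)
            text += " [" + array.GetLowerBound(d) + ".." + array.GetUpperBound(d) + "]";
        return text;
    }

    public static void Main() {
        // Prints: System.Double rank=2 [0..9] [0..19]
        Console.WriteLine(Describe(new Double[10, 20]));
    }
}
```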

So far, I’ve shown examples demonstrating how to create single-dimensional arrays. When
possible, you should stick with single-dimensional, zero-based arrays, sometimes referred
to as SZ arrays, or vectors. Vectors give the best performance because you can use specific
Intermediate Language (IL) instructions—such as newarr, ldelem, ldelema, ldlen, and
stelem—to manipulate them. However, if you prefer to work with multi-dimensional arrays,
you can. Here are some examples of multi-dimensional arrays:

// Create a two-dimensional array of Doubles.
Double[,] myDoubles = new Double[10, 20];
// Create a three-dimensional array of String references.
String[,,] myStrings = new String[5, 3, 10];

The CLR also supports jagged arrays, which are arrays of arrays. Zero-based, single-dimensional
jagged arrays have the same performance as normal vectors. However, accessing
the elements of a jagged array means that two or more array accesses must occur. Here are
some examples of how to create an array of polygons with each polygon consisting of an
array of Point instances:

// Create a single-dimensional array of Point arrays.
Point[][] myPolygons = new Point[3][];
// myPolygons[0] refers to an array of 10 Point instances.
myPolygons[0] = new Point[10];
// myPolygons[1] refers to an array of 20 Point instances.
myPolygons[1] = new Point[20];
// myPolygons[2] refers to an array of 30 Point instances.
myPolygons[2] = new Point[30];
// Display the Points in the first polygon.
for (Int32 x = 0; x < myPolygons[0].Length; x++)
   Console.WriteLine(myPolygons[0][x]);

Note: The CLR verifies that an index into an array is valid. In other words, you can’t create an
array with 100 elements in it (numbered 0 through 99) and then try to access the element at
index –5 or 100. Doing so will cause a System.IndexOutOfRangeException to be thrown.
Allowing access to memory outside the range of an array would be a breach of type safety and
a potential security hole, and the CLR doesn’t allow verifiable code to do this. Usually, the performance
degradation associated with index checking is insubstantial because the just-in-time
(JIT) compiler normally checks array bounds once before a loop executes instead of at each loop
iteration. However, if you’re still concerned about the performance hit of the CLR’s index checks,
you can use unsafe code in C# to access the array. The “Array Access Performance” section later
in this chapter demonstrates how to do this.
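
The index check described in the note can be observed directly. The sketch below is only an illustration; the BoundsCheckSketch and TryRead names are hypothetical.

```csharp
using System;

public static class BoundsCheckSketch {
    // Attempts to read array[index]; returns false if the CLR rejects the index.
    public static Boolean TryRead(Int32[] array, Int32 index, out Int32 value) {
        try {
            value = array[index];
            return true;
        }
        catch (IndexOutOfRangeException) {
            value = 0;
            return false;
        }
    }

    public static void Main() {
        Int32[] values = new Int32[100];  // Valid indexes are 0 through 99
        Int32 value;
        Console.WriteLine(TryRead(values, 99, out value));  // True
        Console.WriteLine(TryRead(values, 100, out value)); // False
        Console.WriteLine(TryRead(values, -5, out value));  // False
    }
}
```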

Initializing Array Elements

In the previous section, I showed how to create an array object and then I showed how to
initialize the elements of the array. C# offers syntax that allows you to do these two operations
in one statement. For example:

String[] names = new String[] { "Aidan", "Grant" };

The comma-separated set of tokens contained within the braces is called an array initializer.
Each token can be an arbitrarily complex expression or, in the case of a multi-dimensional array,
a nested array initializer. In the example above, I used just two simple String expressions.
If you are declaring a local variable in a method to refer to the initialized array, then you can
use C#’s implicitly typed local variable (var) feature to simplify the code a little:

// Using C#’s implicitly typed local variable feature:
var names = new String[] { "Aidan", "Grant" };

Here, the compiler is inferring that the names local variable should be of the String[] type
since that is the type of the expression on the right of the assignment operator (=).
You can use C#’s implicitly typed array feature to have the compiler infer the type of the
array’s elements. Notice the line below has no type specified between new and []:

// Using C#’s implicitly typed local variable and implicitly typed array features:
var names = new[] { "Aidan", "Grant", null };

In the line above, the compiler examines the types of the expressions being used inside the
array to initialize the array’s elements, and the compiler chooses, from among those types,
the one type to which every element can be implicitly converted to determine the type of the
array. In this example, the compiler sees two Strings and null. Since null is implicitly
castable to any reference type (including String), the compiler infers that it should be
creating and initializing an array of String references.
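
A small sketch makes the inferred types visible at run time. The second array (numbers) is an illustration that is not in the original text; it relies on Int32 being implicitly convertible to Double. The ImplicitArraySketch, NamesType, and NumbersType names are hypothetical.

```csharp
using System;

public static class ImplicitArraySketch {
    // The compiler infers String[] from two Strings and null.
    public static Type NamesType() {
        var names = new[] { "Aidan", "Grant", null };
        return names.GetType();
    }

    // The compiler infers Double[] because Int32 converts implicitly to Double.
    public static Type NumbersType() {
        var numbers = new[] { 1, 2.5 };
        return numbers.GetType();
    }

    public static void Main() {
        Console.WriteLine(NamesType());   // System.String[]
        Console.WriteLine(NumbersType()); // System.Double[]
    }
}
```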

If you had this code,

// Using C#’s implicitly typed local variable & implicitly typed array features: (error)
var names = new[] { "Aidan", "Grant", 123 };

the compiler would issue the message "error CS0826: No best type found for
implicitly-typed array." This is because the base type in common between the two
Strings and the Int32 is Object, which would mean that the compiler would have to create
an array of Object references and then box the 123 and have the last array element refer to
a boxed Int32 with a value of 123. The C# compiler team thinks that boxing array elements
is too heavy-handed for the compiler to do for you implicitly, and that is why the compiler
issues the error.
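
If an array that mixes Strings and an Int32 really is what you want, one option (an assumption on my part, not something the text above prescribes) is to name the element type explicitly so that the boxing is visible in the source. The MixedElementsSketch and CreateMixed names are hypothetical.

```csharp
using System;

public static class MixedElementsSketch {
    // Naming Object as the element type makes the boxing of 123 an explicit choice.
    public static Object[] CreateMixed() {
        return new Object[] { "Aidan", "Grant", 123 };
    }

    public static void Main() {
        Object[] items = CreateMixed();
        Console.WriteLine(items.GetType());      // System.Object[]
        Console.WriteLine(items[2].GetType());   // System.Int32
        Console.WriteLine((Int32) items[2] + 1); // 124 (unboxed before the addition)
    }
}
```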

As an added syntactical bonus when initializing an array, you can write the following:

String[] names = { "Aidan", "Grant" };

Notice that on the right of the assignment operator (=), only the array initializer expression is
given with no new, no type, and no []s. This syntax is nice, but unfortunately, the C#
compiler does not allow you to use implicitly typed local variables with this syntax:

// This is a local variable now (error)
var names = { "Aidan", "Grant" };

If you try to compile the line of code above, the compiler issues two messages: "error
CS0820: Cannot initialize an implicitly-typed local variable with an array
initializer" and "error CS0622: Can only use array initializer expressions to
assign to array types. Try using a new expression instead." While the compiler
could make this work, the C# team thought that the compiler would be doing too much for
you here. It would be inferring the type of the array, new’ing the array, initializing the array,
and inferring the type of the local variable, too.

The last thing I’d like to show you is how to use implicitly typed arrays with anonymous types
and implicitly typed local variables. Anonymous types and how type identity applies to them
are discussed in Chapter 10, “Properties.” Examine the code below:

// Using C#’s implicitly typed local, implicitly typed array, and anonymous type features:
var kids = new[] {new { Name="Aidan" }, new { Name="Grant" }};
// Sample usage (with another implicitly typed local variable):
foreach (var kid in kids)
   Console.WriteLine(kid.Name);

In this example, I am using an array initializer that has two expressions for the array elements.
Each expression represents an anonymous type (since no type name is specified after the new
operator). Since the two anonymous types have the identical structure (one property called Name
of type String), the compiler knows that these two objects are of the exact same type. Now,
I use C#’s implicitly typed array feature (no type specified between the new and the []s) so
that the compiler will infer the type of the array itself, construct this array object, and initialize
its references to the two instances of the one anonymous type. Finally, a reference to this
array object is assigned to the kids local variable, the type of which is inferred by the compiler
due to C#’s implicitly typed local variable feature.

I show the foreach loop as an example of how to use this array that was just created and initialized
with the two anonymous type objects. I have to use an implicitly typed local variable
(kid) for the loop, too. When I run this code, I get the following output:

Aidan
Grant
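
The claim that both elements are instances of one anonymous type can be checked with a short sketch. This is only an illustration; the AnonymousArraySketch and SameElementType names are hypothetical.

```csharp
using System;

public static class AnonymousArraySketch {
    // Returns true when both elements, and the array's element type, are the same type.
    public static Boolean SameElementType() {
        var kids = new[] { new { Name = "Aidan" }, new { Name = "Grant" } };
        return kids[0].GetType() == kids[1].GetType()
            && kids.GetType().GetElementType() == kids[0].GetType();
    }

    public static void Main() {
        Console.WriteLine(SameElementType()); // True
    }
}
```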