Arrays of arrays


Most people understand that there’s a difference between a “rectangular” and a “ragged” two-dimensional array.

int[,] rectangle = {
  {10, 20},
  {30, 40},
  {50, 60} };
int[][] ragged = {
  new[] {10},
  new[] {20, 30},
  new[] {40, 50, 60} };

Here we have a two-dimensional array with six elements, arranged in three rows of two elements each. And we have a one-dimensional array with three elements, where each element is itself a one-dimensional array with one, two or three elements.

That’s all very straightforward. Where things get brain-destroying is when you try to make a ragged array of two-dimensional arrays.  Quick, don’t think, just answer.

int[,][] crazy;

What is the type of crazy?

Option one: It’s a one-dimensional array, each element is a two-dimensional array of ints.
Option two: It’s a two-dimensional array, each element is a one-dimensional array of ints.

OK, now that you have your snap answer, think about it carefully. Does your answer change?

I’m not going to tell you the answer just yet. Instead let’s explore the consequences of each possibility.

Consequence One

Surely the way you make any type into an array is to append [] to the type specification, right? But in option two, you stick the [,] into the middle.

Option two is weird. Option one is sensible. 

But wait. If [,][] means “a 1-d of 2-d’s”, then the order you read it off the page opposes the order you say it — it looks like two-d-thing-followed-by-one-d-thing, so why shouldn’t it read “a 2-d of 1-d’s”?

But then why does the “int” come first? By that logic it should come last.

Argh. Maybe option one isn’t entirely sensible. Clearly something is not quite perfect with both options. Oh well. Let’s move on.

Consequence Two

Now suppose that you wanted to obtain a value in that array, assuming that it had been initialized correctly to have plenty of elements everywhere we need them. How would you do it?

Suppose we’re in option one. It’s a one-d array. Therefore crazy[10] is a two-d array. Therefore crazy[10][1, 2] is an int.

Suppose we’re in option two. It’s is a two-d array. Therefore crazy[1,2] is a one-d array. Therefore crazy[1,2][10] is an int.

Option one is weird — crazy is of type int[,][] but you dereference it as [10][1,2]! Whereas option two is sensible; the dereferencing syntax exactly matches the ordering of the type name syntax.

Consequence Three

Now suppose you want to initialize the “outer” array but are going to fill in the “ragged” interior arrays later. You’ll just keep them set to null at first. What’s the appropriate syntax to initialize the outer array?

Suppose we’re in option one. It’s a one-d array. Therefore it should be initialized as crazy = new int[,][20]; right?

Suppose we’re in option two. It’s a two-d array. Therefore it should be initialized as crazy = new int[][4,5]; right?

Option two is weird. We said int[,][] but initialized it as [][4,5]. Option one is sensible.

What C# actually does

It’s a mess. No matter which option we choose, something ends up not matching our intuition. Here’s what we actually chose in C#.

First off: option two is correct. We force you to live with the weirdness entailed by Consequence One; you do not actually make an element type into an array of that type by appending an array specifier. You make it into an array type by prepending the specifier to the list of existing array specifiers. Crazy but true.

This prevents you from having to live with any weirdness from Consequence Two; in this option, the dereferencing happens with the same lexical structure as the declaration.

What about Consequence Three? This one is the real mind-bender. Neither choice I offered you is correct; apparently I’m a sneaky guy. The correct way to initialize such an array in C# is:

crazy = new int[4,5][];

This is very surprising to people!

The design principle here is that users expect the lexical structure of the brackets and commas to be consistent across declaration, initialization and dereferencing. Option two is the best way to ensure that if declaration has the shape [,][] then the initialization also has that shape, and so does the dereferencing.

That all said, multidimensional ragged arrays are almost certainly a bad code smell. Hopefully you will never, ever have to use your new knowledge of these rules in a production environment.

Life is much better if you can instead use generic collections. It is completely clear what List<int[,]> means – that’s a list of two-dimensional arrays. Whereas List<int>[,] means a two-d array of lists of int.

Comments (48)

  1. Oh my goodness, that is very weird indeed!

    I would have expected it to behave as you outlined in option 1.  I’ve never had to use multidimensional ragged arrays in C#, so I haven’t ran across this "problem".  

    Was your holiday good?  Are you finished with the pre-recorded posts now?  Did you see any Beaver-Sharks??? 🙂

  2. Thomas Levesque says:

    Enlightening, as always… thanks !

  3. Pavel Minaev [MSFT] says:

    It’s interesting to note that this isn’t a problem for pointers, because you cannot have a pointer to an array. So all the weird cases there are disabled anyway:

       int*[] a;  // array of pointers

       int[]* a;  // error

       int[]*[];  // error

       int*[]*;   // error

    Anyway, the array declarator problem is something that always bugged me as awfully inconsistent, but then you can only do so much with the legacy C/C++/Java syntax. It’s interesting to see how other languages from the family deal with the problem.

    Java simply weasels out. It doesn’t have multidimensional arrays, nor pointers, so the only thing you can stick the [] declarator on is the base type, or another array declaration. Obviously, when they’re all 1-dimensional, it doesn’t matter which side you read it from, left or right – it’s all the same:

       int[][] a;

    D 1.0 has much more to deal with, since it has static arrays a la C (with size specifier), and also pointers to arrays etc. This can quickly become a mess, so to retain some consistency, they go for "option 2", and read type modifiers consistently from right to left. So:

      int[3][4];       // array of 4 arrays of 3 ints

      int*[]*[3] a;  // array of 3 pointers to dynamic arrays of pointers to int

    Which is of course inconsistent with usage, since the order is reversed:

      *((*(a[1]))[2]) = 123;

    To mitigate that somewhat, D 1.0 also gives the option of declaring arrays C-style, with brackets following variable. For those declarations, the order is left-to-right, and thus consistent with C/C++:

      int a[4][3];       // array of 4 arrays of 3 ints

      int* (*a[3])[];  // array of 3 pointers to dynamic arrays of pointers to int

    This is somewhat consistent in that if you want the declaration to match usage, you have to go all the way, and match the placement of [] in the declaration with usage as well (i.e. after the variable). But then two subtly different ways to do the same thing are a recipe for bugs, and D language reference itself recommends always sticking to the first, prefix form.

    Personally, I’ve always found this to be the area where Pascal-derived languages shine – especially Modula-2. You can’t beat the clarity of:

      VAR a : ARRAY[1..3] OF POINTER TO ARRAY[1..10, 1..10] OF INTEGER;

    Pascal/Delphi are somewhat less readable (but more concise) when it comes to pointers, but the general idea is the same: types should _always_ be read left to right in a natural way. Of course, if you do this, you want the type to be on the right of variable name in a declaration, otherwise it just looks weird. I mean, say we had:

       [,][]int a;  // huh?

    I believe D 2.0 explored that at some point, but then backed away to the same behaviour as D 1.0. I can understand why – I’ll take the existing C# syntax any day, thank you very much…

    By the way, it seems that C++0x might get Pascal-style declaration syntax as well. In the minutes of July 2009 WG21 meeting (http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2009/n2920.html), there is this interesting bit:

    "Crowl, Stroustrup, and Vandevoorde will work on a Linear Declaration Syntax that is an alternative to and generalization of Unified Function Syntax. Some examples:

    [] p1: *const int; // Pointer to const int

    [] p2: *int const; // Error

    [] p3: (char)->*int; // Function talking a char returning pointer to int

    [] p4: *()->int; // Pointer to function returning an int

    [] p5: ()->*()->int; // Function that might return p4

    [] p6: [2][3] * [4] int;// Array of 2 arrays of 3 pointers to arrays of 4 ints"

    Of course, for C++ this is really long overdue because of the complexity of its type system, and the fact that declaring something like a pointer to array is a fairly common thing. In contrast, I do not remember the last time I had to write a multidimensional array in C#…

  4. DRBlaise says:

    I have avoided multi-dimensional arrays for years now.  They are weaker than "ragged" arrays in most ways.  The most important to me is speed – getting or setting rectangle[1,2] is about twice as slow as ragged[1][2].

  5. CompuDrummer says:

    That’s enough to drive anyone crazy.  And people wonder why it’s so hard to understand code written by someone else.

  6. Dan says:

    The other fun part is that System.Type follows Option 1. So

       typeof(System.Int32[,][]).Name =="System.Int32[][,]"

    Someone recently filed a bug on the XAML parser because of this (we write out the type name using System.Type.Name), I had some fun explaining why it was By Design.

  7. Greg says:

    Avoid ragged arrays and multi-dimensional ones with 3 or more subscripts.   This ‘too smart for your own good’ code has bitten me several times usually at the expense of multple days of debuger fun.

  8. Craig Stuntz says:

    OK, I completely understand why C# language implementers care about this, so props to you for posting it. But for anyone else, the answer is, "It doesn’t matter; just don’t do it!"

  9. Alex O. says:

    Actually, I didn’t find this weird or mind-twisting – option 2 makes perfect sense.

    If we mentally visualize the one-dim ragged array as a histogram chart, where the ragged array size represents the bar height, then the int [,][] crazy is just like one of those really annoying isometric 3-D histograms with square bars arranged in a grid of N x M rows and columns and still each ragged array represents different bar heights.

  10. commongenius says:

    @DRBlaise,

    Getting an element from a ragged array takes two accesses, while getting an element from a multi-dimensional array only takes one access. So multi-dimensional arrays should usually be faster, not slower.

  11. Ben says:

    One to file under "things I hope I never have to worry about"

  12. Denis says:

    Devil is in the detail, huh? 🙂

    It’s all well and good for us who USE the language to say "just don’t do it", but the problem is, those who DESIGN the language cannot afford to overlook ANY possible syntactically valid use of the language they design, no matter how silly and impractical it looks (and I am NOT saying that arrays of arrays are silly and impractical: all I am saying is that even if they were, they still must work, full stop).

    Which brings us to an interesting question: how do you unit-test a language? Is it possible to produce the sufficient amount of code equivalents of human-language tongue-twisters to make sure the language is complete? How do you catch the programming-language equivalents of the English phrase "we eat what we can and what we can’t eat, we can," where you’re not sure what they said: that "what they couldn’t eat, they could," or that "what they couldn’t eat, they canned?" Ok, we humans can intuitively figure out these things, but computers don’t have any intuition yet…

  13. Pavel Minaev [MSFT] says:

    @commongenius,

    Unfortunately, it doesn’t work that way. 0-based 1D arrays have special IL opcodes to instantiate and access elements, and those are efficiently JIT-compiled, with all those nice optimizations such as eliding unnecessary bound checking in a loop. For multidimensional arrays, element access is implemented via method calls – just check the IL. And those calls are slow.

    That kills perf on the spot. All benefit you get from locality is lost from method access overhead – a 2D array is roughly 1.5 times slower than a corresponding jagged array because of that.

  14. tb says:

    What’s wrong with int[][][] sane; ?

  15. Chris says:

    @Pavel Minaev

    You are correct in saying that there are no special IL opcodes for multi-dimensional arrays, but that’s no reason for it to be slow.

    The JITter could produce special inline native code for method calls to the Array object that get/set multi-dimensional arrays.

    The problem (so far as I know, and I could easily be out of date) is that the current JIT makes no attempt to do this. Theoretically this should be faster than jagged array access.

  16. Pop.Catalin says:

    Talking about driving crazy,I love playing with arrays:

           int[,][][][,,] arr = new int[10,12][][][,,];

           arr[3, 5] = new int[6][][,,];

           arr[3, 5][1] = new int[3][,,];

           arr[3, 5][1][0] = new int[3, 2, 4];

           arr[3, 5][1][0][0, 1,0] = 2;

           System.Console.WriteLine(arr[3, 5][1][0][0, 1,0]);

    🙂

    But on second though, thank god for generics.

  17. laci says:

    > crazy = new int[4,5][];

    > This is very surprising to people!

    Funny; that was my first guess. 🙂

    That means, you did your designing job right [again]. 🙂

  18. Alex says:

    My C++ brain was horrified at what option 2 would mean for code like this:

    #define someType int[,]

    someType[] crazy;

    But I suppose that’s part of the reason C# won’t let you use #define like that.

  19. My C++ brain is horrified at the thought that someone would even consider using #define (rather than typedef) for type aliases!

  20. Greg says:

    This would not pass our code reviews since arrays beyond 2 dimensions or jagged arrays should be used only in very special circumstances (e.g., representing 3d data or a series of frames for 2d data).

    Eric,

    An interesting exmple of non-uniform array dimensions in how SQL Server 2000 stored variable length rows in a single data block.  Throw in inter-page data compression with SQL 2008 and it gets more interesting.

  21. TheCPUWizard says:

    @Greg,

    I am somewhat suprised at your "policy" in this matter.

    Of course there is always the design decision between an array and an indexable (or keyed) collection. When using collections, it is not uncommon to have significantly more than 3 dimensions, and for multiple levels to be sparse (ie not all keys are valid, or indices have different maxiumums based on context.

    If one is to allow those conditions, then why (especially if bounds checking is in effect) the difference for arrays???

  22. Richard says:

    Perhaps something to consider is that if you’re getting to the stage of defining variables like this that you aren’t "separating your concerns". Either you want an array of two-dimensional arrays or it is a two-dimensional array of arrays. Whichever one is correct, define yourself a class that represents the second part of the description. Then define a class defining the whole type. You now have a class representing your complicated object, allowing you to use things like indexers and composite patterns. There’s no confusion, and (wow!) you’re actually taking advantage of using an object-oriented language.

  23. Sometimes I wonder why do you guys arm developers with guns. For instance, I don’t see why fields can be public and properties can be private.

  24. TheCPUWizard says:

    @Marius <wave>

    There are good reasons. Private properties allow a class to internally make use of things like assignment validation and lazy evaluation. And there are times where direct access to fields is required.

    Besides, playing with guns can be exciting and fun <wink>

  25. Ross Patterson says:

    This nonsense is exactly why those of us who grew up on languages other than C cringe at Ritchie’s array notation.  Every other language of C’s age uses something like "int A(1,2,3)" and treats multi-dimensional arrays as first class citizens.  Even FORTRAN got it right!

  26. TheCPUWizard says:

    @Ross,

    I "grew up" on languages that well predate "C" [I started programming in 1972], and actually found "C"’s approach to arrays to be quite good….

    Just goes to show that generalizations always have exceptions [even this one…circuit overload….]

  27. @Ross, it’s not quite true. Treating a multidimensional array as array-of-arrays is perfectly reasonable, and in fact both Wirth’s Pascal and Modula-2 did it that way. I.e. they let you write:

      var a : array [1..2, 1..3] of integer;

    but it was really just syntactic sugar for:

     var a: array [1..2] of array [1..3] of integer;

    and, correspondingly, array access:

     a[1, 2] := 123;

    was syntactic sugar for:

     a[1][2] := 123;

    C simply disposed with the sugar. The real problem isn’t the notion of array-of-arrays; it’s C’s inverted type modifier order.

    BASIC and Ada had true multidimensional arrays, however.

  28. Dylan Borg says:

    I only use 1 dimesnional arrays. Even the language I wrote only allows 1 dimensional arrays

  29. TheCPUWizard says:

    @Dylan, thqat may be fine for a specializxed language, but how would you handle time corellated dimensional mapping. Basically a 4 dimensional array that you can slice along any access (or intersection plane) to get a 3 dimensional array which you can slice…2 dimensional …slice…1 dimensional..slive value…

  30. "brain-destroying"  — I Couldn’t have picked a better word 🙂

  31. KubuS says:

    I second Alex O. – there is nothing special about this and no surprises.

    Option 2 seemed to me very logical both to define and to use – so at least from my point of view this was a good design decision.

    Of course I’m not saying it’s a good practice to use this style of coding – IMHO generics are the way to go 😉

  32. Lawrence says:

    I chose option 2 because I read from left to right…

    int[,][] crazy;

    To me this says:

    this is integer data

    this is a two dimensional array

    each box of that array contains a one dimensional array

    it’s called crazy

    simple.

  33. Metalman says:

    1-D

    //Array of 10

    uint social_security[10];

    uint area_code[10];

    uint zip_code[10];

    //Access item i

    social_security[i];

    area_code[i];

    zip_code[i];

    deck[];

  34. @Pavel: I had some contact with the .NET performance team due to some speed issue, and it turnet out that 2D Array access was one of several problems accumulating there. Besides the workaround of using jagged arrays, I was told that .NET 4.0 Beta 2 likely will have some improvements in the area of 2D arrays.

  35. Ramon Smits says:

    Shouldn’t RAGGED be JAGGED array?

    Either term is acceptable. “Jagged” appears to be the more common term in use on the Internet; I was taught “ragged”. — Eric

  36. David Cuccia says:

    @Markus

    Hooray! Maybe they’ll add mutli-dimensional array serialization too??

  37. yanamjoshi says:

    Playing with arrays for "C" programmers has always being a challenging task especially when it comes to working along with pointers to such arrays. Remembering the *() notation in pure C, has always helped to understand array better, although at the first look, it meant a bit erratic.  *(array_pointer + offset) = 10;

    gives a better understanding of array pointers. But when it comes to …. int **two_dim_pointer;

    its not that easy.

    Applying the same logic, here… int [ , ] [] crazy =  new int [10, 10] [10] should give something like following

    crazy[0,0][0] = 0;

    crazy[0,0][1] = 0;

    Why not trying int three_d_array[] [] []  = new int [10] [10] [10];

    Anyways, good blog… Eric, this will definitely help understand C# from the core-data structure point of view and help appreciate the generic containers. Also, a great help for using C# micro-framework.

  38. Greg says:

    It works out better to use a decently designed data structure to store data instead of 3 dimensional or jagged arrays.

    3 dimensional or jagged arrays are a red flag in our code reviews as they indicate a high likelyhood of poorly designed, over designed, non-functional or hard to support code.

  39. Timwi says:

    The design principle does not hold true for me. I did not expect the order to be consistent, I expected the grammar of the language to be consistent. I expected an int[][] to be initialised as "crazy = new int[][10]" and I expected this 10 to refer to the first index in a derefence (i.e. "a" in "crazy[a][b]"). Maybe this is because I think about these things like a parser – I want to deconstruct things into atomic, fundamental pieces. I expected there to be a simple grammar rule that says that array initialisations are of the form "’new’ <element-type> ‘[‘ <integer-expression> ‘]’", no matter what the element-type is; but C# complicates it because if that element-type is itself an array type, then this simple rule has a complex exception. This trips me up every time and I cannot easily get used to it.

    Naturally, because I think this way, I am initially inclined to say that the design principle was a bad idea. To me, a simple/consistent grammar would have been a more desirable design goal. However, realistically I don’t know the proportion of people to whom the design principle actually applies. Some of the commentors on this entry seem to imply that C#’s solution is in line with their expectation. Have any studies been conducted to determine how many programmers expect the order of the brackets to be consistent across the three contexts, at the expense of having a more complex/inconsistent grammar?

  40. Roman says:

    For me personally, the "inconsistent" rules would have been preferable. Somehow those are always the rules I think of when thinking of a particular situation (e.g., creating a new array). Possibly because they are actually more consistent with a concept I find important – that "int[]" is an inseparable definition of the type "array of ints". It’s a pity that this is not true.

    When I don’t remember a rule exactly I try to guess what it might be. So, wrongly assuming "int[]" is inseparable I always guess that you create a new array of 25 of these by appending "[25]". It could well be that others guess that the order would be the same as in indexing and type "int[25][]", but to me this feels completely alien.

    Like Timwi said, it would be interesting to see if you decided the consistency was important among potential C# users by running a study / poll of some sort.

  41. Robert says:

    Both the int[,][] and List<int>[,] seem very natural to me; I can’t see or understand any conflict.

    Even List<int[,]>[,][] is kinda obvious. It’s two dimensional array, each element of the array is one dimensional array of List<int[,]>. Each elemnt of the list is rectanguar array of integers.

  42. Keith Halewood says:

    It all seemed so easy in Algol68:

    ref [,][] int crazy; // would give you the reference variable which you could then initialise with loc or heap generators, or just:

    [3,2] ref [] int crazy; // gives you a 2-d array instance into which:

    crazy := ( ( loc [5] int := (1,2,3,4,5), loc [2] int := (1,2) ), (  ,  ),  ( , ) …. etc… );

    When c# allows you to perform arbitrary slices of multi-dimensional arrays, then I might actually applaud, ie:

    crazy[1:2,2]

    crazy[1,1][3:5]

  43. Wizzrobe says:

    …D=  My C++ brain is not accustomed to this…I’ll leave C# to C# people. Bye now.

  44. Dominique says:

    I’m with Timwi and Roman, I don’t even find consequence 2 weird for option one.

    It’s kinda like accessing members/methods from a class.

  45. Alex says:

    I’m with Timwi, Roman and Dominique – I find that design decision too "special casy", so to speak. It can never change though, so we’ll have to live with it.

  46. Bahman says:

    Hard to see the trees for the forest.

    Well designed languages offer constructs to help express the high level structures you need, both in code & data. Leave that out many valuable language offerings are considered pitfalls.

    In this case ‘typedef’ is the perfect tool (try declare the simple ‘signal’ function without a typedef!); just typedef the inner data structure and the outer one’s declaration is trivial. This is indeed the reason why usage of Generics was suggested, which works but is just beating around the bush.

    C#’s lack of typedef reminds me of the time when java was missing enums. Java folks finally got their enum. And now is the time to get our typedef in the next C# language update Eric!

  47. bajatoma says:

    August 21, 2009 3:57 PM     Lawrence said: "… it’s called crazy simple"

    I second that… I skimmed through article, and could’t understand option 1 and 2… cause it made no sense to me… so I’m not sure why the right way would surprise people..

  48. Miles says:

    This is why C's way of doing things makes more sense 😀

Skip to main content