Zero-based collections or 1-based?
Since programming languages are a bridge between the human concept of a solution and we naturally think the first element is in position 1, why was this not so in the actual language? Why are we made to think like a machine when in fact we are not? The infamous “Off by one bug” is there because of the inherent design. So popular it’s even got a name, yet we do nothing to prevent this happening at the language level. As for those who say it’s easier for the computer, that’s why we have smart compilers. I’m not a compiler dammit.
One of my hobbies is rewriting lines in TV and films, and I am therefore impelled to comment that moo’s last line should be:
I’m a programmer, not a compiler, Jim
To cover this subject, I’m going to have to set the way-back-machine for the 1980s or earlier, back when real men wrote in assembly and our bytes only had 7 bits. (Aside: how many of you, and be honest here, have ever heard of machines where the word size wasn’t a multiple of 8? They really did exist.)
Anyway, in those ancient times, processors were fairly glacial in their arithmetic speed, though much faster than the early calculators. Saving cycles was very important, so when arrays were first considered, the implementers looked at the code they wrote:
address = base_address + (index - 1) * sizeof(element)
They actually didn’t write it that way because they didn’t have multiplication in those days, but that gives you the idea. Then somebody noticed that if your array starts with zero, you could write it as:
address = base_address + index * sizeof(element)
That saved you a single decrement operation, which was important in those days.
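To make the saving concrete, here is a minimal sketch of both formulas. It models addresses as plain `long` values rather than real pointers, and the names `IndexMath`, `OneBasedAddress`, and `ZeroBasedAddress` are made up for illustration; nothing here is how any particular compiler actually did it.

```csharp
using System;

static class IndexMath
{
    // One-based indexing: the index must be decremented before scaling.
    public static long OneBasedAddress(long baseAddress, int index, int elementSize)
        => baseAddress + (long)(index - 1) * elementSize;

    // Zero-based indexing: the index is scaled directly, with no decrement.
    public static long ZeroBasedAddress(long baseAddress, int index, int elementSize)
        => baseAddress + (long)index * elementSize;
}

static class AddressDemo
{
    static void Main()
    {
        const long baseAddress = 1000;        // made-up starting address
        const int elementSize = sizeof(int);  // 4 bytes per element

        // The first element sits at the base address under both schemes.
        Console.WriteLine(IndexMath.OneBasedAddress(baseAddress, 1, elementSize));  // 1000
        Console.WriteLine(IndexMath.ZeroBasedAddress(baseAddress, 0, elementSize)); // 1000

        // The third element: index 3 when one-based, index 2 when zero-based.
        Console.WriteLine(IndexMath.OneBasedAddress(baseAddress, 3, elementSize));  // 1008
        Console.WriteLine(IndexMath.ZeroBasedAddress(baseAddress, 2, elementSize)); // 1008
    }
}
```

Both schemes reach the same memory; the only difference is the extra subtraction on every access in the one-based version.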
Therefore early programmers got used to zero-based arrays, and the path was set, and it has stayed that way for many years for the majority of languages.
But why? Isn’t it simple enough to change?
It’s obviously trivially easy to change, and Moore’s law has made the efficiency inconsequential in the majority of scenarios. The issue isn’t about technological limitations, but rather human ones.
Understanding how zero-based indexing works is the secret handshake of the programming world. We all started not knowing the secret handshake, but over time we learned and even began to like the secret handshake, and now we don’t know any other way to shake hands.
We’re not going to try to change our brain wiring just because some young whippersnapper is having trouble remembering that the first index is zero.
Or, to put it another way, developers have a huge investment in hardwired things like this, and changing them will not make your customer happy.
[Update: Jack wrote:
Why cant we either have 1 based or user definable array bounds?
The CLR does support this kind of construct (I had a hard time not using the term “travesty” here…), but there aren’t, to my knowledge, any languages that have built-in syntax to do that.
Which is a very good thing. If you go down that route, rather than having to remember a single rule (C# arrays are zero-based), you have to remember that every array could be either zero-based or based on some other number, so all your loops become:
for (int index = arr.GetLowerBound(0); index <= arr.GetUpperBound(0); index++)
If you get this wrong, you get code that works fine for your test cases, but breaks for the people who like 3-based arrays.
Yuck. Many times, “make it an option” is the worst choice.
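Here is a rough sketch of that failure mode, using `Array.CreateInstance` with an explicit lower bound to build a 3-based array. The helper names (`LowerBoundDemo`, `MakeArray`, `SumUsingBounds`, `SumAssumingZero`) are invented for this example, and note that `GetLowerBound` and `GetUpperBound` take the dimension as an argument.

```csharp
using System;

static class LowerBoundDemo
{
    // Builds a one-dimensional int array with the given lower bound,
    // filled with the values 1..length.
    public static Array MakeArray(int length, int lowerBound)
    {
        Array arr = Array.CreateInstance(typeof(int), new[] { length }, new[] { lowerBound });
        for (int i = 0; i < length; i++)
            arr.SetValue(i + 1, lowerBound + i);
        return arr;
    }

    // Correct for any lower bound: asks the array where it starts and ends.
    public static int SumUsingBounds(Array arr)
    {
        int total = 0;
        for (int index = arr.GetLowerBound(0); index <= arr.GetUpperBound(0); index++)
            total += (int)arr.GetValue(index);
        return total;
    }

    // The habitual loop: only correct when the array is zero-based.
    public static int SumAssumingZero(Array arr)
    {
        int total = 0;
        for (int index = 0; index < arr.Length; index++)
            total += (int)arr.GetValue(index);
        return total;
    }

    static void Main()
    {
        Array zeroBased = MakeArray(3, 0);
        Array threeBased = MakeArray(3, 3);

        Console.WriteLine(SumUsingBounds(zeroBased));   // 6
        Console.WriteLine(SumAssumingZero(zeroBased));  // 6, the test case passes
        Console.WriteLine(SumUsingBounds(threeBased));  // 6

        try
        {
            SumAssumingZero(threeBased); // index 0 is below the lower bound
        }
        catch (IndexOutOfRangeException)
        {
            Console.WriteLine("The zero-based loop failed on the 3-based array.");
        }
    }
}
```

The habitual loop passes every test written against zero-based arrays and only fails once somebody hands it an array that starts somewhere else.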
Why bother creating a new language if you are still hugging your legacy pillow at night? Waste of time.
One of our main goals was to create a language that was comfortable to C/C++ programmers. We could have taken a different tack, and designed a new language from scratch, and perhaps done some new and exciting things.
But if you look at all the languages out there, you’ll find that having a comfortable syntax is well correlated with language success. Even if you’ve never written C# code before, if you have experience with a C-style language, you’ll be able to read C# code.
You can find another discussion of this issue here.