Performance Optimization of Arrays – Part I


Hello Friends,

Have you observed better performance of Array operations in IE8 than IE7? If not, try the following code in both IE7 & IE8 and I can bet you would not leave this page without reading till the last word.

var arrObj = new Array();
var count = 10000;
var before, after;
for(var i = 0; i < count; i ++)
{
   arrObj.push(i);
}
before = new Date();
for(var i = 0; i < count; i ++)
{
    arrObj.pop();
}
after = new Date();
alert(after- before);

On my machine, IE7 took 7640 ms while IE8 took just 18 ms. As you see the above example is doing nothing but populating an array with 10000 entries and then popping them one by one, and just the pop operation is taking so much time in IE7.

This was just one example. Pick any array operation, and you would notice a huge difference between IE7 and IE8 performance.

So, won’t it be interesting to know why array operations in IE7 were taking so much time? Won’t you like to know how we dealt with all those issues? I am sure the answer is ‘yes’ and in the next few paragraphs that is what I have tried to explain. So keep reading.

Why Array operations were slow?

There were two main reasons which made Array operations so less performant. I will be explaining one of them in this post and leave other one for the next post, otherwise it would be so long to read that I would surely lose the bet.

1. JScript didn’t treat Arrays as Arrays – What does this mean? Arrays, in general sense, are considered a contiguous memory storage which enables fast random access to indexes. In JScript world, Arrays can be sparse, that means an Array can be of length 1 million but it may not have storage committed for all those 1 million indexes. But in real world scenarios, Arrays are hardly used as sparse arrays. They are mostly dense

Unfortunately JScript runtime was not handling arrays for real word usage. It always handled them as sparse arrays. Therefore it never committed any contiguous space for them, resulting into slower access to indexes.

Instead it used to insert indexed entries into a property bag which is nothing but a classic hash table whose keys are strings.

So if you are doing something like the following in your code…

var arrObj = new Array();
for (i = 0; i < 100; i ++)
arrObj[i] = i;
 

… and expecting that JScript runtime internally would allocate a C-Style contiguous array, then sorry to say, it doesn’t meet your expectation at all. Instead what it does internally is…

a) Allocates a generic object, which is nothing but the HashTable.

b) Since the generic object (HashTable) has to be treated like an array object, associate a special attribute “length” with it and set its value to 0.

c) In the for loop, for each value of i

a. Convert ‘i’ to string (e.g. for i = 10, it would be “10”)

b. Add <key value> pair of <string equivalent of i, i> (e.g. <”10” , 10>) to the hash table.

c. Set the “length“attribute’s value properly.

So every time you access an indexed entry of an array, it first converts the index to string, computes hash value for the string and then looks up the hash table.

How we fixed it

No rewards if you already guessed it.

Now JScript runtime treats an Array Object as an special object, different from other JScript objects. Basically it maintains two storage areas for arrays. One is the old HashTable, which is used for storing named entries. The other one is special one which is used exclusively for indexed entries and resembles a C-Style Array.

So if you have some code like following…

arrObj = new Array(20);
for(var i = 0; i < 20; i ++ )
    arrObj[i] = i;
    arrObj.Name = “IE8 Array”;
 

… then for loop is adding entries to a different storage and “Name” is added to the different storage.

Ok. So looks like all indexed entries always go to new storage. No. That is not the case. There is a condition which should be met before we put an indexed entry to the new storage. The condition is that new entry must not make the array sparse and to decide that whether a particular indexed entry would make the array sparse or not, we have certain heuristics, for example…

arrObj = new Array();
arrObj[10] = 20;
arrObj[50000] = 500000;

In above snippet, indexed entry 10 satisfies our heuristics and is added to new storage. But the indexed entry 50000 will not meet them and will be added to old HashTable as a normal named entry.

Cool. looks fine if you always populate the array such that it meets the heuristics, e.g. in an incrementing for loop starting from 0. But what if you want to populate it in a decrementing for loop starting from max length…

arrObj = new Array();
for(var i = 2000; i >=0 ; i -- )
  arrObj[i] = i;
 

or you want to populate the array from both ends …

arrObj = new Array();
var length = 2000;
for(var i = 0; i < length/2 ; i ++ )
{
  arrObj[i] = i;
  arrObj[length – i - 1] = length – i;
}

In such scenarios, not all of your indexed entries will go to new storage and performance of further operations on this array would not be as great as it can be.

For such scenarios, to get working in most performant way, what you have to do is to pass the actual array size to constructor when you create the object. If you pass the size, JScript runtime assumes that you are sure that your array would be dense and don’t want our heuristics to ensure that. We will not exercise those heuristics for any indexed entry added within that size range, in whatsoever order they are added. So if you write something like following…

arrObj = new Array(50000);
arrObj[10] = 20;
arrObj[50000] = 500000;

… both will go to the new storage even though the last indexed entry doesn’t satisfy heuristics. But do remember that anything beyond 50000, will have to meet the heuristics, else it would go to HashTable.

Time out. Hope you enjoyed reading it and would like to read second part as well. See you soon.

-JP


Comments (14)

  1. That’s really interesting. We already thought that you have implemented it this way in the old engine.

    Would be great if IE8 would also allow to extending (inheriting) from arrays using the normal prototype chain:

    function QxArray() {

     Array.prototype.call(this);

    }

    QxArray.prototype = new Array;

    This still seems not to be possible. This works in all other browsers. Please have a look at this as well.

  2. Dave says:

    Great work! This will definitely improve performance. Like Sebastian, I would like to see it be possible to extend Array. Right now, that seems to break because of issues with the .length property.

  3. Hold your Horses says:

    Woah! that was definately messed up!

    But as Sebastian questions…

    Can I prototype on the JavaScript Array Object or not?  If not, GO BACK TO THE DRAWING BOARD!

    Otherwise, thanks for the performance improvement, it is much appreciated.

  4. TNO says:

    Out of curiosity, why didn’t you use the array literal? []

  5. Jaiprakash says:

    Sebastian, Dave – Thanks for pointing out the issue with inheriting from Array.prototype scenarios.

    TNO – There is no specific reason for not using array literals, I just started with new Array() and continued with it for consistency. The fix works for all array objects, no matter how they are declared in code.

    – JP

  6. liorean says:

    Is this safe though? For something like "var a=new Array(0xffffffff),a[0xfffffffe]=0xfffffffe;" I really hope you’re not allocating a true array of 2^32 elements.

    Oh, and not really on topic but still related, you have a set of bugs in those functions that extend the array in IE8b1 and earlier. I’ll show you the case of Array.prototype.push, but you have analogous bugs in the other array functions:

    —-<example>—-

    var

       a=[0,1,2,3],

       b=[],

       i=0xfffffffd;

       a[i]=i++;

    try{

       alert(‘Initial length: ‘+a.length);

       alert(‘a.push returns: ‘+a.push(i++,i++,i++));

    }catch(err){

       alert(‘ERROR:nn Name: ‘+err.name+’nn Description: ‘+err.description+’nn Message: ‘+err.message+’nn Number: ‘+err.number);

    }finally{

       for(i in a)

           b.push(i+’: ‘+a[i]);

           b.push(‘length: ‘+a.length);

           alert(b.join(‘n’));

    }

    —-</example>—-

    —-<IE8b1 results>—-

    – "Initial length:4294967294"

    – "a.push returns: 1"

    – No RangeError!

    – Array members listed in the finally clause:

       0: 4294967296

       -2: 4294967294

       -1: 4294967295

       length: 1

    —-</IE8b1 results>—-

    —-<Correct results>—-

    – "Initial length: 4294967294"

    – RangeError

    – a.push doesn’t return, it throws a RangeError

    – Array members listed in the finally clause:

    0: 0

    1: 1

    2: 2

    3: 3

    4294967293: 4294967293

    4294967294: 4294967294

    4294967295: 4294967295

    4294967296: 4294967296

    length: 4294967295

    —-</Correct results>—-

    Why is that the correct behaviour? Well, the relevant pieces of the ECMAScript spec are "15.4.4.7 Array.prototype.push", "9.6 ToUint32" and "15.4.5.1 [[Put]]".

    In step 2 in the algorithm of 15.4.4.7 the variable n is set to the result of calling ToUint32 with the length of the array as argument. This is the only time the length is read out in that algorithm. The type of the variable n is not specified here.

    In step 4 in the algorithm of 9.6 the variable k is specified to be of Number type. The result of ToUint32 is thus a uint32 value stored in a Number (double precision floating point) type and not a uint32 type.

    In step 5 in the algorithm of 15.4.4.7 the variable n is increased by 1. Since n is a Number and not a uint32 type, this means n may contain values of 2^32 or larger.

    In step 4 in the algorithm of 15.4.4.7 a property with the name gotten from ToString(n) is assigned the value of the respective argument.

    In step 7 in the algorithm of 15.4.4.7 [[put]] is called with "length" and the value of n.

    In step 13 in the algorithm of 15.4.5.1 the value to set is compared to the result of calling ToUint32 on that value. If not equal, a RangeError should here be thrown.

    Applied to the example, this is what should happen:

    00. a.length is 4294967294 (2^32-2)

    01. a.push is called

    02. n is set to ToUint32(length)

    03. The property a[n] is set to 4294967294

    04. n is an array index thus a.length is set to n+1 = 4294967295 (2^32-1, which is maximum value for the length)

    05. n is increased to 4294967295

    06. The property a[n] is set to 4294967295

    07. n is increased to 4294967296 (2^32, one larger than maximum length)

    08. The property a[n] is set to 4294967296

    09. n is increased to 4294967297 (2^32+1, two larger than maximum length)

    10. [[put]] is called with "length" and n as arguments.

    11. [[put]] compares 4294967297 to ToUint32(4294967297)=1 and finds they are not equal, and throws a RangeError

    Relevant spec text (ECMA262-3ed):

    —-

    15.4.4.7 Array.prototype.push ( [ item1 [ , item2 [ , &#8230; ] ] ] )

    The arguments are appended to the end of the array, in the order in which they appear. The new length of the array is returned as the result of the call.

    When the push method is called with zero or more arguments item1, item2, etc., the following steps are taken:

    1. Call the [[Get]] method of this object with argument "length".

    2. Let n be the result of calling ToUint32(Result(1)).

    3. Get the next argument in the argument list; if there are no more arguments, go to step 7.

    4. Call the [[Put]] method of this object with arguments ToString(n) and Result(3).

    5. Increase n by 1.

    6. Go to step 3.

    7. Call the [[Put]] method of this object with arguments "length" and n.

    8. Return n.

    The length property of the push method is 1.

    NOTE

    The push function is intentionally generic; it does not require that its this value be an Array object.

    Therefore it can be transferred to other kinds of objects for use as a method. Whether the push function can be applied successfully to a host object is implementation-dependent.

    —-

    9.6 ToUint32: (Unsigned 32 Bit Integer)

    The operator ToUint32 converts its argument to one of 2^32 integer values in the range 0 through 2^32&#8722;1, inclusive. This operator functions as follows:

    1. Call ToNumber on the input argument.

    2. If Result(1) is NaN, +0, &#8722;0, +&#8734;, or &#8722;&#8734;, return +0.

    3. Compute sign(Result(1)) * floor(abs(Result(1))).

    4. Compute Result(3) modulo 2^32; that is, a finite integer value k of Number type with positive sign and less than 2^32 in magnitude such the mathematical difference of Result(3) and k is mathematically an integer multiple of 2^32.

    5. Return Result(4).

    NOTE

    Given the above definition of ToUInt32:

    Step 5 is the only difference between ToUint32 and ToInt32.

    The ToUint32 operation is idempotent: if applied to a result that it produced, the second application leaves that value unchanged.

    ToUint32(ToInt32(x)) is equal to ToUint32(x) for all values of x. (It is to preserve this latter property that +&#8734; and &#8722;&#8734; are mapped to +0.)

    ToUint32 maps &#8722;0 to +0.

    —-

    15.4.5.1 [[Put]] (P, V)

    Array objects use a variation of the [[Put]] method used for other native ECMAScript objects (8.6.2.2).

    Assume A is an Array object and P is a string.

    When the [[Put]] method of A is called with property P and value V, the following steps are taken:

    1. Call the [[CanPut]] method of A with name P.

    2. If Result(1) is false, return.

    3. If A doesn&#8217;t have a property with name P, go to step 7.

    4. If P is "length", go to step 12.

    5. Set the value of property P of A to V.

    6. Go to step 8.

    7. Create a property with name P, set its value to V and give it empty attributes.

    8. If P is not an array index, return.

    9. If ToUint32(P) is less than the value of the length property of A, then return.

    10. Change (or set) the value of the length property of A to ToUint32(P)+1.

    11. Return.

    12. Compute ToUint32(V).

    13. If Result(12) is not equal to ToNumber(V), throw a RangeError exception.

    14. For every integer k that is less than the value of the length property of A but not less than Result(12), if A itself has a property (not an inherited property) named ToString(k), then delete that property.

    15. Set the value of property P of A to Result(12).

    16. Return.

  7. JScript Blog says:

    Hello Friends, Hope you have read part I of this topic which I posted few days ago. If not then I would

  8. Jaiprakash says:

    @liorean – No we don’t allocate memory for elements upfront. It is allocated only when indexes are actually populated.

    And many thanks for reporting the bug and writing such a detailed explanation.

    -JP

  9. gOODiDEA.NET says:

    .NET Video: Write Your First Silverlight Game Increasing the Size of your Stack Web CSS 真的可以浮动么? xUnit

  10. gOODiDEA says:

    .NETVideo:WriteYourFirstSilverlightGameIncreasingtheSizeofyourStackWebCSS真的可以浮动么…

  11. Darryl says:

    I’m using JScript 5.7.5730 as a server-side language in a classic-ASP-like environment, and I’m a little confused.  

    You talk about arrays "now" and arrays "before", and the difference appears to be based on IE7/8.  But what are the real version numbers?  

    Do I in 5.7.5730 have the new array handling?

    Thanks,

    Darryl.

  12. MSDNArchive says:

    Sorry if I was not clear. As of today, the only way to get these fixes on your box is to install IE8 Beta1 from http://www.microsoft.com/windows/products/winfamily/ie/ie8/readiness/Install.htm.

    The 5.7.0.5730 is I think IE7, and will not have these fixes.

    – JP [MSFT]

  13. Dan Breslau says:

    Sorry for coming in late; I know that by now your team is working on finishing up IE9.

    I’ve been looking into array performance, and I found something that I think is interesting: Passing the array’s size into the array constructor does improve the performance for indexed entries in the array (as you described). But it doesn’t help when looking up undefined values — that is, elements that aren’t currently indexed in the array. In fact, compared to arrays where the size *isn’t* passed into the constructor, it seems there’s even a *penalty* for looking up undefined values. Of course, I don’t know how this is implemented, but that penalty isn’t what I’d expect; is it a bug?

    For details, see http://www.outofwhatbox.com/blog/2009/12/javascript-array-performance-and-why-it-matters/

    I’d be glad to send you the raw data if it would be helpful.

  14. Hello Friends, Hope you have read part I of this topic which I posted few days ago. If not then I would