Performance Optimization of Arrays – Part II


Hello Friends,

Hope you have read part I of this topic which I posted few days ago. If not then I would suggest going through that first as this post is just a continuation of that one.

So coming to the point, here is the second reason –

2. “JavaScript arrays are sparse arrays”– JavaScript allows arrays to be sparse therefore, when you write arrayObj = new Array(100); unlike C runtime, JScript doesn’t allocate any slot/memory for those 100 entries up front. It allocates slot for an index only when an index is to be populated with a value, for example…

arrayObj = new Array(100);
 
arrayObj[10] = 10;
 

In this example, the array size is 100, but in the actual physical memory there is only one slot allocated for this array. Remaining 99 slots don’t exist at all.

Wondering how this factor contributes to the bad performance? Not allocating memory for all the slots up front is good design. Isn’t it?

Well, JScript always assumed that all arrays are sparse. So even if you had a fully dense array in your code, JScript runtime would treat it like a sparse array only. So if you are going to do a pop() operation on a dense JScript array, JScript just won’t go and delete the last indexed entry and update the length attribute. It does something which is extremely performant for sparse arrays, but equally under-performant for dense arrays. Let‘s see what it is all about.

Internally, for each Array object, JScript runtime maintains a table (different from HashTable, let’s call it TrackerTable) which is nothing but a list of pointers to actual entries in the hash table. So if you just create an Array Object of size 100, and add 10 entries (indexed or named), the TrackerTable will have 10 pointers pointing to actual entries in the hash table. Let’s take one simple example…

arrayObj = new Array(100);
 
arrayObj[10] = 10;
 
arrayObj.length = 90;
 

In this example, I just create an Array of size 100, and populated index 10. Next I reduce length of array to 90. Since the length has been reduced, entries from 90 to 99 have to be deleted. The ideal thing to do would be to delete <key, value> pair from hash table for key = 90 to 99. That means 10 operations on Hash table. JScript is smart here and saves 9 out of 10 operations.

What JScript actually does is that it goes to the TrackerTable, iterates through it, for every entry it checks if it falls between deletion range (90-99 in this case), if yes then delete it. In above example, TrackerTable had only one entry. So instead of 10 operations on hash table, JScript does only one iteration and performs better.

However what if the array was a dense array and all the indexed entries from 0 to 100 were populated? Unfortunately in this case too, JScript would follow the same logic. Therefore it would do 100 iterations over TrackerTable and end up doing 90 more operations.

How it was fixed– As I said in the last post, in IE8 JScript we have the mechanism in place to decide whether an array is a sparse or dense. So we have changed the implementation to take sparse path only when the array is actually sparse. That means for operations on dense arrays, we don’t use TrackerTable anymore. We directly go to the right storage and get the things done in fast manner.

So this is all I had to tell about array improvements. Hope you got an idea of what was/is happening under the hood? If not, do leave a note and I would try to address that. And yes, I have lot many interesting things to share with you all, so keep checking this blog.

-JP

Comments (20)

  1. Julian says:

    So is the JScript Array Object a NATIVE JAVASCRIPT OBJECT or not?

    Web Bug Track #181

    http://webbugtrack.blogspot.com/2008/03/bug-181-cant-subclass-array-in-ie.html

    Performance improvements are great, but following the specs is just as, if not more important.

  2. Jaiprakash says:

    @julian – There is no change in array behavior. Only the under-the-hood implementation has changed.

    Yes we are aware of Array-Subclass issue. Thanks for reporting it.

    -JP

  3. steve says:

    @Jaiprakash: I think you mean "It is still *NOT* a Native JavaScript Object"

    If it were a native JavaScript object, subclassing and accessing the hierarchy constructor info would be possible.

    @JScript team: If you can work on making the Array a native JS object for IE8, you would win over many developers! Currently things like this just make Firefox, Safari, Konqueror, Opera, Camino, etc. *far* superior browsers.

  4. liorean says:

    @steve: What does it being a native JavaScript Object or not have to do with being able to do with what you want to do? The main issue is that there is a catch-all setter function – in ECMA-262 3ed parlance a custom [[Put]] (P, V) method – which is not a prototype property, it’s an instance property. The ECMAScript spec doesn’t allow inheriting that behaviour!

    There’s also minor problems in the Array prototype methods, but those are ignorable for most cases.

    Yes, it’d be great if one could inherit the catch-all setter using prototypes, but that issue has nothing to do with whether Arrays are true "Native JavaScript Objects" or not.

    @Jaiprakash: Regarding that bug i pointed out in the comments of Part I (http://blogs.msdn.com/jscript/archive/2008/03/25/performance-optimization-of-arrays-part-i.aspx#8369217) – just wanted to make sure that you noted that Array.prototype.push is not the only Array.prototype method that is affected. All Array.prototype methods that can extend the length property are affected by similar bugs when the resulting array length would exceed 2^32.

  5. steve says:

    @liorean

    The reason why I want a native Array object (which JScript does NOT offer), is that I want to add methods to Arrays that I use as a List (key==integer index), but not to Arrays that are used as Collections (key==random index).

    For example an .indexOf(value) method that returned the (first) index of a given value would be very handy.

    In other browsers, I can subclass the Array, and make a "List" Object… that I can then add this method to.  I can also "cast" it back and forth to/from a True JavaScript Array as and when I feel like it.

    I appreciate that MS has fixed the major performance issues with Arrays, but I would also like them to fix the fact that it (as are many other objects), not a Subclass-able object.

    But i digress.

  6. MSDNArchive says:

    @steve – Unfortunately it is a bug in our implementation that we don’t update “length” property properly when a user defined object’s prototype is set to Array.prototype and any of the array operations are performed on instances of such objects. However I don’t think that just because of this one bug, conclusion should be drawn that Array object in Jscript is not a “native javascript object”.

    Jscript Array object is indeed a Native JavaScript object. All operations, which are expected to work on a JavaScript Array object (x = new Array() or x = []), work perfectly in our implementation.

    – JP

  7. Garrett says:

    @Steve

    Not being able to subclass Array is not a bug for the exact reason Liorean stated. It is not a bug.

    @Jaiprakash it is a bug in the JScript implementation that the the "length" property is not updated on a native EcmaScript object. It does not have anything to do with the "user defined object’s prototype being set to Array.prototype." In fact, Array .prototype.push clearly states:

    "NOTE

    The push function is intentionally generic; it does not require that its this value be an Array object. Therefore it can be transferred to other kinds of objects for use as a method. Whether the push function can be applied successfully to a host object is implementation-dependent."

    Making such claims that "the length property is not updated properly when a user defined object’s prototype is set to Array.prototype" is misleading. It seems to be entering into the cloudy thinking of how web developers imagine JavaScript works and how they want IE to work. It is not relevant to standards.

    We can see in the following example that calling push on a non-array object will not result in the "length" property being properly updated. This seems to be a bug in JScript’s implementation of the push() method, and I have annotated where I think JScript is going wrong further down below.

    <script>

    var p = {};

    Array.prototype.push.call(p, "one slice");

    document.write("p.length: " + p.length + "<br>");

    document.write("p[0] " + p[0] + "<br>");

    </script>

    Results:

    IE8b1:

    | p.length: undefined

    | p[0] undefined

    FF3, Safari3, Opera9:

    | p.length: 1

    | p[0] one slice

    To see what the expected behavior should be, lets look at Array.prototype.push:

    ——————————————-

    15.4.4.7  Array.prototype.push

       1. Call the [[Get]] method of this object with argument "length".

       2. Let n be the result of calling ToUint32(Result(1)).

       3. Get the next argument in the argument list; if there are no more arguments, go to step 7.

       4. Call the [[Put]] method of this object with arguments ToString(n) and Result(3).

       5. Increase n by 1.

       6. Go to step 3.

       7. Call the [[Put]] method of this object with arguments "length" and n.

       8. Return n.

    ——————————————-

    | 1. Call the [[Get]] method of this object with argument "length".

    undefined

    | 2. Let n be the result of calling ToUint32(Result(1))

    That would result in ToUint32(undefined). This behavior is specified:

    ——————————————-

    9.6 ToUint32: (Unsigned 32 Bit Integer)

    The operator ToUint32 converts its argument to one of 232  integer values in the range 0 through 232-1, inclusive. This operator functions as follows:

    1. Call ToNumber on the input argument.

    2. If Result(1) is NaN, +0, -0, +∞, or -∞, return +0.

    3. Compute sign(Result(1)) * floor(abs(Result(1))).

    4. Compute Result(3) modulo 232 ; that is, a finite integer value k of Number type with positive sign and less than 232 in magnitude such the mathematical difference of Result(3) and k is mathematically an integer multiple of 232 .

    5. Return Result(4).

    ——————————————-

    | 1. Call ToNumber on the input argument.

    Now we have to look at ToNumber:-

    ——————————————-

    9.3 ToNumber

    The operator ToNumber converts its argument to a value of type Number according to the following table:

    Input Type Result

    Undefined NaN

    ——————————————-

    ToNumber returns NaN to ToUint32.

    ——————————————-

    1. Call ToNumber on the input argument.

    2. If Result(1) is NaN, +0, -0, +∞, or -∞, return +0.

    ——————————————-

    So ToUint32 returns +0 to Array.prototype.push

    ——————————————-

    15.4.4.7  Array.prototype.push

       1. Call the [[Get]] method of this object with argument "length".

    ——————————————-

    result: undefined

    ——————————————-

       2. Let n be the result of calling ToUint32(Result(1)).

    ——————————————-

    n = 0

    ——————————————-

       3. Get the next argument in the argument list; if there are no more arguments, go to step 7.

    ——————————————-

    "one slice"

    ——————————————-

       4. Call the [[Put]] method of this object with arguments ToString(n) and Result(3).

    ——————————————-

    result: this["0"] = "one slice"

    ——————————————-

       5. Increase n by 1.

    ——————————————-

    n = 1;

    ——————————————-

       6. Go to step 3.

    ——————————————-

    go to step 7.

    ——————————————-

       7. Call the [[Put]] method of this object with arguments "length" and n.

    ——————————————-

    result: this["length"] = 1

    ——————————————-

       8. Return n.

    ——————————————

    In this example, p does not initially have a "length" property. When Array.prototype.push is called, p.length is undefined.

    This value, undefined, is passed to ToUint32, which calls ToNumber(undefined). ToNumber returns NaN to ToUint32, which returns +0 to step 2 of push().

    Garrett

  8. MSDNArchive says:

    Thanks for the correction Garrett. I completely agree with you that the subclass issue does not have anything to do with the "user defined object’s prototype being set to Array.prototype." as you have shown in your repro.

    The whole subclass discussion on this post was based on the bug reported at http://webbugtrack.blogspot.com/2008/03/bug-181-cant-subclass-array-in-ie.html and the repro there does set the prototype to an array object. While replying I overlooked other repros. I apologize if it created any confusion.

    – JP

  9. Garrett says:

    Jaiprakash – An Array cannot be subclassed. It’s not a bug. Did you read Liorean’s post?

    The word "subclass" applied to EcmaScript is jargon. A "subclass" is used to describe a constructor function who’s associated prototype points to an instance of another object, the "superclass". This is common jargon.

    "Subclass" Example:

    function A(){}

    A.prototype.isA = true;

    function B(){}

    B.prototype = new A;

    new B().isA; // true.

    Now, if the problem is "can’t subclass Array", that would seem to imply that there is a bug in having a constructor function whose prototype is set to an Array."

    You said:

    | the subclass issue does not have anything

    | to do with the "user defined object’s

    | prototype being set to Array.prototype."

    The "subclass issue" is the same misconception as "user defined object’s prototype being set to Array.prototype." bug’ (not a bug). The real bugs are in the Array.prototype methods.

    Just because Steve is complaining about "can’t subclass Array" and calling that a ‘bug’ does not mean that that is a bug. I spent a good deal of time explaining the Ecma-262 specification’s Array.prototype.push. There is no text anywhere about ‘subclass an array’ in the spec or in my explanation.

    Not a bug:

    http://webbugtrack.blogspot.com/2008/03/bug-181-cant-subclass-array-in-ie.html

    Regarding that webbugtrack "bug": Although I am not an authority on OOP, I would venture to say that "subclassing" might the be "The #1 rule in OOP". In fact, "encapsulation" might arguably be more important.

    COnclusion:

    1) The "subclass issue" is imaginary

    2) There is a bug in Array.prototype.push()

    Thank you,

    Garrett

  10. liorean says:

    @Jaiprakash: Garrett is correct. That Array.prototype.push doesn’t work on non-arrays is a broken implementation i.e. a bug.

    That the [[Put]] (P, V) method of Array instances is not inherited is on the other side not a broken implementation. The ECMAScript spec does not provide any behaviour by which it would be inherited.  It is thus not a bug.

  11. Garrett says:

    Thank you, David. The "bug" that Microsoft acknowledged is bogus.

    The real problem is with the methods in Array.

    Array.prototype.slice.call({});

    "undefined" in IE8.

    This is a spec violation.

    In IE8b1, I cannot call any array methods in a generic context, using a NodeList as the thisArg. For example:

    Array.prototype.slice.call(document.childNodes);

    IE8 : "Error: JScript Object Expected"

    Has Microsoft considered writing test suites?

  12. rasool.shoaib says:

    hello

    I m facing a problem in javascript arrays. In my application array is used to store different kind of objects like strings, document objects, iframe etc. And on that array object very frequent operations of accessing and updations r made then at times i get invalid object on some valid index at first try and if i made second try to access that object it returns correct object. I have implemented this retry mechanism programmtically.

  13. Jaiprakash says:

    Could you provide little more details, like how the array is accessed, from JScript code, VBSCript or an ActiveX? How are you populating the array, direct assignment or some builtin array operation like push()?

    It would be great if you could post a repro.

    -JP

  14. shuaib says:

    This array is only accessed from jscript.

    here is my sample code for adding and finding values in that array

    this.JSHash = new Array();

    this.length = 0;

    this.blnOrderedHash=false; // for making sorted entries

    this.add = function addHashNode(key, Value){

    key = "key_" + key;

    oldObject =  this.JSHash[key];

    if (typeof (oldObject) == "undefined"|| oldObject == null)

    this.length++;

    this.JSHash[key]=Value;

    }

    this.remove=

    function removeHashNode(key){

    key = "key_" + key;

    oldObject =  this.JSHash[key];

    if (typeof (oldObject) != "undefined" || oldObject != null)

    {

    this.length–;

    delete this.JSHash[key];

    }

    else

    return;

    if(this.blnOrderedHash==true){

    var temp = new Array();

    for(var key in this.JSHash)

    {

    temp[key]=this.JSHash[key];

    }

    this.JSHash = temp;

    }

    }

    this.find =

    function findHashNode(key){

    key = "key_" + key;

    Returned_object_from_hash =  this.JSHash[key];

    if (typeof (Returned_object_from_hash) == "undefined")

    Returned_object_from_hash = null;

    return Returned_object_from_hash;

    }

  15. Ritesh [MSFT] says:

    Thanks for providing the repro. It would be great if you could provide more information — Do you get invalid object for specific type of objects?

    Or it happens randomly for any type of object?

    Can you provide e.g. of objects where this repro returns invalid object?

  16. rasool.shuaib says:

    This array is primarly used to save IFRAME objects. Arrayz find method method takes a string value as key and returns IFRAME obect.

    Some times this IFRAME object returns as strings containing garbage value.

    The size of the array usually ranges from 30 to 40 objects

  17. Jim Radford says:

    I’m guessing that the heuristic for detecting sparse/dense arrays has the number 32 in it.  The following code fails, i.e. it doesn’t delete the splic()ed element, even though the length *does* get decremented correctly.

     var L=33,a=[],b=[]; for (var i=0;i<L; i++) a[i]=i; b.push.apply(b,a); b.splice(L-1,1); b[L-1]==undefined

    Pasting this into the IE8 console gives false where as it should give true if the element were correctly removed.

    If you start with L=32 instead, you’ll correctly get true.  Because of this I’m guessing that there is a bug one of the two ways splice() is implemented.

    -Jim

    PS @JSBlog, please file this bug if you can figure out how; I sure couldn’t.

  18. GauravS says:

    @Jim: Thanks for reporting this issue. We have filed a bug and fixed this issue in the latest IE8 builds.

    GauravS [MSFT]

  19. vb says:

    So does this mean that for "dense" arrays, operations like splice would result in moving all subsequent array elements and potentially cause performance issues esp. when operating near index 0 of large dense arrays?

  20. @Jaiprakash: Garrett is correct. That Array.prototype.push doesn’t work on non-arrays is a broken implementation i.e. a bug.

    That the [[Put]] (P, V) method of Array instances is not inherited is on the other side not a broken implementation. The ECMAScript spec does not provide any behaviour by which it would be inherited.  It is thus not a bug.