Performance issues with "String Concatenation" in JScript.


Hello Friends,
I am Jaiprakash and work as a Developer in the Jscript team. There is not much to tell about me except one fact - I can’t recall the last time when any of my teammates called me by my name, they just call me JP
J. 


I know, you have seen pretty cool blogs in past from my team mates and I promise this one also not going to disappoint you. So coming to the point, I am going to talk about the famous ‘string append’ problem in Internet Explorer (or Jscript to be more accurate) and how are we going to fix it.


The Problem
let’s do a small experiment to get to it. Copy paste following code in an html file and run in Internet Explorer. Observe the time reported.


<script language="javascript">
var before = new Date().getTime();
var s = "jscript";
x = "";
count = 20000;
for (var i = 0; i < count; i++) {
     x += s;
}
var after = new Date().getTime();
var time = after - before;
alert("Time taken to append 'jscript' " + count + " times : " + time + " ms");
</script>


On my machine the time reported is 1922 ms (averaged out three times). As you see this code is doing nothing other than appending a 7 character string 20K times. And for such a trivial operation it is consuming almost 2 seconds. Phewww….
 Impressed? Not yet. It still doesn’t appeal you? OK. Try increasing the no of appends and enjoy the show. Here is what I get on my box…



















Appends


Time (ms)


10K


260


20K


1922


25K


6125


30K


11578



Well, I am sure you would be impressed by now. And I can feel your curiosity to know how Jscript makes such a trivial looking operation such a big one.


Not much, it uses a very trivial algorithm to append strings to get you to such a great show.  Whenever it has to append two strings of size ‘m’ and ‘n’ each, it allocates a new buffer of size (m + n), copies the strings one by one, and returns this buffer back. Hmmm. This logic looks fine. What’s the deal then?


Let’s say you want to append 3 strings like following...


resultStr = “Jscript” + “String” + “Append”;


To give you the final result, Jscript first appends “Jscript” and “String”. It allocates a buffer of size 13, copies them one by one to the buffer and assigns the buffer to a temporary, let’s say temp1. Next it appends temp1 and “Append”, allocates a buffer of size 19, copies temp1 and “Append”, and assigns the result to resultStr.  As you see, to generate final string for you it copied “Jscript” and “String” twice (temp1 and resultStr) and allocated one intermediate buffer. Imagine what happens when you append ‘n’ such strings. It will allocate n-1 buffers, copy each string (n-i) times where ‘i’ is the index at which the string appeared in the operation.


You might be thinking why Jscript was shipped with such an algorithm. Well, Jscript was meant to be used just as glue, not as full-fledge programming language as used in modern WebApps.


 How it affects Real world Scenarios
Gone are the days when web apps used to be static text which was just retrieved from the server and displayed in your browser. Today’s world is AJAX, involves lot more interaction with server, and data received is not just the plain HTML. A modern WebApp fetches data from server; builds the page on your machine using Jscript and DOM APIs, and displays on your browser. And to build the page it has to generate HTML, which is nothing but text. And to generate HTML out of the pieces retrieved from server, it has to append them as strings. And as the applications are becoming smarter and smarter, they tend to minimize the interaction with server, and do as much processing as possible in your browser.


Let me give you some real world examples. You all must have used some or the other AJAX mail client like msn.com, outlook web access or yahoo mail beta. In your inbox, if you scroll down or up, they just update your mail list not the entire page. How it happens? It gets the information about mails from server, builds the table of your mails on the fly and displays in the browser.


Another good example would be Redfin which is actually AJAX based real estate portal. If you search a locality, it shows a map with location of all the results and below the map it shows details of each result. This table it constructs by appending several hundreds of strings.


These are just two. There are many more such. Basically what I want to convey is that string append is a pretty common and basic operation and probability is very high that large number of web apps get affected by this.


So we are fixing it!!!
We are trying an algorithm which avoids those intermediate buffer allocations and repeated copy operations. With this algorithm, I just ran the same test case and here are the numbers for you. Please note that the fix is still under progress so these numbers might change a bit here and there, however they should be enough to give you an idea of improvements.



















Appends


Time (ms)


10K


23


20K


47


25K


62


30K


78


Shhhh. Numbers speak for themselvesJ.


JP, I Need this fix on my box, now!!!
I wish I could help you. You will have to wait till next release of Internet Explorer.


At last…
Please help us in helping you. If you have encountered other such script “hairballs”, do leave a note on any of the blogs published here. I want to publish more such blogs for you
J.


 - JP


 

Comments (32)

  1. Lionel says:

    > Numbers speak for themselves.

    Very nice! 🙂

    > You will have to wait till next release of Internet Explorer.

    Waiting is hard… 🙁

  2. Matt Ellis says:

    Nice one JP! But after reading Eric’s post that you linked to, I’m really interested to know what algorithm you’ve used to fix it. Any chance of a behind-the-scenes look?

    Cheers

    Matt

  3. SharpGIS says:

    Seems like a vaste of time for me. Even in JScript there are a much better way to concatenate strings than this by using a string builder approach. And we can use that with todays browsers (and of course we all do right?) This is no different from what we should be doing in all the other programming languages we use, because they all have this problem with string copying.

    Having that said, The JScript performance is one of IE’s biggest achilles heals these days, and it’s good news that you are optimizing it. VML rendering and mousemove needs some clean up too.

  4. shabunc says:

    Oh, just great.

    Listen me. guys, we have an almighty algorithm, but we can not describe it.

  5. That’s why I’ve been using arrays to concatenate strings for, oh, about ten years now.

    To quote JPJ’s examples:

    var x = [];

    for (var i = 0; i < count; i++) {

        x.push(s);

    }

    x = x.join(”);

    Output:

    Time taken to += ‘jscript’ 20000 times : 875 ms

    Time taken to push ‘jscript’ 20000 times : 78 ms

    And the other one is simply rewritten:

    resultStr = [‘Jscript’, ‘String’, ‘Append’].join(”);

  6. Shabunc says:

    Martijn Coppoolse, yor approach is excellent for IE, but if you want to have a cross-broser decision, you should mention that Array.join approach is the worst in Firefox and not the best in Opera.

  7. shivaram.p says:

    Thanks for the feedback guys. The algorithm we have used is somewhat similar to what Eric mentioned in his blog. As I said it is still under testing and may need some changes.

    As Martin and SharpGIS mentioned, folks have been using Array.push() and Array.join() approach to concatenate strings in IE, but still not everyone knows about it. New developers still fall back to using direct string concatenation approach, which hurts perf and causes lot of extra work. So why not have all of them performant and let developers concentrate on ‘task in hand’ rather than ‘best practices’ :).

    –  JP

  8. anphanax says:

    I do not know where the VBScript people are (or if they even have a blog), or if you’re all part of the same team. I am very curious as to whether or not VBScript will be getting this sort of enhancement too, given how much classic ASP in VBScript is still out there and the large amount of other scripts written in it.

    It kind of sucks going through a script written that does a TON of string concatenation (which cannot be avoided with how it is written) and converting the code to use Join(Array(…), “”).

    [Suresh] We own VBScript too. VBScript is under sustaining engineering mode for quite some time and we are not planning any new DCR. However we are there to address any blocking issues for the customers. We shall treat this as a bug and plan for a fix. 

  9. Tino Zijdel says:

    I wrote an article about this last year: http://therealcrisp.xs4all.nl/blog/2006/12/09/string-performance-in-internet-explorer/

    bottom line is that a stringbuilder approach isn’t optimal either because of IE’s poor performance in all the other areas and you would either have to fork your code for IE or experience degraded performance in other browsers who do have decent string-handling performance.

    It’s good that MS is finally doing something about this long-standing issue, I do have however one question: will JScript in future also support accessing strings as a character-array like all other ECMA-compliant implementations?

    For example: alert(‘foo'[2]) should give me ‘o’

  10. Rasmus says:

    Should be possible to get a nice speedup by avoiding some of the long string copies. Something like:

    var blocksize = 10;

    for (var i = 0; i < count / blocksize; i++) {

       var tmp = "";

       for (var j = 0; j < blocksize; j++) {

           tmp += s;

       }

       x += tmp;

    }

    This way the number of string copies are the same, but a lot of them are copying short strings instead of long strings.

  11. Jaiprakash says:

    Absolutely right. The current implmentation gets slower ans slower as string size increases. It’s nice work around to append small strings, and then generate a big string out of them.

    -JP

  12. What is holding you from putting this fix in a next update of current versions? It’s not a crime to release performance updates after a products release.

  13. rektide says:

    i suggest mentioning array.join in the article, such that less new programmers have to be lightly bruised with sticks for their crimes.  regardless of ie’s performance, the problem is not string concatenation, its the people who use it.

  14. UUDIBUUDI says:

    I use this all the time in HTA’s, simple, easy to read and pretty efficient (i lookup the length of the array so I can add stuff to the output without having to increase a counter). No extra functions / classes needed! 🙂

    var output = new Array();

    for (var i = 0; i < 1000; i++) {

      output[output.length] = i;

    }

    alert(output.join(”));

  15. Photowiz says:

    This is not a waste of time, SharpGIS. Whether you can improve string performance is not relevant to the thousands of web pages already existing that use string builders.

  16. Keith says:

    This article would have been more accurately titled: "Performance issues with "Extremely Long String Concatenation" in JScript."

    Because, unfortunately I think it’s a flawed test that will mislead lots of readers of that thread can have them use a less attractive coding pattern that can be up to 3x *slower*, to do a simple thing like concatenate two strings. I will post my tests to the thread.

    I think that "fixing" our code according to that article would actually reduce our performance.

    The problem with the example given is that it creates a very specific, non-realistic test that tickles a particular known code path. Specifically, it repeatedly adds a small string to an extremely long one. That is *not* a real world use case, and if it is, then yes, people should use one of the supplied alternative patterns (and, of course, MS should fix their implementation in any case).

    That IE’s JS performs particularly poorly in this regard is evidenced by the non-linear performance times (and perhaps that’s why JP didn’t show us, e.g. 1k iterations, or 1 or 20 iterations, which are quite good). This is also mentioned by a follow-up poster.

    Normally one loops a performance test many times and then divides by the number of iterations to see what the single-iteration cost is, but that is not accurate in JP’s test. Essentially, JP’s test tells us nothing about how long it takes to do a single string concatenation or even 1000, it only tells us how long it takes to create a very long string via 20k iterations.

    What, then, if we did a more realistic test, which actually tested what real code is doing? I did that and you’ll see that the results are far better *and* far more linear.

    I’ve included two such tests here, as well as my timings for JP’s tests, Martin’s tests, and my tests. You’ll see that in the normal case, IE has stellar performance. I also included a "1000" iteration number so you can see the linearity of the various tests.

    JP’s code: 20k iterations (1000 iterations):

    IE: 34000ms (300ms) (yes, you read that right)

    FF: 83ms (4ms)

    Martijn Coppoolse ArrayTest (super long strings): 20k iterations (1000 iterations):

    IE: 63ms (2ms)

    FF:120ms (5ms)

    Keith ArrayTest: 20k iterations (1000 iterations)

    IE: 164ms (6ms)

    FF: 312ms (15ms)

    Keith ConcatTest1 (+): 20k iterations (1000 iterations)

    IE: 43ms (1ms)

    FF: 76ms (1ms)

    Keith ConcatTest2 (+=): 20k iterations (1000 iterations)

    IE: 51ms (1ms)

    FF: 109ms (6ms)

    **** THE CODE ****

    ARRAYTEST

    var before = new Date().getTime();

    var s1 = "11111";

    var s2 = "22222";

    count = 20000;

    for (var i = 0; i < count; i++) {

        x = [s1, s2];

        x.join(”);

    }

    var after = new Date().getTime();

    var time = after – before;

    alert("ARRAY TEST(.join): Time taken to append ‘"+s1+"’+’"+s2+"'(="+x+") " + count + " times : " + time + " ms");

    CONCAT TEST1 (+)

    var before = new Date().getTime();

    var s1 = "11111";

    var s2 = "22222";

    count = 20000;

    for (var i = 0; i < count; i++) {

        x = s1+s2;

    }

    var after = new Date().getTime();

    var time = after – before;

    alert("CONCAT1 TEST(+): Time taken to append ‘"+s1+"’+’"+s2+"'(="+x+") " + count + " times : " + time + " ms");

    CONCAT TEST2 (+=)

    var before = new Date().getTime();

    var s1 = "11111";

    var s2 = "22222";

    count = 20000;

    for (var i = 0; i < count; i++) {

       x = "AAAAA";

       x += s2;

    }

    var after = new Date().getTime();

    var time = after – before;

    alert("CONCAT2 TEST(+=): Time taken to append ‘"+s1+"’+’"+s2+"'(="+x+") " + count + " times : " + time + " ms");

  17. Jaiprakash says:

    Thanks Keith for such detailed example and numbers. I really appreciate it. I agree that I should have stated that problem lies in creating big strings through multiple string concatenation, not small strings. However I feel that contents in the blog convey this. Pls. check out the section "How it affects Real world Scenarios" and the paragraph just above it where I have explained the reason behind IE’s slowness.

    One more thing I would like to mention is that most of the Jscript benchmarks floating around do such concatenations to test the string concatenation performance.

    – JP

  18. Josh Stodola says:

    While it was overall great to read this, it is horrible, pathetic, and gut-wrenching to hear is that we must wait for the next release of Internet Explorer before we can BEGIN to utilize the fix!

    I don’t understand why you can’t patch IE7…? Go ahead and take a look and see the stats on how many people currently use IE6.

    http://www.w3schools.com/browsers/browsers_stats.asp

    Now consider how long it will be before the majority of IE users will be running the next release (scheduled for 2009?).

    RIDICULOUS.

  19. Ricky says:

    Is this algorithm available in IE7 patch?

    Ricky.

  20. Jaiprakash says:

    I believe you mean script 5.7. Sorry, it is not there.

    -JP

  21. Webdesign says:

    Seems like a vaste of time for me. Even in JScript there are a much better way to concatenate strings than this by using a string builder approach. And we can use that with todays browsers (and of course we all do right?) This is no different from what we should be doing in all the other programming languages we use, because they all have this problem with string copying.

  22. Tweak Vista says:

    i suggest mentioning array.join in the article, such that less new programmers have to be lightly bruised with sticks for their crimes.  regardless of ie’s performance, the problem is not string concatenation, its the people who use it.

  23. Nice one JP! But after reading Eric’s post that you linked to, I’m really interested to know what algorithm you’ve used to fix it. Any chance of a behind-the-scenes look?

  24. LFERC says:

    > Numbers speak for themselves.

    Very nice! 🙂

    > You will have to wait till next release of Internet Explorer.

    Waiting is hard… 🙁

  25. En JavaScript, comme en .net ou ava, les strings sont immutables cela veut dire que l’objet ne peut pas

  26. timeless says:

    please note that "abc"[1] is not required by ECMA. JScript is not doing anything wrong. It’s technically an extension.

  27. JScript Blog says:

    Making developers more productive through the design, development, and debug phases of web application

  28. IE8 Beta1 is now available here . It has remarkable script improvements in JScript performance. It also

  29. JScript Blog says:

    Hello Friends, Have you read my post on the String Concatenation issue? If yes, then I can sense your

  30. RobG says:

    Good to see you are working on this, but IE is still way slower than other browsers.

  31. Dear JP☺,

    My application is a screenplayformatter, so I get lots of performance issues (my screenplays are particularly big: 2hr-4hr, 9hr the latest) and I’ve been improving it 2-years since my first attempt pre contenteditable (I use some neat ‘tricks’ eg. to get all indentations of a type to adjust at-once, disabled=true,false)!

    Here’s a current issue: I’d like to generalize my script-source to include doc/rtf/txt/html… anything that ‘looks’ like a script …

    But,–

    Copy-Paste takes an inordinate amount of time (‘ordinateur’, is French for computer)– when the source is rtf: MSIE-itself converts quickly to html, but paste-html into contenteditable– takes yah-yah-yah… Okay… So,– I’d like to catch the html from the clipBoard and condense the long P-SPAN-O:P-…(on every line) leaving just the indentation for my pre-parser … BUT clipBoard html isn’t available… just– text.

    I’d like the html or indentationed-html source from the clipBoard…. Please, and, Thank you.

    Cordially, sincerely,

    Ray.

    P.S. You know why I paste into contenteditable: WRAP, takes inordinate time; So, I wrapper the DIV in CITE to get the fastest onPaste=getData ca 3 sec./MB,- and okay for a 6hr script. Ray.

Skip to main content