Deleting elements from a JavaScript array and closing up the gaps


Today's Little Program is an exercise that solves the following problem:

Given a JavaScript array a and an unsorted array indices (possibly containing duplicates), calculate the result of deleting all of the elements from the original array as specified by the indices. For example, suppose a = ["alice", "bob", "charles", "david", "eve"] and indices = [2, 4, 2, 0].

a[0]  = "alice";
a[1]  = "bob";
a[2]  = "charles";
a[3]  = "david";
a[4]  = "eve";
a.length = 5;

The indices specify that elements 2 (charles), 4 (eve), 2 (charles again, redundant), and 0 (alice) should be deleted from the array, leaving bob and david.

Now, if you had to delete only one element from the array, it is pretty simple:

a.splice(n, 1);

The trick with removing multiple elements is that deleting one element shifts the indices, which can throw off future calculations. One solution is to remove the highest-indexed element first; in other words, operate on the indices in reverse sorted order. (Note that the sort needs a numeric comparator, because the default sort compares elements as strings.)

indices.sort(function(x, y) { return x - y; }).reverse().forEach(function(n) { a.splice(n, 1); });

This technique does still suffer from the problem that if there are duplicates in the indices, extraneous elements get deleted by mistake.
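
One possible fix (just a sketch; it is not the approach this article ultimately takes) is to remove the duplicate indices first and then splice from highest to lowest:

// Sketch: keep only the first occurrence of each index, then splice in descending order.
var unique = indices.filter(function(n, i) { return indices.indexOf(n) === i; });
unique.sort(function(x, y) { return y - x; })
      .forEach(function(n) { a.splice(n, 1); });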

Another approach is to reinterpret the problem by focusing not on the deletion but on the survivors: Produce the array consisting of elements whose indices are not in the list of indices to be deleted.

a = a.filter(function(e, i) { return indices.indexOf(i) < 0; });

The above approach works well if the list of indices to be deleted is short, but it gets quite expensive if the list is long, because every element pays for a linear search through that list.
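
If the list is long, one way to avoid the repeated linear searches (again just a sketch, not the approach taken below) is to build a lookup table from the indices first:

// Build a lookup table of the doomed indices once, then filter against it.
var toDelete = {};
indices.forEach(function(n) { toDelete[n] = true; });
a = a.filter(function(e, i) { return !toDelete[i]; });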

My approach is to use the fact that JavaScript arrays can be sparse. This is a side effect of the fact that JavaScript array indices are actually object properties; the only thing that makes arrays different from generic objects in a language-theoretic sense is the magic length property:

  • If a new property is added, and the property name is the stringification of a whole number greater than or equal to the current length, then the length is updated to the numeric value of the added property name, plus 1.

  • If the length property is modified programmatically, the new value must be a whole number, and all properties which are the stringification of a whole number greater than or equal to the new length are deleted.

(See ECMA-262 sections 15.4, 15.4.5.1, and 15.4.5.2 for nitpicky details.)
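
A quick illustration of those two rules (the array b here exists only for demonstration):

var b = [];
b["3"] = "x";           // "3" names an index at or past the current length,
console.log(b.length);  // so the length becomes 4
b.length = 2;           // shrinking the length deletes properties "2" and "3"
console.log("3" in b);  // false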

The first step, then, is to delete all the indices that need to be deleted.

indices.forEach(function(n) { delete a[n]; });

When applied to our sample data, this leaves

a[1]  = "bob";
a[3]  = "david";
a.length = 5;

which gets printed in a rather goofy way: a = [, "bob", , "david", ].

The next step is to close up the gaps. We take advantage of the fact that the Array enumeration methods operate on indices 0 through length - 1 and that they skip missing elements. Therefore, I can simply apply a dummy filter:

a = a.filter(function() { return true; });
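
Putting the two steps together on the sample data:

var a = ["alice", "bob", "charles", "david", "eve"];
var indices = [2, 4, 2, 0];
indices.forEach(function(n) { delete a[n]; });  // punch holes at the doomed indices
a = a.filter(function() { return true; });      // close up the gaps
console.log(a); // ["bob", "david"]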

Exercise: What is the difference (aside from performance) between a.push(1); and a = a.concat(1);? How is this question relevant to today's exercise?

Comments (14)
  1. Dan Bugglin says:

    .concat creates a copy of the array, adding the new element onto the copy.

    You're doing a bit of extra work you're just discarding, whereas .push modifies the original array.

    However, if you DO want a copy (i.e., if there are other references to a elsewhere that you don't want to gain the "1"), .concat is probably the way to go here.
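
    A small illustration of that aliasing point (the variable names are just for the example):

    var a = [1, 2], alias = a;
    a.push(3);        // alias also sees [1, 2, 3], because push mutates in place
    a = a.concat(4);  // a now refers to a new array; alias still sees [1, 2, 3]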

  2. Brian_EE says:

    My first reaction would have been to go through the indices array, remove duplicate entries first, and then delete in last-to-first order. That would work for an implementation in any language.

    Raymond's approach appears to work well given the JS language definition. (I Am Not A Javascript Programmer: IANAJP.) However, I prefer to see innovative solutions that don't depend on such language features but are instead portable.

  3. pc says:

    I similarly was expecting the article to be about needing to remove the duplicate indexes and then just go in reverse order. I'd never have expected ahead of time that JavaScript deals with deleting the same array element twice in a reasonable manner, and it seems an odd (though apparently effective) way to deal with the duplicates.

  4. avakar86 says:

    @Brian EE: the technique here is basically to replace the deleted elements with a sentinel value and then filter them away. It will work in other languages too, it's just that JavaScript provides a special "deleted" sentinel value. In Python, you could define

    class Deleted:
        pass

    then replace the deleted values with Deleted(), then filter based on the result of isinstance(x, Deleted).

  5. Brian_EE says:

    @avakar86: And if your array is just an indexed list of unsigned ints, what do you propose for a sentinel value? 0? Well that could be a valid value.

    The point is, not everyone is using objects or abstract array types. In the world I work in, 16 kilobytes of SRAM is a luxury, so data is stored as efficiently as possible, without all the fancy overhead. The solution presented is only portable to languages that have those abstract types, classes, etc.

  6. avakar86 says:

    @Brian EE: I didn't say it works in all languages in general. I was pointing out that it's not JS-specific. It works everywhere you have an available sentinel value. You're guaranteed to have one in (most) dynamically-typed languages; you may have one in others under some circumstances.

  7. Reinard says:

    @Brian EE: -1 if it's a positive int; otherwise null/nil; if neither is available, maybe a separate array. Also, JavaScript doesn't exist in the world you describe, so that's kind of moot.

    I do agree with your point, though, that relying so heavily on specific language implementations, even if they are technically defined in the spec, is a horrible idea from a maintenance perspective on anything larger than a one-man project. If you do this and don't document it extensively, I'd be unhappy with your code.

  8. Wear says:

    @Brian EE The sentinel value for ints is more likely to be the maximum value, which works as long as you can prove that all possible values of whatever you are dealing with fall into a range below it.

  9. Jim says:

    I'm guessing that most JavaScript implementations are heavily optimised for the common case that the indices in use are contiguous, and maybe also that they start at 0. If that's true, removing elements like this might at some point (maybe immediately) cause the whole thing to get copied from a vector-like structure to a tree-like structure. Although it's still an interesting solution, of course.

  10. Tim says:

    I guess what you might be trying to get at from the exercise is that if you use

    a = a.concat(1);

    then a will lose all the non-whole-number properties assigned to it, e.g.

    a = [1, 2, 3];
    a.x = 4;
    a = a.concat(1); /* a.x is now undefined */

    while a.push(1) will retain such properties.

  11. rep_movsd says:

    I'm feeling all smug because I worked out Raymond's nifty solution before I read it.

  12. cheong00 says:

    I code for the web every day but haven't thought of that before (I always go the "delete in reverse order" route, and jQuery does have .unique() for removing duplicates anyway).

    I've bookmarked this and will try it when I have a chance (on the web there's more need to add things than to remove them, and most of the time the things being removed are bound to UI elements such as checkboxes, so there's no need to worry about double removal).

  13. John1337 says:

    I guess .filter() and .forEach() are a little bit slower than a manual loop here?

    var a = ["alice", "bob", "charles", "david", "eve"],

    indices = [2, 4, 2, 0];

    /*

    var indicesReverse = indices.sort().reverse();

    for (var i = 0; i < indicesReverse.length; ++i)

    {

    if (indicesReverse[i] == indicesReverse[i-1])

    {

    continue;

    }

    delete a[indicesReverse[i]];

    }

    a; // a.length == 5 and got undefined items

    */

    indices.sort();

    var out = [],

    last = 0;

    for (var i = 0; i < indices.length; ++i)

    {

    // Skip indices duplicates

    var current = indices[i];

    if (current == indices[i-1])

    {

    continue;

    }

    for (var j = last; j < current; ++j)

    {

    out.push(a[j]);

    }

    last = current + 1;

    }

    out; // ok

    Regarding the question of a.push(1); vs. a = a.concat(1);:

    I guess it's about memory, because .concat() creates a whole new array. So even if you use .concat() to add only one element, you will temporarily have two arrays in memory before the assignment back to a.

    And doing that each time you want to add an item is bad because the copied array will be bigger and bigger…

    And this will also create a nice workload for the garbage collector.

    [I guess I didn't make it clear. I wasn't going for the best solution. I was going for the most clever solution. -Raymond]
  14. Lake says:

    If a already contains missing elements that must be preserved, like a = ["alice", "bob", , "charles", "david", , "eve"], this method would not work.
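
    A small demonstration of why (the values here are just for illustration): the dummy filter compacts every hole, not only the freshly deleted ones.

    var a = ["alice", "bob", , "charles"];     // the hole at index 2 should be preserved
    a = a.filter(function() { return true; }); // but filter skips every missing element
    console.log(a); // ["alice", "bob", "charles"] -- the pre-existing hole is gone too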

Comments are closed.