How to insert a large number of items into a treeview efficiently


Just a quick tip today.

If you need to insert a large number of items into a treeview, like tens of thousands, then it’s much more efficient to insert them “backwards”. (I’m ignoring for now the usability question of having a treeview that large in the first place.) In other words, instead of

for (i = 0; i < array.Count(); i++) {
 TVINSERTSTRUCT tvis;
 tvis.hParent = hParentNode;
 tvis.hInsertAfter = TVIF_LAST;
 tvis.item.mask = TVIF_TEXT;
 item.item.pszText = array[i].Text();
 TreeView_InsertItem(hwndTreeView, &tvis);
}

do it this way:

for (i = array.Count() - 1; i >= 0; i--) {
 TVINSERTSTRUCT tvis;
 tvis.hParent = hParentNode;
 tvis.hInsertAfter = TVIF_FIRST;
 tvis.item.mask = TVIF_TEXT;
 item.item.pszText = array[i].Text();
 TreeView_InsertItem(hwndTreeView, &tvis);
}

Why is backwards-insertion faster?

It has to do with the annoying hInsert­After parameter. To validate that the hInsert­After parameter is valid, the treeview needs to verify that the hInsert­After is a valid child of the hParent, and this is done by walking the parent’s children looking for a match. The sooner you find the match, the faster the validation completes. (And if you pass TVI_LAST, then the treeview needs to walk to the end of the child list.)

You’d think that you could verify the parent/child relationship by just doing a Tree­View_Get­Parent(hInsert­After), but that turns out not to be strict enough, because hInsert­After might itself not be a valid parameter. If hInsert­After is a bogus value, then you may crash when you try to read its Parent property. That’s if you’re lucky. If you’re not lucky, then the random memory that hInsert­After points to might look just enough like a valid HTREEITEM that you end up inserting the new node after a completely bogus node, and now the treeview has become corrupted.

Sure, you got the same problem if you passed a garbage HTREEITEM to Tree­View_Get­Parent, but in that case, it’s just garbage in garbage out. Nothing gets corrupted; the application just gets a garbage result. But in the case of Tree­View_Insert­Item, the treeview is going to update its internal data structures based on the garbage you passed in, and that means that the treeview winds up corrupted.

To ensure that the value passed in is valid, the treeview checks it against the list of valid values for hInsert­After. And therefore, you get better performance if the valid value you pass is the one that it checks first.

(As it happens, a lot of programs pass garbage for hInsert­After, so this defensive validation step is absolutely necessary.)

You might say that the treeview could have a one-off optimization for the special TVI_LAST value by remembering the last child so it can be located quickly. The question is whether the complexity of adding that optimization is worth the cost: Any tree rearranging function would have to update the cached value, and if you miss a spot, then the treeview ends up corrupted. That’s a pretty high-risk optimization you’re suggesting there.

And think about it this way: You’re worrying about a tree where a single node has tens of thousands of children, something which (I am no longer ignoring) is a horrible user interface. That’s like a list box with tens of thousands of items, or a dialog box with tens of thousands of checkboxes. You’ll probably want to consider a better way of presenting the information than in a tree view that goes on for hundreds of screens. The treeview isn’t optimized for this case because you don’t optimize for the case where somebody is mis-using your system.

Comments (18)
  1. Joshua says:

    I read this, thinking yeah I've built the 40,000 item tree before (it takes about 30 seconds on a 2GHz PIV).

    Then I realized you're talking about >10,000 nodes at any one level. Yeah, that's too big.

  2. KS says:

    "list box with tens of thousands of items"? Look at IE's "Add to Favorites" dialog -> Create in. There it is. Maybe you could enlighten the IE team that this is a "horrible user interface"? Maybe then they listen.

    [? Mine has only six items, not ten thousand. -Raymond]
  3. acq says:

    "treeview needs to verify that the hInsert­After is a valid child of the hParent"

    Wow, I've never seen in the documentation that the hInsert­After must be a child of hParent. The only thing I've read was: "hInsertAfter: Handle to the item after which the new item is to be inserted." I always beleived it's the last inserted item independent of parent/child relationship.

    [Not sure what you mean by "the last inserted item independent of parent/child relationship". If you say "Insert C as a new child of A after B", then B has to be a child of A also; otherwise you have a tree situation where C == NextSibling(B), yet Parent(C) != Parent(B). -Raymond]
  4. AsmGuru62 says:

    Tree view where more than 32K of items got expanded can't properly navigate – scrollbars are using old style 16-bit type of navigation.

  5. 640k says:

    And that's why windows7's explorer doesn't use treeview any more. Navigating to a folder with 100k+ files/folder nearly hangs previous OS, at least it pause IIS for a while (websites temporary down).

  6. Leo Davidson says:

    @640k: Spy++ tells me Windows Explorer is still using a SysTreeView32 control.

    (At least in Windows 7. I don't have Win8 to hand.)

  7. jon says:

    @Leo Davidson: A SysTreeView32 control with lots of undocumented features, you mean :)

  8. acq says:

    "Given that a treeview is, well, a tree structure, it's not clear what "after" means when applied to a tree as a whole"

    I agree. For me it was also unclear that hInsertAfter can even be "TVI_ROOT: Add the item as a root item." Why that?

  9. CarlD says:

    "You'll probably want to consider a better way of presenting the information than in a tree view that goes on for hundreds of screens."

    Or a combo box that tries to cram 10,000 items into a scrollable, non-searchable drop-down!

    Unfortuantely, sometimes you have no choice: e.g. a query-based parameter in SSRS allows for one UI representation:  a combo box, regardless of how many items might be presented.  It is truly a horrible UI experience.

  10. Phaeron says:

    Having run into this problem before, the counterpoints I would make are that:

    1) You can see this performance issue with about a thousand nodes, which isn't that bad to browse if the tree view is full screen height.

    2) You don't always have enough control over the data that you receive.

    3) In my view, it's sub-optimal design for a UI control to have O(N^2) insertion performance when populated in natural order.

    The danger with having an O(N^2) situation like this is that it doesn't just make the program slow as N grows large but instead it makes the program unusable. Since a tree view is often used for the top-level view of data, the alternatives for shortening the node list aren't great — both chunking into intermediate nodes and "display more" links seem more awkward to me from a usability standpoint.

    I did notice that this performance issue is now documented in MSDN, though, which is good.

  11. acq says:

    "Not sure what you mean by "the last inserted item independent of parent/child relationship"

    Now I understand that what is maintained internally with InsertItem is a single-linked list of all children of the same parent and that hInsert­After is in the context of that list. However, I just assumed that what I change is "a treeview" and hInsert­After is a handle to the element in a treeview, not necessarily a child. So now I'm not surprised that

    "a lot of programs pass garbage for hInsert­After"

    since I know that code I've maintained used the wrong assumption of giving the last inserted item *in a treeview* not the last inserted item of the given parent. I can imagine more people didn't understand what was supposed to be passed there. Documentation also notes that TVI_LAST is bad, but not that it's again only about the children, not about all the items in a treeview.

    Thanks a lot for this insight!

    [Given that a treeview is, well, a tree structure, it's not clear what "after" means when applied to a tree as a whole, since the nodes in a tree are not linear. Even if you mean "after in inorder traversal", that's not precise enough: Given the tree with A as the root, B as the child, C as grandchild, then inserting item D "after" C could result in D as a sibling of C or B; both would result in the same inorder traversal. The "after" terminology matches Set­Window­Pos's hwnd­Insert­After parameter, and also matches Tree­View_Get­Next­Item(TVGN_NEXT). -Raymond]
  12. Neil says:

    If you thought a large combobox was bad, then consider the case of a large popup menubutton – you don't have a scrollbar, nor any indication of the current item in the list. (Yes, this was an application badly ported from the Mac.)

  13. Mehrdad says:

    Cool post!

    But why is it _slower_ to destroy the window afterward, if you insert things backward?

    [The order of insertion should have no effect on destruction performance. The treeview doesn't remember what order you inserted them. -Raymond]
  14. Kelden says:

    In VB6, you can insert 50k items in about 1 second. But if you try to close the window, it takes about 25 seconds to destroy the window.

  15. Adrian says:

    When writing a backwards counting loop, you have to be careful if your loop index is unsigned.  As more programmers use STL containers, which use std::size_t for size(), it becomes more common for indexes to be unsigned types.  A downward counting loop is a tad safer written as:

    for (i = array.Count(); i > 0; ) {

     –i;

     // loop body here

    }

    This idiom works whether the index is signed or unsigned.

  16. KS says:

    @Raymond: You have only 6 Favorites? Good for you. The average user has a few hundred (not tens of thousands, of course), sometimes organized in several folder levels. Now imagine what happens in a combobox with all that expanded …

    [I have plenty of favorites. But the combo box in question doesn't show all the favorites; only the folders. And I organize my favorites into just 6 folders. If you created hundreds of folders to organize your favorites, then you've basically said "You know what, I like seeing hundreds of things at once – bring it on!" -Raymond]
  17. .dan.g. says:

    Thanks Raymond, I implemented your suggested solution (requiring changes to 6 lines of code) and whatever the speed benefits are it's got to be better than nothing!

  18. JA says:

    And this is why I am burning out on Windows programming, having to know minutia such as this in order to excel is getting old.

Comments are closed.