The geeky thrill of discovering that two things are really the same thing, just with different labels


Today’s post about binomial coefficients was intended to be a warm-up for Catalan numbers, but it turns out Eric Lippert already covered them, first in the context of binary trees, then in the context of arbitrary trees and forests, and then again in the context of matched parentheses. Another way of seeing the correspondence between forests and matched parentheses is simply to consider each { as an XML open-tag and each } as an XML end-tag.

One thing to take away from the enumeration of objects controlled by Catalan numbers is that when you see multiplication in a recurrence relation, that typically corresponds to a nested loop. (We saw this ourselves when we studied Stirling numbers of the second kind.)

The correspondence between binary trees and arbitrary forests is done by simply renaming variables: left­Child and right­Child turn into first­Child and next­Sibling.

Renaming variables also reveals an interesting equivalence between the two algorithms for reversing a linked list. One technique is to do link rewriting:

Node *Reverse(Node *head)
{
 Node *prev = nullptr;
 while (head) {
  // The node we are rewriting
  Node *current = head;

  // Advance to next node before
  // we overwrite the outbound pointer
  head = current->next;

  // Repoint to previous node
  current->next = prev;

  // Advance the trailing pointer
  prev = current;
 }
 return prev;
}

Another technique is to pop nodes off one list while pushing them onto another.

Node *Reverse(Node *head)
{
 Node *result = nullptr;
 while (head) {
  // Pop
  Node *current = head;
  head = current->next;

  // Push
  current->next = result;
  result = current;
 }
 return result;
}

But if you look more closely at the two versions, you’ll see that they are not really two algorithms. They are the same algorithm, just with different comments and variable names!

One of my colleagues used this as an interview question and guided candidates through both algorithms, only to discover later that they were actually the same algorithm, merely viewed through different-colored glasses.

Comments (15)
  1. Raidri says:

    The second link and the third link are the same.

  2. j3anders says:

    I've had Bresenham's line drawing algorithm pop up in all sorts of unexpected places. Most recently I think it was for discrete pulse-width modulation, and the same trick to make everything use integer math applied.

    @Raidiri, did you get a geeky thrill from discovering that the two links were really the same thing, just with different labels? :-)

    [Well-done. -Raymond]
  3. John Doe says:

    @Raymond, the second algorithm doesn't do what you specify ("Another technique is to pop nodes off one list while pushing them onto another."), it reuses the nodes, so effectively there is no other list. By its description, I'd expect the original list to remain intact.

    [The other list is the one the items being pushed onto. But if you disagree with my terminology, then feel free to substitute your own. The point is, it's being viewed as operations on two lists. -Raymond]
  4. Paul Topping says:

    We use this as an interview question also: "How do you represent an n-ary tree using a binary tree? Why would you want to do this?"  I am not sure I like using the term "arbitrary trees". Sounds too much like "random trees". Many candidates hear the word "tree" and think the question is about database searching. We stop them immediately and explain that it isn't. Some come up with an implementation in C++ that uses std::vector to hold pointers to child nodes without ever thinking about the likely implementation and performance characteristics of such an implementation.

  5. Joshua says:

    @Paul Topping: Ask me such a question would yield "What are you trying to do with an n-ary tree?"

  6. Evan says:

    @John Doe: "the second algorithm doesn't do what you specify ("Another technique is to pop nodes off one list while pushing them onto another."), it reuses the nodes, so effectively there is no other list. By its description, I'd expect the original list to remain intact."

    (1) The "effectively there is no other list" is kind of Raymond's point.

    (2) What does "pop nodes off one list" mean to you? Certainly doesn't sound like it's leaving the list intact to me…

  7. Paul Topping says:

    @Joshua: My answer would be, "Doesn't matter. N-ary trees occur in lots of models of the real world. While it is true that one might want to optimize the implementation for a specific use, the trade-offs involved in such a discussion are exactly the point of this interview question."

  8. Joshua says:

    Hmmm. Obviously not a search tree then. I'm used to general purpose N-ary trees being implemented as first-child next-sibling trees (my local experience says the N doesn't hold constant for long). While the node-net does indeed look like a binary tree the logical operation set doesn't. (Enum all children of a node is a common operation but decend-left returning all nodes in a binary tree isn't.)

    The std::vector is how I'd end up storing a DAG.

  9. Paul Topping says:

    The first-child next-sibling implementation choice is a good one because it avoids dynamic allocation of a child list within each node. That's what's implied by using vector to hold that list.

  10. Liquorice says:

    @Paul Topping: std::vector provides many methods and saves your coding time. I think two implementations have their advantages and disadvantages. Which is better depends on the actual situation, and obviously in some situations dynamic allocation is not a problem.

  11. Neil says:

    @Joshua: My answer would be that you, via your browser, are using one now.

  12. Paul Topping says:

    @Liquorice I disagree. While std::vector does provide many methods and saves your coding time, it is a poor choice for all uses of an n-ary tree. Perhaps for some throw-away code it would be acceptable but almost anything is acceptable in that situation. A vector will initially allocate space for some number of items and then re-allocate space if that count is exceeded. It wastes considerable memory space if the number of children of each tree node is much less than that allocation. If adding a node causes the list to be re-allocated, then it is relatively huge performance hit. The binary node implementation is far superior.

  13. Eric Lippert says:

    Thanks as always for the shout-out.

    When I first interviewed at Microsoft back in the 1990s one of the questions I was asked was "reverse a linked list in the language of your choice", so I wrote "newList = MakeEmptyLinkedList(); while (!list->Empty()) newList->Push(list->Pop()); list = newList;" and this completely flummoxed the interviewer for some reason. He seemed to expect that there would be more pointers in there somewhere. That was a very strange day for me.

  14. Ben Voigt says:

    @Paul: So in your estimation, throwing away locality and causing *every* addition to cause allocation is a performance improvement over sometimes needing to reallocate, but with amortized cost?

    The vector version may waste space, but time performance is quite often better.  Especially on reads.  (Also note the difference between storing children as a vector of pointers vs a pointer to a vector of objects)

  15. Paul Topping says:

    @Ben Voigt: I am confused by your comment. Modeling an n-ary tree using a binary tree does no allocation at all except for the node itself. There are no auxiliary data structures at all.

    Note that the vector implementation under discussion (by me, at least) is where each tree node contains a single member whose type is vector <*node>. Although std::vector implementers are free to implement it how ever they want, the typical implementation will consist of a pointer to an array of elements (node * in this case).

Comments are closed.