Every Binary Tree There Is


[This is the first part of a series on generating every string in a language. The next part is here.]

The other day I wrote a little algorithm that did some operation on binary trees. I wanted to test it. I whipped up a few little test cases and it seemed fine, but I wasn’t quite satisfied. I was pretty confident, but maybe there was some odd binary tree topology that I hadn’t considered which would cause a bug. I reasoned that there have got to be only a finite number of binary tree topologies of a given size. I’ll just try all of them.

Before I go on, I need a compact notation for a binary tree. Here’s my tree node:

class Node
{
    public Node Left { get; private set; }
    public Node Right { get; private set; }
    public Node(Node left, Node right)
    {
        this.Left = left;
        this.Right = right;
    }
}

Pretty straightforward –- left node, right node, and that’s it. Notice that for the sake of clarity of this article I’ve removed the data that would normally be stored in the tree node. Let’s assume that they’re numbers for now. The interesting bit here is how I represent the tree as an extremely compact string. A null tree node is represented as x. A nonempty tree node in my “valueless” tree is represented as (left right). Consider this tree:

  1
/ \
x   2
   / \
  x   x

The 2 node has two null children, so it is  (xx). The 1 node has one null child on the left and the 2 node on the right, so it is (x(xx)). Make sense? We can write a bit of recursive code that produces such a string:

public static string BinaryTreeString(Node node)
{
    var sb = new StringBuilder();
    Action<Node> f = null;
    f = n =>
    {
        if (n == null)
            sb.Append(“x”);
        else
        {
            sb.Append(“(“);
            f(n.Left);
            f(n.Right);
            sb.Append(“)”);
        }
    };
    f(node);
    return sb.ToString();
}

How can we enumerate all possible binary trees of a given size? I reasoned recursively:

There is one tree with zero non-null nodes in it: x. That’s our base case.

Now, pick a number. Four. We want to enumerate all the trees with four non-null nodes in them. Suppose we already know how to enumerate all the trees with three, two, one and zero nodes in them. Let’s call the set of binary trees with n nodes in it B(n). Suppose we make all possible combinations of B(x) and B(y) such that a member of B(x) is to the left of the root and a member of B(y) is to the right. I’ll write that B(x)B(y). In this notation the trees with four non-null nodes in them have to be in one of these four possible sets: B(0)B(3), B(1)B(2), B(2)B(1), or B(3)B(0).

Obviously this generalizes; we can enumerate all the trees with k nodes in them by going through each of k cases, where each case requires us to solve the problem on some tree sizes smaller than k. Perfect for a recursive solution. Here’s the code.

static IEnumerable<Node> AllBinaryTrees(int size)
{
    if (size == 0)
        return new Node[] { null };
    return from i in Enumerable.Range(0, size)
           from left in AllBinaryTrees(i)
           from right in AllBinaryTrees(size – 1 – i)
           select new Node(left, right);
}

Once more, LINQ makes an algorithm read much more like its specification than the equivalent program with lots of nested for loops.

And sure enough, if we run

foreach (var t in AllBinaryTrees(4))
    Console.WriteLine(BinaryTreeString(t));

we get all fourteen trees with four non-null nodes:

(x(x(x(xx))))
(x(x((xx)x)))
(x((xx)(xx)))
(x((x(xx))x))
(x(((xx)x)x))
((xx)(x(xx)))
((xx)((xx)x))
((x(xx))(xx))
(((xx)x)(xx))
((x(x(xx)))x)
((x((xx)x))x)
(((xx)(xx))x)
(((x(xx))x)x)
((((xx)x)x)x)

And now I have a device which generates tree topologies that I can use to test my algorithm.

How many such trees are there, anyway? Seems like there might be rather a lot of them.

The number of binary trees with n nodes is given by the Catalan numbers, which have many interesting properties. The nth Catalan number is determined by the formula (2n)! / (n+1)!n!, which grows exponentially. (See Wikipedia for several proofs that this is a closed form for the Catalan numbers.) The number of binary trees of a given size is:

0 1
1 1
2 2
4 14
8 1430
12 208012
16 35357670

So perhaps my plan to try all binary trees of a particular size is not a great one; by the time you get into the mid-teens there are way too many to reasonably try all of them in a short amount of time. Still, I think this is a handy device to have around.

A challenge for you: Suppose we forget about binary trees for a moment and consider arbitrary trees. An arbitrary tree node can have zero, one, or any finite number of child tree nodes. Let’s say that a non-empty arbitrary tree is notated as a list of child trees in braces. So {{}{}{{}}} is the tree

    1
   /|\
  2 3 4
      |
      5

Because the 2, 3 and 5 nodes each have no children, they are notated as {}. Make sense? Note that order matters; {{}{}{{}}} and {{}{{}}{}} are different trees even though they have a similar structure.

My challenge questions are (1) are there more or fewer arbitrary trees with n nodes than there are binary trees with n nodes? and (2) can you come up with a mechanism for enumerating all the arbitrary trees?

[This is the first part of a series on generating every string in a language. The next part is here.]

Comments (47)

  1. Adam V says:

    For #1, doesn’t it have to be "equal or greater" because every binary tree can also be represented as an arbitrary tree?

  2. Brian says:

    @Szindbad: I’m not so sure you did.  You *could*, if you wanted, use a depth-first search.  Increment the count by 1 at each iteration, stop (pretend it’s a dead-end, but append to your results) if count hits 0.  Treat each node as having infinite branches (but you end up stopping instead of creating a branch if count hits 0.  Of course, I think you could also mimic the recursive strategy Eric came up with.

  3. Joey says:

    Pretty cool! Thanks for sharing!

  4. Orion Adrian says:

    There are more binary trees of n nodes because binary trees also include positional data (left versus right) while an arbitrary tree doesn’t.

  5. barrkel says:

    Not the question asked, but a related problem that I once encountered: how do you enumerate all possible arithmetic expressions involving a set of numbers and a set of operators. If you’re familiar with the Countdown numbers game (or Google that expression), you know what I mean.

    Arithmetic expressions with binary operators can be treated as binary trees. A brute force approach (remember, when in doubt, use brute force) would be to enumerate every possible binary tree, and then enumerate every permutation of numbers and operators. But this won’t be fast, as you can imagine.

    A different way to approach it is to get rid of the explicit tree altogether, and use RPN instead. That way, you can permute RPN programs: write a recursive routine which alternately tries adding each operator to the program (if there are more than two items on the stack), and tries adding each number to the program, and recurses after each. When it’s run out of numbers, you’ve got a solution – you can evaluate the RPN program as you go, or at the leaf of the recursion, as you desire.

  6. Robert Davis says:

    Maybe I’m missing something, but this tree {{}{}{}} is an arbitrary tree of 4 nodes that is not included in the set of binary trees with 4 nodes. Wouldn’t that pretty significantly prove that there are more arbitrary trees than binary trees? Still thinking about number 2.

  7. Robert Davis says:

    Nevermind, Orion Adrian makes the correct point. At 4 nodes, the complete set of 4 node arbitrary trees is

    {{{{}}}}

    {{{}{}}}

    {{{}}{}}

    {{}{{}}}

    {{}{}{}}

    which is less than the fourteen trees mentioned in the blog.

  8. DMO says:

    For (2) I think it  can be solved by thinking about adding a node at any level between the current level and the root level.  Here is some code to do it for the string notation:

           static void EnumerateTrees()

           {

               var trees = EnumTree(String.Empty, 0, 4);

               foreach (var tree in trees)

               {

                   Console.WriteLine(tree);

               }

           }

           static IEnumerable<string> EnumTree(string currentString, int currentLevel, int remainingNodes)

           {

               if( remainingNodes == 0) {

                   return new string[] { currentString + String.Empty.PadLeft(currentLevel,’}’) }; // close the last child added

               }

               return from levelToAdd in Enumerable.Range(0, currentLevel+1)

                      from tree in EnumTree(AddChild(currentString, currentLevel, levelToAdd), levelToAdd+1, remainingNodes – 1)

                      select tree;

           }

           static string AddChild(string currentString, int currentLevel, int newChildLevel)

           {

               if (currentLevel == newChildLevel)

                   return currentString + "{";

               else

                   return AddChild(currentString + "}", currentLevel – 1, newChildLevel);

           }

  9. DMO says:

    For (1) – the discrepancy is due to the binary trees each having two "slots" so there are null nodes which count as additional permutations: (x)x != x(x) whereas the arbitrary case, the number of slots is equal to the number of children nodes.  It seems that Binary(n-1).Count = Arbitrary(n).Count.  I don’t have a feel yet for why this is so.

  10. Philo says:

    Orion Adrian: I’m not sure what the difference between positional information and order matters is.

    From the blog post: "Note that order matters; {{}{}{{}}} and {{}{{}}{}} are different trees even though they have a similar structure."

  11. Jonathan says:

    There is only one tree with 1 node: {}.

    To combine non-empty tree A (with x nodes) with non-empty tree B (with y nodes) to get a tree with x+y nodes, append the root of A to the child list of the root of B.

    So, to enumerate the trees with n nodes, for each k between 1 and (n-1), for each tree A with k nodes, for each tree B with (n-k) nodes, add the combination of A and B to the enumeration.

  12. Banjobeni says:

    Thinking about enumerating the trees, I asked myself if there is a closed form of the number of trees having n nodes. I found a surprisingly easy recursive solution, but could not yet resolve it to a closed form (maybe someone can try?):

    Let A_n be the number of trees having n+1 nodes. Then,

    A_0 := 1

    A_n = Sum(i,1,n,A_(i-1) * A_(n-i))

    So,

    2 Nodes: A_1 = A_0 * A_0 = 1

    3 Nodes: A_2 = A_0 * A_1 + A_1 * A_0 = 2

    4 Nodes: A_3 = A_0 * A_2 + A_1 * A_1 + A_2 * A_0 = 5

    5 Nodes: A_4 = A_0 * A_3 + A_1 * A_2 + A_2 * A_1 + A_3 * A_0 = 14

    6 Nodes: A_5 = … = 42

    7 Nodes: A_6 = … = 132

    And so on. You can see this as follows:

    - Every tree, say it has n nodes, has a root node. If you take it away, the remaining structure, called an ordered forest, has (n-1) nodes.

    - Now, how many ordered forests are there for n nodes? We can solve this recursively. Say the first tree has i nodes, 1 < i <= n. Take this first tree away. Remaining are n-i nodes which form again an ordered forest. Now let’s iterate i from 1 to n and sum up the results:

    A_n =

           (#trees with 1 node) * (#forests with n-1 nodes) +

           (#trees with 2 nodes) * (#forests with n-2 nodes) +

           (#trees with 3 nodes) * (#forests with n-3 nodes) +

    … +

           (#trees with n-1 nodes) * (#forests with 1 node)

    So all we have to do is determine two things:

    - (#trees with x nodes): We can again take the root node of every tree away, giving us an ordered forest. This is, by recursion, equal to A_(x-1)

    - (#forests with x nodes): This is easy, since we already have a notion for it: A_x

    You can substitue these results and form the sum(var,from,to,what)-expression, yielding the recursive form above.

    I hope this is understandable to at least some degre… Otherwise, let me now.

    I will probably paste some (short) enumeration code later.

  13. Random832 says:

    Do I get points for thinking ‘outside the tree’ if I write a program to simply enumerate all possible balanced bracket strings consisting of exactly N pairs of brackets?

    (start with the string "{" – at any given point you may add a { if there are less than N {‘s, and may add a } if there are fewer }’s than {‘s (minus one so you don’t have multiple roots), then at the end add a })

    I started writing the program thinking of it in terms of trees (first enumerating all the possibilities for how many nodes are contained in each subtree, then enumerating all the possible shapes of a subtree of that size for each of those), then realized – the strings are all the same length, a little thought tells me there’s one { and one } per node, and anyway doesn’t this notation look an awful lot like nested sets…

    Philo: (x(xx)) and ((xx)x) are two different binary trees that are both a single root node with a single child node – i.e. {{}}.

  14. DMO says:

    It seems that the number of trees is equal to the number of trees in a binary tree with one fewer node.  So the closed form would be: (2n-2)! / (n)!(n-1)!

  15. DMO says:

    Random832: That is how I started out thinking about it.  I realized that picking how many closures to make is equivalent to selecting the level on which to add the next node.

  16. Joe Darcy says:

    On related topics. for labeled trees the number of trees of n nodes is given by Cayley’s formula, n^(n-2).  Since this trees are labeled, this results distinguished the many isomorphic trees from each other.  One way to prove Cayley’s formula is to use the Prüfer code construction of a tree (http://en.wikipedia.org/wiki/Pr%C3%BCfer_sequence).  The Prüfer code also easily allows a random tree to be generated.

  17. Pop Catalin says:

    You made me realize that I haven’t had the chance to solve a problem using backtracking in a very long time. I used a very basic backtracking implementation. I will try to LINQify it more in the future to see what I can come up with.

    Here’s my solution for #2:

       class Program

       {

           static void Main(string[] args)

           {

               foreach (var solution in GenerateTrees(4))

                   Console.WriteLine(solution);

           }

           public static IEnumerable<String> GenerateTrees(int n)

           {

               if (n == 1) { yield return "{}"; yield break; }

               n–;

               StringBuilder solution = new StringBuilder(new string(‘ ‘, n * 2));

               string values = "{}";

               int k = 0;

               do

               {

                   bool found = false;

                   while (GetNext(solution, k, values))

                   {

                       if (IsValid(solution.ToString(0, k + 1), n))

                       {

                           if (k == n * 2 – 1)

                           {

                               yield return "{" + solution.ToString() + "}";

                               k–;

                           }

                           else

                           {

                               k++;

                           }

                           found = true;

                           break;

                       }

                   }

                   if (!found)

                   {

                       for (int i = k; i < solution.Length; i++)

                           solution[i] = ‘ ‘;

                       k–;

                   }

               } while (k >= 0);

           }

           private static bool GetNext(StringBuilder solution, int k, string values)

           {

               int index = values.IndexOf(solution[k]);

               if (index < values.Length – 1)

               {

                   solution[k] = values[index + 1];

                   return true;

               }

               return false;

           }

           private static bool IsValid(string partialSolution, int n)

           {

               int count = 0;

               for (int i = 0; i < partialSolution.Length; i++)

               {

                   count += partialSolution[i] == ‘{‘ ? 1 : -1;

                   if (count < 0) return false; // At no point in the sequence the count of "{" must be greater than the count o "}"

               }

               return count <= (n * 2 – partialSolution.Length); // Is valid only if the solution is well balanced

           }

       }

    The algorithm is optimized a bit in the sense that searches only the inner nodes, the first and last characters in the solution always need to be "{" repectivelly "}".

  18. Pop Catalin says:

    My last comment doesn’t make sense, what I meant to say is that first brace is always matched by the last one on every solution.

  19. Andrey Titov says:

    There exists a homomorphism between binary trees and arbitrary trees, so all that you can say about binary trees is applicable to arbitrary trees and vise versa. For example this homomorphism can be established in the following way: for any node of arbitrary tree we map the leftmost child to the left child of the corresponding node of  the binary tree and map the right sibling to the right child. Actually we just transform children of particular node from an array to a forward-only linked list.

    So the answer is (1) there is as many arbitrary trees as binary trees and (2) we can use existing algorithm for enumerating binary trees and just append .Select(bTree => bTree.ToArbitrary()).

  20. Phil Koop says:

    DMO is spot on: the number of configurations of a general tree is the same as the number of configurations of a binary tree with one fewer node.

    You can work this out from the standard scheme for encoding a general tree as a binary tree (see http://en.wikipedia.org/wiki/Binary_tree#Encoding_general_trees_as_binary_trees), which could be summarized as "nodes descended from me on the left are descendants, and nodes descended from me on the right are either siblings or descendants of siblings."

    The wikipedia note correctly states that there is a 1:1 mapping but gives only the case from general to binary. To go from binary to general, you need to add a dummy "super-node" to handle the possibility that the root node has a right child. Hence the n-1 rule described by DMO.

    Disclaimer: I did not work this out from scratch but retained some vague recollection of the general-to-binary mapping from my school days, now a quarter century in the past. Oh, to be young again!

  21. Phil Koop says:

    Oops – Andrey Titov’s post was not visible to me when I made mine. But now that I can see it, I still think that Andrey is off by one.

  22. DMO says:

    Phil Koop: Thanks – that’s exactly what I was looking for!  I knew there had to be an intuitive argument for the relationship.

  23. Wow, I can’t believe no one has come up with a simple LINQy solution yet. Here’s what I did:

    public static IEnumerable<Node> AllTrees(int size) {

    if (size == 0) {

    throw new ArgumentException("size must not be 0");

    } else if(size == 1) {

    return new Node[] { new Node(new Node[0]) };

    } else {

    return from firstChildSize in Enumerable.Range(1, size – 1)

    from firstChild in AllTrees(firstChildSize)

    from rest in AllTrees(size – firstChildSize)

    select new Node(Enumerable.Concat(new Node[] { firstChild }, rest.Children));

    }

    }

    It’s a bit of a hack — a nicer solution would be to make it return an IEnumerable<IEnumerable<Node>> that represents the children of the node — but this is what I came up with at first.

  24. Svick says:

    My solution to (2) is almost the same as what is in the original post for binary trees: Children of each node are in a linked list, so each node has two members – firstChild and nextSibling (as Phil Koop described above):

    class Node

    {

    public Node FirstChild { get; protected set; }

    public Node NextSibling { get; protected set; }

    public Node(Node firstChild, Node nextSibling)

    {

    FirstChild = firstChild;

    NextSibling = nextSibling;

    }

    static IEnumerable<Node> treeSets(int n)

    {

    if (n == 0)

    return new Node[] { null };

    else

    return from i in Enumerable.Range(0, n)

    from firstChild in treeSets(i)

    from nextSibling in treeSets(n – 1 – i)

    select new Node(firstChild, nextSibling);

    }

    public static IEnumerable<Node> Trees(int n)

    {

    return treeSets(n – 1).Select(node => new Node(node, null));

    }

    public override string ToString()

    {

    StringBuilder sb = new StringBuilder();

    Action<Node> f = null;

    f = node =>

    {

    if (node == null)

    return;

    sb.Append(‘{‘);

    f(node.FirstChild);

    sb.Append(‘}’);

    f(node.NextSibling);

    };

    f(this);

    return sb.ToString();

    }

    }

    BTW, i wondered quite a while while is the following code in the original post, and i don’t like it that it actually has to be written this way:

    Action<Node> f = null;

    f = n => { … };

  25. Mark Knell says:

    @Phil Koop

    I’ll side with Andrey on this.  DMO and you are close, but I think he gets the cigar.  

    In fact, I think your first post demonstrates why. Every labeled tree whose nodes are ordered can be uniquely transformed to a binary tree, using (for example) the methods in the links you provided.  And, the transformation is reversible; you sketched out a constructive proof of why.  

    Adam V was correct out of the gates; every binary tree is an arbitrary tree, so it doesn’t make sense to think the arbitrary-tree number would be smaller than the binary.

    Maybe a bit of intuition is available from a language coincidence that isn’t purely coincidental.  When Eric stipulated that order is important, that was equivalent to stating that there’s an operation that can determine the relative order of any two nodes in the tree.  An operation on two inputs is a "binary" operation; it uniquely determines a "binary" tree–which is good news.  If this sort of thing weren’t deterministic and reversible, several important strategies for efficiently indexing data would fly out the window…  

  26. DMO says:

    Mark Knell: Phil Koop’s point was that the requirement that the arbitrary tree have a single root is equivalent to adding a "super-node" at the root of the binary mapping of the arbitrary tree with the mapped tree as its left "decedent" branch and a null right "sibling" branch.  This accounts for the extra node.  If the single root requirement is relaxed, they are identical.

  27. Phil Koop says:

    Mark Knell, I can’t fault you for not trusting my reasoning – that’s only prudent! But you can be quite sure that the number of binary trees is not the same as the number of general trees just by checking the first few cases.

    For instance, there are two binary trees with two nodes, but only one general: {{}}. There are five binary trees with three nodes, but only two general: {{}{}} and {{{}}}. Etc.

  28. Banjobeni says:

    Here the promised (short and i think very nice) algorithm to enumerate the strings based on my recursive explanations above.

       class Program

       {

           static void Main(string[] args)

           {

               foreach (String s in EnumerateTrees(10)) {

                   Console.WriteLine("{" + s + "}");

               }

               Console.ReadLine();

           }

           static IEnumerable<String> EnumerateTrees(int nodeCount)

           {

               if (nodeCount == 0) {

                   yield return "";

               } else {

                   for (int i = 1; i <= nodeCount; i++) {

                       foreach (String sa in EnumerateTrees(i – 1)) {

                           foreach (String sb in EnumerateTrees(nodeCount – i)) {

                               yield return "{" + sa + "}" + sb;

                           }

                       }

                   }

               }

           }

       }

    Note that the actual algorithm itself is only one method – an if and three nested for/foreach. I am pretty sure one could optimize the hell out of it and maybe even abuse LINQ, but in its essence it’s just this.

    Thanks for sharing this exercise with us Eric, let’s me refresh some math part of my brain.

  29. Michael McMullen says:

    @Mark Kenll

    You said: "every binary tree is an arbitrary tree, so it doesn’t make sense to think the arbitrary-tree number would be smaller than the binary."

    But you can have two or more binary trees that map to the same arbitrary tree.

    For example, look at these two 4 node binary trees:

    (x(x(x(xx))))

    (x(x((xx)x)))

    To map them to arbitrary trees, you basically just remove the x’s and change the ()’s to {}’s which gives:

    {{{{}}}}

    {{{{}}}}

    So these are the exact same arbitrary tree.

    Now take a look at the wikipedia article on Catalan numbers that Eric linked in the original post, and look for the part about Dyck words.  That’s almost exactly what we’re looking at here, except that it allows multiple root nodes.  To get around that part, we can just add an extra node at the top level to be the parent of all of those nodes which gives Cn = the number of arbitrary trees with (n+1) nodes (because we added an extra one).  So the number of arbitrary trees with n nodes is C(n-1).

    My solution for (2) was fairly similar to Daniel Ziegler’s so I won’t bother posting it here.

  30. DRBlaise says:

    @Eric – I have a question about the recursive AllBinaryTrees method.  I love the readability of it and I know this method was not written for speed, but would it suffer from the speed issues associated with recursive IEnumerable calls?  I remember from your posts about BinaryTrees; you said not to use recursion for the InOrder traversial method.  Is this still an issue when using LINQ?

  31. Mark Knell says:

    @Phil Koop, Michael McMullen

    I think I took a different set of assumptions out of Eric’s "order matters" wording than you did.  I read him to be implying that the nodes are well-ordered, something a bit like pre-existing labels, but now I wonder if that was the problem he was stating.  In my version, there are indeed two trees on two nodes: 1-2  and 2-1.  And so forth for higher numbers.  And I wouldn’t have agreed that Michael McMullen’s trees were the same trees, just two trees with the same bracket notation.  The bracket notation is a lossless representation of binary  tree topology but not of arbitrary tree topology.

    Thus, I didn’t conclude that when he asked, "are there more or fewer arbitrary trees" that he meant: count the number of distinct bracket notations–the equivalence classes "modulo" the notation.  But it’s a reasonable interpretation.  The more I think about it, it’s persuading me.

  32. izobr says:

    Ah, yes. Guys right pointed that in general case binary tree is mapped to arbitrary tree with multiple siblings on the top level, or having an extra virtual root node. So the single root node of an arbitrary tree is mapped to the root node of a binary tree having no right child. So with this restriction there are as many arbitrary trees with single root as there are binary trees having no right child on the root node and as many as arbitrary binary trees with one less node.

  33. runefs says:

    well be for I got to the keys the LINQifieed solution to two was already out there so now reason to post it again. The solution to 1 is (to me at least) not intuitive but there’d be fewer arbitrary trees since

    1      and 1        would both simply be 1 in an arbitrary tree representation.

    |            |                                        |

    2 x          x 2                                       2

    Every other tree is a combination of those and the single node tree which has the same representation in binary and arbitrary "form".

    There are way too many binary tree that when represented as arbitrary trees become the same representation for any non-binary tree combis to make up for that

  34. Banjobeni says:

    I decided to give LINQ a try, therefore refining my solution above to:

           static IEnumerable<String> EnumerateForests(int n)

           {

               if (n == 0) {

                   return new String[] { "" };

               } else {

                   return from i in Enumerable.Range(1, n)

                          from a in EnumerateForests(i – 1)

                          from b in EnumerateForests(n – i)

                          select "{" + a + "}" + b;

               }

           }

    This looks *surprisingly similar* to the solution Eric presented for binary trees, doesn’t it?

  35. FedeAzzato says:

    For (1), there are more arbitrary trees than binary trees with a specific node count. The proof is that any binary tree is also an arbitrary tree, and for example "{{}{}{}}" is in the set of all arbitrary trees with 4 nodes, but it isn’t in the set of all binary trees with 4 nodes.

    For (2) you can think of an arbitrary tree as a node that has pointers to it’s first child, and it’s next sibling, and then you can just use your current algorithm.

  36. FedeAzzato says:

    Giving a second thought to (1), I believe my previous answer was wrong, and the key is in the representation for arbitrary trees chosen in (2).

    Any arbitrary tree can be thought as a binary tree which has pointers to its first child, and it’s next sibling. Of course the root of the tree will never have any sibling, therefor the number of arbitrary trees with a given size is:

    0 0

    1 1

    n 1 + the number of binary trees with size (n – 1)

  37. Tim Goodman says:

    Positional data matters for both (Eric says the order matters), I think the reason you can get fewer arbitrary trees with n nodes is that for arbitrary trees Eric includes nodes with zero children, whereas for binary trees we only count non-null nodes (those with two children).

    So there are only 5 arbitrary trees with 4 nodes (Robert Davis gives them above), whereas there are 14 binary trees with 4 non-null nodes (as Eric shows).

  38. FedeAzzato says:

    Wrong again. My last statement says that for a given n the number of arbitrary trees of size n is equal to 1 plus the number of binary trees of size (n – 1). In fact the number of arbitrary trees of size n is equal to the number of binary trees of size (n – 1) for any n greater than 0.

    Nice exercise :)

  39. Jimmy says:

    I can’t help but wonder if this was inspired by http://msdn.microsoft.com/en-us/vcsharp/ee957404.aspx . I nearly spilled my coffee when I saw that pop up in my VS feed.

  40. DRBlaise says:

    @Eric-"LINQ makes an algorithm read much more like its specification than the equivalent program with lots of nested for loops."

    This may be the case for some specifications, especially if there are several "where" clauses, but for this particular problem, I think traditional for and foreach statements are better.  I find the LINQ Range method unintuitive. Check out the code below:  I believe the for statement makes the intent much clearer.  Also the yield return null is clearer than the original.

    static IEnumerable<Node> AllBinaryTrees(int size)

    {

       if (size == 0)

       {

           yield return null;

           yield break;

       }

       for (int leftSize = 0, rightSize = size – 1; leftSize < size; ++leftSize, –rightSize)

           foreach (Node left in AllBinaryTrees(leftSize))

               foreach (Node right in AllBinaryTrees(rightSize))

                   yield return new Node(left, right);

    }

  41. Carl D says:

    @Jimmy – I remember writing a BASIC program over 30 years ago to solve that problem.  It ran for several minutes on a TRS-80 to find the one and only answer.  

    The LINQ solution you linked to is truly an example of the most heinous abuse of LINQ that I can imagine!

  42. DRBlaise says:

    @Carl D – "…most heinous abuse of LINQ that I can imagine!"

    I hope you are joking.  I thought it was a great example of the power of LINQ.  The code reads exactly how a person would think about solving the problem using the definition of the problem.  What could be better?

  43. Adam V says:

    @Carl D:

    Notice two statements at the bottom of the post:

    - "no attempt at all has been made to optimize the code"

    - "On my laptop, it finds the only solution to the problem… in much less than a second"

    Given these two statements (particularly the second), I find this to be a very reasonable code snippet. If it were being used in production code… I’d wonder what the heck kind of business I was running, and then I’d make the obvious modifications to speed it up. But in terms of "Eric has a math trivia question for everyone", this code is perfect.

  44. Sema says:

    In our notion there are more binary trees, because symbol x acts like an additional node.

    On the second question:

    Let A(n) – set of arbitrary trees forests with a n nodes in our notation.

    A(0) = empty string

    Then A(n) = Union for (i = 0..n – 1) ({A(n – 1 – i)}A(i));

    Lets check it:

    A(1) = {}

    A(2) = {}{}, {{}}

    A(3) = {}{}{}, {}{{}}, {{}}{}, {{{}}}

    and so on.

    Set of arbitrary trees with n nodes = {A(n-1)} – i. g. adding a root for the forest.

    That’s all.  

  45. jsrfc58 says:

    Eric, you lost me here:

    “A null tree node is represented as x.”

    …followed later on by this statement…

    “There is one tree with zero non-null nodes in it: x. That’s our base case.”

    These seem to be contradictory statements. What does x represent?

    They are not contradictory, though the triple-negative of “zero non null” is hard to parse mentally. Let’s go through it. “x” represents a null tree node. Parens represent a non-null tree node. So “(xx)” would be a tree with one non-null node and two null nodes. Now, what are the trees with zero non-null nodes? The trees with zero non-null nodes are the trees where every node in them is a null node. There is only one tree that contains only null nodes, and that’s the tree that consists of a single null node. We represent that as “x”. Is that now clear? — Eric

  46. jsrfc58 says:

    Eric wrote: "the triple-negative of "zero non null" is hard to parse mentally."

    Thanks, I think that was the issue.

  47. v says:

    when we talk about number of binary trees from 'n' nodes….why is it Catalans number..this answer should be in the case of unlabeled binary tress.

    and in the case of labelled binary tree…n! x Catalans number…