TPL Dataflow and async/await vs CCR - part 5

A co-worker tipped me off that the way you wait for a scatter/gather operation, as described in part 3, can have a huge performance impact. I made three versions of FibonacciAsync: one that awaits both tasks in the return statement, one that uses the Task.WaitAll method, and one that awaits each call as it is made. The results got interesting when I introduced a one-second delay in the asynchronous Fibonacci. Here is the code I used:

    using System;
    using System.Threading.Tasks;
    using Microsoft.VisualStudio.TestTools.UnitTesting;

    namespace AsyncTests
    {
        [TestClass]
        public class SlowFibonacci
        {
            // Version 1: start both recursive calls, then await both in the return statement.
            private async Task<int> FibonacciAsync1(int n)
            {
                await Task.Delay(TimeSpan.FromSeconds(1));

                if (n <= 1)
                    return n;

                var n1 = FibonacciAsync1(n - 1);
                var n2 = FibonacciAsync1(n - 2);

                return await n1 + await n2;
            }

            // Version 2: start both recursive calls, then block in Task.WaitAll until both finish.
            private async Task<int> FibonacciAsync2(int n)
            {
                await Task.Delay(TimeSpan.FromSeconds(1));

                if (n <= 1)
                    return n;

                var n1 = FibonacciAsync2(n - 1);
                var n2 = FibonacciAsync2(n - 2);
                Task.WaitAll(n1, n2);
                return n1.Result + n2.Result;
            }

            // Version 3: await each recursive call as it is made, so the calls run one after the other.
            private async Task<int> FibonacciAsync3(int n)
            {
                await Task.Delay(TimeSpan.FromSeconds(1));

                if (n <= 1)
                    return n;

                var n1 = await FibonacciAsync3(n - 1);
                var n2 = await FibonacciAsync3(n - 2);

                return n1 + n2;
            }

            [TestMethod]
            public void TestFibonacciAsync1()
            {
                var n = FibonacciAsync1(6);
                n.Wait();
                Assert.AreEqual(8, n.Result);
            }

            [TestMethod]
            public void TestFibonacciAsync2()
            {
                var n = FibonacciAsync2(6);
                n.Wait();
                Assert.AreEqual(8, n.Result);
            }

            [TestMethod]
            public void TestFibonacciAsync3()
            {
                var n = FibonacciAsync3(6);
                n.Wait();
                Assert.AreEqual(8, n.Result);
            }
        }
    }

And the results look like this:

It turns out that only the first version completes in six seconds, as I expected (one second of delay for each of the six levels of recursion from n = 6 down to n = 1). Even the seemingly similar Task.WaitAll version, which I expected to be equivalent, takes significantly longer to complete. Awaiting each asynchronous call as it is made is clearly a bad idea when a scatter/gather operation is more suitable.
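
A likely explanation for the Task.WaitAll result, which I have not verified with a profiler, is that Task.WaitAll blocks the calling thread-pool thread until both tasks finish, whereas await releases the thread while it waits. The awaitable counterpart is Task.WhenAll. The sketch below is a hypothetical fourth variant (FibonacciAsync4 and the delay parameter are my own names, not part of the test class above) that keeps the scatter step of the second version but gathers with await Task.WhenAll. I would expect it to behave like the first version, but it has not been timed here.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

public static class WhenAllFibonacci
{
    // Hypothetical fourth variant: same scatter step as FibonacciAsync2,
    // but the gather step awaits Task.WhenAll instead of blocking in Task.WaitAll.
    // The delay is a parameter (an assumption for this sketch) so it can be shortened.
    public static async Task<int> FibonacciAsync4(int n, TimeSpan delay)
    {
        await Task.Delay(delay);

        if (n <= 1)
            return n;

        // Scatter: start both recursive calls without awaiting them.
        var n1 = FibonacciAsync4(n - 1, delay);
        var n2 = FibonacciAsync4(n - 2, delay);

        // Gather: Task.WhenAll returns a task whose result is an array
        // holding the results in the same order as the arguments.
        int[] results = await Task.WhenAll(n1, n2);
        return results[0] + results[1];
    }

    public static async Task Main()
    {
        var stopwatch = Stopwatch.StartNew();
        int result = await FibonacciAsync4(6, TimeSpan.FromSeconds(1));
        stopwatch.Stop();

        Console.WriteLine($"Fibonacci(6) = {result}, elapsed {stopwatch.Elapsed.TotalSeconds:F1} s");
    }
}
```

Running Main prints the result together with the elapsed time, which makes it easy to compare this variant against the three versions above on your own machine.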