Excel Services UDF, be not synchronous

In a previous post, I show you how to create a UDF that resolves internet names by using a DNS lookup web service. In this post, we will discuss how we can make potential uses of this UDF even faster.

 

Continuing the example of the DNS look up, what if we wanted to look up many values all at the same time? We could use the function multiple times – but remember, it makes a web service call – that can potentially take a long time and doesn’t really have any significant usage of the CPU. Instead, we can create another function that will be used in the cases of multiple lookups. In this case, the UDF will return an Array which will be used by Excel Services just like any other array function. The parameter to our new function will be an array of names we want to look up. The signature will look like this:

 

public object[,] ResolveHostNameArray(object[,] names)

 

When passing array into UDFs, you always use object arrays (all other array types are unsupported). You can either use vectors (single dimension) or two-dimensional arrays – what you use will govern what types of calls can be made.

Vectors (single-dimension arrays)

If the range your UDF expects from Excel is a single row, you can use a vector as the parameter. The left-most item in the range passed will be the 0th item in the array. For example, a UDF method that needs to return the index of a string in a row of number:

 

[UdfMethod]

public int GetStringIndex(string find, object[] singleRow)

{

       for (int i = 0; i < singleRow.Length; i++)

       {

              if (find.Equals(singleRow[i].ToString(), StringComparison.InvariantCultureIgnoreCase))

              {

                     return i;

              }

       }

       return -1;

}

 

Calls made to this UDF with more than one row (even if there’s only one column) will fail – the UDF won’t even get executed.

Two-dimensional arrays

When you need a range that spans more than one row, you will need to use a two dimensional array (as described above, in the ResolveHostNameArray() signature).

The first index of the array is the row index, the second one will be the column index. The next table shows you a range of numbers passed in and the tuple of indices for each element in the range:

[0,0]

[0,1]

[1,0]

[1,1]

[2,0]

[2,1]

Back to our example

Now that we have the signature, we actually need to write the code for the method. As a reminder, this was the code in our prior Dns Lookup example:

 

[UdfMethod]

public string ResolveHostName(string name)

{

       string st;

       using (DnsLookup.DNSLookupService dns = new Blog_DNSUdf.DnsLookup.DNSLookupService())

       {

              st = dns.GetDNSInfoByWebAddress(name).IPAddress[0].ToString();

       }

       return st;

}

Simple Array Udf

The simple Array Udf will look similar to the single-lookup one:

 

[UdfMethod]

public object[,] ResolveHostNameArraySync(object[,] names)

{

       int nameCount = names.GetLength(0);

       object[,] result = new object[nameCount, 1];

       for (int i = 0; i < nameCount; i++)

       {

              using (DnsLookup.DNSLookupService dns = new Blog_DNSUdf.DnsLookup.DNSLookupService())

              {

                     result[i, 0] = dns.GetDNSInfoByWebAddress((string)names[i, 0]).IPAddress[0].ToString();

              }

       }

       return result;

}

 

As you can see, we are iterating the names array we got and for each value, we make a call to the web-service, asking for the IP address. We then take each address and place it in a result array which we then return to Excel Services.

This works and it’s all nice and dandy. The only problem is that it is highly inefficient. As explained before, calling Web-Services takes a relative long period of time while taking a relatively small amount of computing power. If a single DNS lookup call takes 400ms, making 10 of those calls serially will take 4 seconds and change. However, if you can make all those calls in parallel, the entire ordeal will take 400ms+. That means that this is a prime candidate for parallelization.

Parallel calls example

To call into the web-service multiple times, we will want to shoot off multiple asynchronous calls and then wait for all of them to get back to us. When the last one comes back, we will “free” our UDF method and let it return the combined array.

 

[UdfMethod]

public object[,] ResolveHostNameArray(object[,] names)

{

       if (System.Net.ServicePointManager.DefaultConnectionLimit < 20)

              System.Net.ServicePointManager.DefaultConnectionLimit = 20;

       int nameCount = names.GetLength(0);

       object[,] result = new object[nameCount, 1];

       int returnCount = nameCount;

       using (ManualResetEvent m_event = new ManualResetEvent(false))

       {

              for (int i = 0; i < nameCount; i++)

              {

                     DnsLookup.DNSLookupService dns = new Blog_DNSUdf.DnsLookup.DNSLookupService();

                     dns.GetDNSInfoByWebAddressCompleted +=
delegate(object sender, Blog_DNSUdf.DnsLookup.GetDNSInfoByWebAddressCompletedEventArgs e)

       {

       int index = (int)e.UserState;

       if (e.Error == null)

       {

       result[index, 0] = e.Result.IPAddress[0].ToString();

       }

       else

       {

       result[index, 0] = "";

       }

       Interlocked.Decrement(ref returnCount);

              if (returnCount == 0)

                     {

                           m_event.Set();

       }

       };

                     dns.GetDNSInfoByWebAddressAsync((string)names[i, 0], (object)i);

              }

              m_event.WaitOne();

              return result;

       }

}

 

Let us dissect the method and see what’s going on here.

  1. We create an event – we use it towards the end of the method to know when we are done processing.
  2. We make an async call – dns.GetDNSInfoByWebAddressAsync() – this call was generated for us by VS2005 and it will make a call to the GetDNSInfoByWebAddressCompleted() event when the call is complete and the data ready.
  3. Note that into the async call, we pass in the index where the result will need to reside – this index will be passed to the complete event.
  4. For the GetDNSInfoByWebAddressCompleted() event, we create an anonymous delegate. This delegate places the result in the correct location in the array.
  5. Each delegate decrements the returnCount local variable* - if it sees that it’s the last one (returnCount is zero), it will signal the event.
  6.  Finally, towards the end of the method, we wait for the event to signal. Once it does, we know all async calls have come back home and we can return the result.

 

You may have also noticed the first call I make to the ServicePointManager. By default, .NET will allow only two concurrent connections to the same server at the same time. In our case, we want more (otherwise, our method will again be blocked).

 

There are quite a few changes we could make so that this method would be safer and cleaner. For one, we could give the event a timeout so that if Something Bad happens, we don’t hang the server in a thread for eternity. Second, we could have a less sweeping change when using the ServicePointManager and only target the server we are actually using.

 

* C# is very crafty when it comes to anonymous functions – once the local variable is modified inside the anonymous delegate, it ceases to really be a local variable and turns into a member in an anonymous class. Luckily, C# takes care of all that goo for us, so we don’t actually need to, like, think hard, you know?