Writing Custom Excel Worksheet Functions in C#

Writing our own, custom worksheet functions is a great way to extend Excel. Before Excel 2002 we developed custom functions by either writing them using VBA inside an XLA or by using C/C++ inside an XLL. Excel 2002 introduced a new type of addin called an automation addin. An automation addin enables Excel to call functions on COM objects from cells on a worksheet. In other words, it enabled us to call COM functions just as if they were normal, built-in Excel functions.

This opened up the world of custom worksheet functions to COM developers. So, now we can now write custom functions in any COM language; including C#.

In this article we'll write a custom worksheet function in C# called UNIQUEVALUES. 

What will our Custom Function Do?

The UNIQUEVALUES function will return an array of the unique values in a specified range. For example, if the range A1:A5 contains the values Red, Yellow, Red, Blue, Yellow, then {=UNIQUEVALUES(A1:A5)} will return the array {Red, Yellow, Blue}.

The function will also be able to cope with a mixture of numerical data and text data. So, if A1:A5 contains the values: Red, 1, 1, Red then {=UNIQUEVALUES(A1:A5)} will return the array {Red, 1}.

Because UNIQUEVALUES is a worksheet function that references the source data, whenever the values in the source range change the function automatically extracts the new unique values from the range. So we know that the values returned from the function are always up-to-date (once the worksheet has calculated).

Notice that the UNIQUEVALUES function returns an array. This means that we array enter the funtion by pressing Ctrl-Shift-Enter rather than just Enter when we type the function into the worksheet.

Coding the UNIQUEVALUES Function 

There are two tasks we need to do in order to make a C# class's methods be callable as worksheet functions:

  • Define our worksheet functions in the way Excel expects;
  • Add the Programmable key to the registry for our class's CLSID;

The first task seems intriguing: Define our worksheet functions in the way Excel expects. What does that mean?

Well, this is where we have to remember that Excel is speaking to our automation addin via COM. So Excel expects to be able to use COM techniques to discover and access our functions. Once we know what these COM techniques are, we can make sure that our C# class interoperates with COM in the appropriate way.

When an automation addin is loaded, Excel needs to discover which functions that addin supports. For example, our automation addin will have a UNIQUEVALUES function that we want to use in worksheets. How does Excel discover that our addin supports that function and how does it discover information about the parameters of the function?

Excel discovers this information using a COM interface called ITypeInfo.

ITypeInfo is an interface that is used to access metadata about a COM type. It is similar in concept to .NET reflection, but is very different in implemenation.

Excel uses ITypeInfo to discover the names and the parameter details of the functions exposed by our automation addin. This is how Excel knows to call our addin whenever it comes across a call to UNIQUEVALUES.   

But Excel will only query the default interface of our COM class. So we need to make sure that the functions we want to expose as worksheet functions are defined on the default interface of our class.

There are two ways of doing this in C#:

  • By specifying the ClassInterfaceType.AutoDual attribute on our C# class;
  • By defining our worksheet functions on a dedicated interface and using the ComDefaultInterface attribute to make that interface the default interface of our class;

To understand what the ClassInterfaceType.AutoDual attribute value does to our COM interface, we need to understand a little more about COM interfaces. 

In COM, the IDispatch interface enables clients to dynamically call your functions at runtime. Instead of the compiler checking that functions exist and that the parameters types match at compile-time, IDispatch enables clients to do so at runtime. This enables components to be extremely loosely coupled. This type of function call is termed: late-bound.

Support of IDispatch is optional. COM classes do not have to support it. However, by default, C# classes support IDispatch. In fact (unless we specify otherwise) the default interface created for a C# class is a dispatch interface. 

Why do we need to care about this? Well, since the default interface is a dispatch interface our worksheet functions are not explicitly defined on the interface and are therefore not discoverable by Excel. So, even though our C# class may have a public function named UNIQUEVALUES Excel would not be able to call it.

A dual interface allows a dispatch interface to explicitly define custom functions in addition to those already defined by IDispatch. The ClassInterfaceType.AutoDual attribute turns our COM interface into a dual interface. This effectively takes all public functions defined on our class (including those inherited from base classes) and explicitly defines them on our COM interface which in turn makes them discoverable by Excel.

Thus, an easy way of getting our C# class to define Excel worksheet functions is to simply mark the class with ClassInterfaceType.AutoDual.

However, there is a downside to doing this. All public functions on our class are then available as worksheet functions, including those inherited from System.Object. This isn't ideal. It would be better if we could have more control over what gets exposed to Excel and make sure that only the functions that are intended to be worksheet functions are actually discoverable by Excel. This is the reason I prefer not to use the ClassInterfaceType.AutoDual attribute.

(There is also a problem with COM versioning when using ClassInterfaceType.AutoDual - but we won't cover that here).

Instead, I prefer to define the worksheet functions on a dedicated interface and have my C# class implement that interface.

So, let's define our worksheet function interface. We'll call it IFunctions.

public interface IFunctions {

  object[,] UNIQUEVALUES(Excel.Range TargetRange);

}

Notice that the UNIQUEVALUES function returns a 2-d array of objects. This array will contain the unique values extracted from the specified Excel.Range object.

Next, we define our functions class and have it implement the IFunctions interface. We also use the default attribute to specify that we want the IFunctions interface to be the class's default COM interface.

[ComDefaultInterface(typeof(IFunctions))]

public class Functions : IFunctions {

  public object[,] UNIQUEVALUES(Excel.Range TargetRange) {

    object[,] values = TargetRange.get_Value(System.Reflection.Missing.Value) as object[,];

    List<object> unqVals = new List<object>();

    foreach (object obj in values) {

      if (!unqVals.Contains(obj))

        unqVals.Add(obj);

      }

    object[,] resVals = new object[unqVals.Count, 1];

    for (int idx = 0; idx < resVals.Length; ++idx)

      resVals[idx, 0] = unqVals[idx];

      return resVals;

    }

}

The algorithm for extracting the unique values is as follows (I confess that this may not be the optimal algorithm).

We extract the values in the Range object into a 2-d array of object. Because the array contains object, we can cope with both textual data and numerical data. Next, we iterate over the array and store each value that hasn't already occurred in a List<object>. We then allocate a new 2-d array of object and copy the unique values from the List<object> into the new array. Finally, we return the new array to Excel.

When we return an array from a UDF Excel handles serialising the array into the worksheet in the appropriate way (as long as the user array-entered the cell formula). This means that the dimensions of the array matter. The first dimension of the array is mapped to columns. So if we return an object[3] then Excel interprets this as "three columns containing one row each" and places the three values adjacent to each other (i.e., each one in a different column). If we return an object[1,3]then Excel interprets this as "one column containing three rows" and places each value beneath each other (i.e., each one in a different row).

So, it is very easy to return an array to Excel. All we have to do is set the correct array dimensions and let Excel handle copying the values from the to the appropriate cells in the worksheet.

We'll talk more about how arrays are used in Excel worksheet functions later in the article.

If we stop coding here, we'll find that Excel still can't use our class. There is one final thing we need to do: set the class's Programmable key in the registry.

The Programmable key indicates that a COM class exposes type information for its default interface. The key is specified under the COM class's HKEY_CLASSES_ROOT\CLSID\{Guid} key (where {Guid} is the guid of the COM class).

There is a convenient way to set this registry key from C#. There exist two attributes called ComRegisterFunctionAttribute and ComUnregisterFunctionAttribute. These attributes are used to mark which functions COM should call during the COM registration/unregistration process. So all we need to do is write a function that inserts the Programmable key during registration and removes it during unregistration.

[ComRegisterFunctionAttribute]

public static void RegisterFunction(Type type) {

  Registry.ClassesRoot.CreateSubKey(GetSubKeyName(type));

}

[ComUnregisterFunctionAttribute]

public static void UnregisterFunction(Type type) {

  Registry.ClassesRoot.DeleteSubKey(GetSubKeyName(type), false);

}

private static string GetSubKeyName(Type type) {

  string s = @"CLSID\{" + type.GUID.ToString().ToUpper() + @"}\Programmable";

  return s;

}

(Notice that the registration functions are static).

Now we've finished coding the Functions class and the UNIQUEVALUES function. We can now use from an Excel worksheet just like any other custom worksheet function.

Using UNIQUEVALUES in a Worksheet

The first to do is to install our Functions class as an automation addin. This enables Excel to call our custom UDFs.

To install as an automation addin, we bring up the usual Addin Manager via Excel Options->Add-Ins->Manage Excel Add-ins. From this dialogue we click the Automation button and select ExcelExtensions.Functions from the list.

To use the function, we highlight the cells which we want the unique values written into, type the =UNIQUEVALUES( ... ) where ... is the range from which we want to extract the values and then press ctrl-shift-enter.

Note that we must press ctrl-shift-enter to tell Excel that this function returns an array. Excel automatically serialises the values returned from the function into the array of highlighted cells. If the number of values returned from the function is less than the number of cells we highlighted, Excel places #N/A in each of the extra cells.

Consider the following example.

Notice that there are only four unique values in the range D3:D9. However, because we highlighted 7 cells (i.e., E3:E9) Excel has placed #N/A in each of the 3 additional cells.

If we highlight too few cells (say, 2 cells) then Excel will place the first to values returned from the function in those cells.

This is normal array function behaviour in Excel. Notice that we didn't have to write anything in our code to do this; Excel does this for us automatically.

If the data in D3:D9 changes, Excel fires UNIQUEVALUES and our function recalculates the new unique values.

In the following screenshot, we change the first value from Red to Yellow and the unique values automatically update:

The UNIQUEVALUES function can also be used with Excel's native functions that work with arrays. Two examples of such functions are COUNT and INDEX.

Both COUNT and INDEX can accept arrays as parameters. So we can use these functions on the array returned by UNIQUEVALUES.

The formula =COUNT(UNIQUEVALUES(D3:D9)) returns the number of values in the array. The formula =INDEX(UNIQUEVALUES(D3:D9), 1, 1) returns the first value in the array; and =INDEX(UNIQUEVALUES(D3:D9), 2, 1) returns the second; and so on ... (remember that UNIQUEVALUES returns an array one column wide).

Summary

  • The functions that we want to expose as custom worksheet functions need to be defined on our class's default interface;
    • Defining our custom worksheet functions on a dedicated interface and using ComDefaultInterface to make that interface the default is one way of doing this;
  • Our COM class must define the Programmable registry key under its CLSID key;
  • We can return multiple values from a custom UDF by returning them in an array;
  • Excel handles the serialisation of the array into the worksheet;

Download

I have added the UNIQUEVALUES function into my ExcelExtensions add-in. This can be downloaded from the MSDN Code Gallery by clicking here.

There are two downloads which can be downloaded from the Releases tab: ExcelExtensionsSrc.zip and ExcelExtensions.msi.

ExcelExtensionsSrc.zip contains the complete source code and ExcelExtensions.msi is the set up program.

Over time, I'll be adding additional feature to the addin.