How fast is interop code?

How fast is interop code? If you’re in one kind of code and you’re calling another, what is the cost of the interop?

 

For example, .Net code can call native C++ code (like Windows APIs) and vice versa. Similarly with Foxpro and C++ code. .Net code is often referred to as Managed code because much is managed for the programmer, such as memory allocation. That leaves C++ code to be called “Unmanaged”. An easy way to interop with C++ code is to use COM (Component Object Model, or sometimes ActiveX) as glue. Whether it’s COM calling .Net or vice versa, the managed boundary is traversed twice: there and back. Similarly with Fox code calling COM code.

 

Fox code calling .Net code (e.g. A Visual Basic COM object is simple to create, call and debug from Excel) will have both Fox to COM and COM to .Net interop.

 

I want to measure raw interop performance, so I want to remove memory allocation and Unicode/String marshalling issues from the tests. I want to have a loop on one side call a very fast method on the other, so that most of the execution time is in the interop, not the loop or the method call. I want to use in-process, same thread calls, so remote procedure calls/marshalling are not being measured.

 

We’ll create a native C++ method that just returns consecutive integers. A simple loop in the .Net or Fox client that calls this method and keeps a running total would be a good perf test.
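The shape of that perf test can be sketched in plain C++. This is only an illustration under assumptions: `NextInt` and `TimeLoop` are hypothetical names that are not part of the sample project, and an ordinary function pointer call stands in for whatever boundary is really being measured.

```cpp
#include <cassert>
#include <chrono>

static long g_next = 0;

// Stand-in for the "very fast method on the other side" (hypothetical name).
long NextInt() { return ++g_next; }

// Calls fn nTimes through a function pointer, keeps a running total in *sum,
// and returns the elapsed wall-clock time in seconds.
double TimeLoop(long (*fn)(), long nTimes, long long* sum)
{
    assert(fn != nullptr && sum != nullptr);
    auto start = std::chrono::steady_clock::now();
    long long total = 0;
    for (long i = 0; i < nTimes; ++i)
        total += fn();   // the body is trivial, so call overhead dominates
    std::chrono::duration<double> elapsed = std::chrono::steady_clock::now() - start;
    *sum = total;
    return elapsed.count();
}
```

Dividing the returned time by nTimes gives a rough per-call cost. The running total is kept in a 64 bit integer because the sum of the first million integers (500000500000) does not fit in 32 bits.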

 

Start with this sample ActiveX control code: Create an ActiveX control using ATL that you can use from Fox, Excel, VB6, VB.Net. You don’t need the events and methods from that sample, just the control itself.

(If you’re using VS2008, in the ATL Project wizard, select DLL and just choose “Finish”. When adding a method in Class View, make sure to choose the ITestCtrl Interface defined in MyCtrl.IDL, not the ITestCtrl VCCodeStruct defined in MyCtrl_i.h. Similarly, if you’re adding an event, make sure to choose the _ITestCtrlEvents interface under MyCtrlLib in Class View. Also, you need to run the “Implement ConnectionPoint Wizard” and change the call to “Fire_MyEvent”; see https://msdn.microsoft.com/en-us/library/9h7xedd1.aspx)

When COM code is called from VB.Net or FoxPro, the calls are not quite direct: COM is used for creating and initializing the object, and some parameter/return value massaging is required per call. After that, it’s a straight virtual function (vtable) call, either to IDispatch::Invoke (late bound) or directly to the method’s slot in the interface vtable (early bound). IOW, the performance will be slower than a direct PInvoke or DECLARE DLL call.

Let’s add a simple method RetInt with no parameters that just returns an int. Add a method to our COM Control by right clicking on the ITestCtrl interface in Class View and choosing Add->Method to start the “Add Method Wizard”.

 

Since COM interface methods conventionally return HRESULTs, a value is returned through an additional parameter that is marked with the RetVal attribute and passed by reference. So make the Method Name “RetInt”, the Parameter type “LONG *”, and the Parameter Name “RetVal”. Choose the Retval checkbox. Then choose “Add” to add the param to the method.
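To make the HRESULT plus RetVal pattern concrete, here is a minimal sketch of what such a method looks like to a C++ caller. Everything in it is a stand-in so that it compiles without the Windows SDK: the `HRESULT`, `LONG`, `S_OK` and `E_POINTER` definitions are simplified assumptions, and `FakeTestCtrl` and `CallRetInt` are hypothetical names, not the real ATL class.

```cpp
#include <cassert>

// Simplified stand-ins for the Windows SDK definitions (assumptions for this sketch).
typedef long HRESULT;
typedef long LONG;
const HRESULT S_OK = 0;
const HRESULT E_POINTER = static_cast<HRESULT>(0x80004003L);

// Hypothetical stand-in for the COM object. The method reports success or
// failure through the HRESULT and hands the value back through the pointer.
struct FakeTestCtrl
{
    LONG counter = 0;

    HRESULT RetInt(LONG* RetVal)
    {
        if (RetVal == nullptr)
            return E_POINTER;      // defensive check on the out parameter
        *RetVal = ++counter;
        return S_OK;
    }
};

// Helper a C++ caller might write to turn the HRESULT + out-parameter pair
// back into an ordinary return value.
LONG CallRetInt(FakeTestCtrl& ctrl)
{
    LONG value = 0;
    HRESULT hr = ctrl.RetInt(&value);
    assert(hr == S_OK);
    return value;
}
```

Languages like VB and Fox hide this plumbing: because the parameter is marked RetVal, the call simply looks like a function that returns a number.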

 

Add another method DoSum similarly. This method will run with no interop, so we have a baseline for comparison. (It runs the loop multiple times because it goes so much faster, but the timing measurement divides out the multiple runs.)

 

The resulting code is added to TestCtrl.CPP. Add the implementation:

 

static LONG g_Int = 0;

STDMETHODIMP CTestCtrl::RetInt(LONG* RetVal)
{
      *RetVal = ++g_Int; // just return consecutive integers
      return S_OK;
}

// DoSum will calculate the value with no interop whatsoever
STDMETHODIMP CTestCtrl::DoSum(LONG nTimes, LONG nInternalLoopCount, DOUBLE* Retval)
{
      LONGLONG nSum = 0; // initialized so the result is defined even if nInternalLoopCount is 0
      for (LONG j = 0 ; j < nInternalLoopCount ; j++) // this code runs so fast we have to do it multiple times
      {
            nSum = 0;
            for (LONG i = 1 ; i <= nTimes ; i++)
            {
                  nSum += i;
            }
      }
      *Retval = (DOUBLE)nSum;
      return S_OK;
}

// RetIntStatic can be called directly via PInvoke or Declare Dll
extern "C" HRESULT __declspec(dllexport) WINAPI RetIntStatic(LONG *RetVal)
{
      *RetVal = ++g_Int;
      return S_OK;
}

 

You can add more methods, like a way to reset g_Int to get more accurate results, but I don’t really care about the results, just how long it takes to get them.
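If you do want a reset, a minimal sketch might look like the following. The names `NextIntSketch`, `ResetIntSketch` and `SumAfterReset` are hypothetical and the COM plumbing is left out; in the real control the reset would be one more method added with the Add Method Wizard.

```cpp
#include <cassert>

static long g_Int = 0;   // mirrors the counter in TestCtrl.cpp

// Same behaviour as RetInt, minus the COM plumbing.
void NextIntSketch(long* RetVal)
{
    assert(RetVal != nullptr);
    *RetVal = ++g_Int;
}

// Hypothetical extra method: put the counter back to zero before a timed run.
void ResetIntSketch() { g_Int = 0; }

// After a reset, nTimes calls always total nTimes * (nTimes + 1) / 2,
// which gives each run the same checkable answer.
long long SumAfterReset(long nTimes)
{
    ResetIntSketch();
    long long sum = 0;
    for (long i = 0; i < nTimes; ++i)
    {
        long v = 0;
        NextIntSketch(&v);
        sum += v;
    }
    return sum;
}
```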

 

Of course, you’ll want to run perf tests using optimized Release builds, so you’re not including debug asserts, etc. A really smart optimizing compiler would remove the loops in DoSum altogether!
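To see why a compiler could remove the loops, note that the inner loop just computes 1 + 2 + ... + n, which equals n(n+1)/2. The sketch below (function names are mine, not from the sample) shows the loop, the closed form an optimizer is entitled to substitute, and one common way to keep the loop alive in a benchmark. Whether a particular compiler actually performs the substitution depends on the compiler and its settings.

```cpp
#include <cassert>

// Straightforward loop, as in DoSum: adds 1..nTimes into a 64 bit total.
long long SumLoop(long nTimes)
{
    assert(nTimes >= 0);
    long long sum = 0;
    for (long i = 1; i <= nTimes; ++i)
        sum += i;
    return sum;
}

// The closed form n(n+1)/2 gives the same answer with no loop at all.
long long SumClosedForm(long nTimes)
{
    assert(nTimes >= 0);
    long long n = nTimes;
    return n * (n + 1) / 2;
}

// A volatile accumulator forces every addition to be performed,
// at the cost of an extra load and store per iteration.
long long SumLoopVolatile(long nTimes)
{
    assert(nTimes >= 0);
    volatile long long sum = 0;
    for (long i = 1; i <= nTimes; ++i)
        sum = sum + i;
    return sum;
}
```

The volatile version changes what is being measured (memory traffic is added), so it is a trade-off rather than a free fix.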

 

If you have Foxpro, try running this Fox code. Notice that DoLoop can take either the Form or the Control as a parameter. There’s a RetInt method on each.

 

 

CLEAR ALL
CLEAR
MODIFY COMMAND PROGRAM() NOWAIT
_screen.FontName="Courier New" && Make font monospace, not proportional
SET DECIMALS TO 6
g_Int=0
ox=CREATEOBJECT("MyForm")
*ox.visible=1
nLoops=1000000
nInternalLoopCnt=1000
ns=SECONDS()
zObj=ox.oc && use temp var so we don't deref ox.oc in loop
r = zObj.DoSum(nLoops,nInternalLoopCnt)
?"Internal DoSum ",r,(SECONDS()-ns)/nInternalLoopCnt
?DoLoop(ox,nLoops, "With No Interop" )
?DoLoop(ox.oc,nLoops,"With COM Interop")
*Use early binding:
oy=CREATEOBJECTEx("MyCtrl.TestCtrl","","")
?DoLoop(oy,nLoops,"With COM Interop Early Bound")
*Try Declare DLL: like PInvoke
DECLARE integer _RetIntStatic@4 IN "d:\dev\vc\myctrl\release\myctrl.dll" as RetIntStatic integer @ Retval
?DoLoopStatic(nLoops,"With DeclareDLL interop")

FUNCTION DoLoop(zObj as object,nTimes as Integer, sDesc as String) as String
      LOCAL nSum
      nSum=0
      ns=SECONDS()
      FOR i = 1 TO nTimes
            nSum = nSum + zObj.RetInt()
      ENDFOR
      RETURN sDesc+" Sum= "+TRANSFORM(nSum) + " "+ TRANSFORM(SECONDS()-ns)
ENDFUNC

FUNCTION DoLoopStatic(nTimes as Integer, sDesc as String) as String
      LOCAL nSum
      nSum=0
      nRetval=0
      ns=SECONDS()
      FOR i = 1 TO nTimes
            RetIntStatic(@nRetval)
            nSum = nSum + nRetval
      ENDFOR
      RETURN sDesc+" Sum= "+TRANSFORM(nSum) + " "+ TRANSFORM(SECONDS()-ns)
ENDFUNC

DEFINE CLASS MyForm as Form
      ADD OBJECT OC as olecontrol WITH ;
            oleClass="MyCtrl.TestCtrl",;
            height=200,width=300
      left=200
      AllowOutput=.f.
      PROCEDURE RetInt as Integer
            g_Int = g_Int+1
            RETURN g_Int
      ENDPROC
ENDDEFINE

 

The DoSum call (Fox and VB) was consistent as expected: they both execute in about the same time because there is only one interop call.

 

I consistently saw the COM Interop loop taking about 50% longer than the non interop loop. This makes sense. The code that calls the COM object has to deal with all sorts of parameter types, marshalling, etc. The non interop did the entire calculation within Fox code.

 

The DoSum method, which has its own internal loop to do the calculation and does NO interop of any kind in the loop, runs roughly 2000 times faster. That implies that each iteration of the interop loop executes about 2000 times more instructions than an iteration of DoSum’s loop.

 

Now I want to run a similar test using VB.Net. Let’s add a new project to the ActiveX control project from above.

 

Choose the Solution Explorer, right click on the solution, choose Add New Project, VB->Windows Forms Application. I put my VB Project within the folder of the TestCtrl project.

 

Right click on the project, and choose “Set As Startup Project” so hitting F5 will start this project.

If you’re on a 64 bit OS, then make sure you target x86 (Project->Properties->Compile->Advanced Compile Options->Target CPU->x86).

 

 

Add the ActiveX control to your toolbox: Right click on the toolbox, choose items\COM Components…TestCtrl class.

Now drag the control from the toolbox onto the form. Double-click on the form and paste in this code:

 

Public Class Form1
    'Note the path: "..\..\..\Release\MyCtrl.dll"
    <Runtime.InteropServices.DllImport( _
            "..\..\..\Release\MyCtrl.dll", _
           CallingConvention:=Runtime.InteropServices.CallingConvention.Winapi, _
           entrypoint:="_RetIntStatic@4")> _
    Friend Shared Function RetIntStatic(ByRef RetVal As Integer) As Integer
    End Function

    Private Sub Form1_Load(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles MyBase.Load
        Dim nLoops = 1000000
        Dim nInternalLoopCnt = 1000
        Dim sStopWatch = Stopwatch.StartNew
        Dim r = Me.AxTestCtrl1.DoSum(nLoops, nInternalLoopCnt)
        Console.WriteLine("Internal DoSum Native=" + r.ToString + " " + (sStopWatch.ElapsedMilliseconds / 1000 / nInternalLoopCnt).ToString)
        sStopWatch = Stopwatch.StartNew
        r = Me.DoSum(nLoops, nInternalLoopCnt)
        Console.WriteLine("Internal DoSum .Net =" + r.ToString + " " + (sStopWatch.ElapsedMilliseconds / 1000 / nInternalLoopCnt).ToString)
        Console.WriteLine(DoLoop(Me, nLoops, "With Late bound No interop, calling local VB.Net method"))
        Console.WriteLine(DoLoop(Me.AxTestCtrl1, nLoops, "With Late bound COM interop"))
        Console.WriteLine(DoLoopEarlyForm(Me, nLoops, "With Early bound No interop, calling local VB.Net method"))
        Console.WriteLine(DoLoopEarlyCtrl(Me.AxTestCtrl1, nLoops, "With Early bound COM interop"))
        Console.WriteLine(DoLoopPInvoke(nLoops, "With PInvoke interop"))
        End
    End Sub

    Function DoLoop(ByVal zObj As Object, ByVal nTimes As Integer, ByVal sDesc As String) As String
        Dim nSum = 0L
        Dim sStopWatch = Stopwatch.StartNew
        For i = 1 To nTimes
            nSum += zObj.RetInt
        Next
        Return sDesc + " Sum = " + nSum.ToString + " " + (sStopWatch.ElapsedMilliseconds / 1000).ToString()
    End Function

    Function DoLoopEarlyForm(ByVal zObj As Form1, ByVal nTimes As Integer, ByVal sDesc As String) As String
        Dim nSum = 0L
        Dim sStopWatch = Stopwatch.StartNew
        For i = 1 To nTimes
            nSum += zObj.RetInt
        Next
        Return sDesc + " Sum = " + nSum.ToString + " " + (sStopWatch.ElapsedMilliseconds / 1000).ToString()
    End Function

    Function DoLoopEarlyCtrl(ByVal zObj As AxMyCtrlLib.AxTestCtrl, ByVal nTimes As Integer, ByVal sDesc As String) As String
        Dim nSum = 0L 'L for Long so doesn't overflow 32 bits
        Dim sStopWatch = Stopwatch.StartNew
        For i = 1 To nTimes
            nSum += zObj.RetInt
        Next
        Return sDesc + " Sum = " + nSum.ToString + " " + (sStopWatch.ElapsedMilliseconds / 1000).ToString()
    End Function

    Function DoLoopPInvoke(ByVal nTimes As Integer, ByVal sDesc As String) As String
        Dim nSum = 0L 'L for Long so doesn't overflow 32 bits
        Dim sStopWatch = Stopwatch.StartNew
        For i = 1 To nTimes
            Dim RetVal = 0
            RetIntStatic(RetVal)
            nSum += RetVal
        Next
        Return sDesc + " Sum = " + nSum.ToString + " " + (sStopWatch.ElapsedMilliseconds / 1000).ToString()
    End Function

    Private Shared g_Int As Long

    Public Function RetInt() As Long
        g_Int += 1
        Return g_Int
    End Function

    Function DoSum(ByVal nTimes As Long, ByVal nInternalLoopCount As Long) As Double
        Dim nSum As Long
        For j = 1 To nInternalLoopCount ' calculated multiple times because it's fast
            nSum = 0
            For i = 1 To nTimes
                nSum += i
            Next
        Next
        Return nSum
    End Function
End Class

Run the code with the Output Window visible. Here, the VB code with interop ran maybe 40% slower, and several times slower than the Fox code. I realized that this was because of the late binding calls the VB code does. The VB DoLoop method takes zObj as an Object, and I invoke the RetInt method on it. That means the VB runtime late binder code is called to reflect on the object and see if it has a RetInt method that can be called. Both the Form and the control have a method with this name.

The late binding was code that I didn’t want to measure, so I added some strongly typed calls that forced the calls to be early bound direct calls, which were much faster. For the non-interop code doing the entire calculation within VB, the late bound version was around 1000 times slower than the early bound one, due to the late binder code. For the interop case, the late bound version was about 30 times slower than the early bound one.
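As a loose analogy for why late binding costs so much more, here is a small C++ sketch. It is not how the VB runtime is implemented; `Counter`, `CallEarly`, `CallLate` and `MethodTable` are hypothetical names, and the lookup table is a drastically simplified stand-in for reflection. The point is only that the late bound path has to find the method by name on every call before it can invoke it.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

struct Counter
{
    long value = 0;
    long RetInt() { return ++value; }
};

// Early bound: the compiler knows the type, so this is a direct call.
long CallEarly(Counter& c) { return c.RetInt(); }

// Late bound (simplified): look the method up by name, then invoke it.
// A real late binder also checks argument counts and converts types.
using Method = long (Counter::*)();

const std::unordered_map<std::string, Method>& MethodTable()
{
    static const std::unordered_map<std::string, Method> table = {
        { "RetInt", &Counter::RetInt }
    };
    return table;
}

long CallLate(Counter& c, const std::string& name)
{
    auto it = MethodTable().find(name);
    assert(it != MethodTable().end());   // a real binder would report "member not found"
    return (c.*(it->second))();
}
```

Both paths end up running the same RetInt body; the difference is the string hashing and lookup that the late bound path pays on every iteration.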

 

(Comparing .Net speed with native, the DoSum call (with no interop at all) in .Net was almost 3 times slower than Native, but that’s expected too: native code runs faster than managed.)

 

These early bound calls are several times faster than the Fox code too: they don’t have to do any parameter packing/checking.

 

However, even when the Fox code uses early binding, Fox still has to do a lot of parameter translation between Fox types and COM types.

 

The Fox and VB calls via PInvoke/Declare DLL were the fastest of all. They have to do the least parameter translation/packing/checking. This makes sense: the method call is declared to have N parameters of certain types, so less work needs to be done.

 

 

 

Using ILDasm to see the IL for RetInt, you can see that there isn’t much code. The Fox code for RetInt, however, causes much more code to run.

 

 

.method public instance int64 RetInt() cil managed
{
  // Code size 24 (0x18)
  .maxstack 2
  .locals init ([0] int64 RetInt)
  IL_0000: nop
  IL_0001: ldsfld int64 WindowsApplication1.Form1::g_Int
  IL_0006: ldc.i4.1
  IL_0007: conv.i8
  IL_0008: add.ovf
  IL_0009: stsfld int64 WindowsApplication1.Form1::g_Int
  IL_000e: ldsfld int64 WindowsApplication1.Form1::g_Int
  IL_0013: stloc.0
  IL_0014: br.s IL_0016
  IL_0016: ldloc.0
  IL_0017: ret
} // end of method Form1::RetInt

Or use the debugger to see the native code in DoSum (cdq is Convert Doubleword to Quadword, which sign-extends EAX into EDX:EAX):

       LONGLONG nSum=0;
       for (LONG i = 1 ; i <= nTimes ; i++)
       {
692B1DF1 8B C1 mov eax,ecx
692B1DF3 99 cdq
692B1DF4 03 D8 add ebx,eax
692B1DF6 13 EA adc ebp,edx
692B1DF8 8D 41 01 lea eax,[ecx+1]
692B1DFB 99 cdq
692B1DFC 03 F0 add esi,eax
692B1DFE 8B 44 24 20 mov eax,dword ptr [esp+20h]
692B1E02 13 FA adc edi,edx
692B1E04 83 C1 02 add ecx,2
692B1E07 48 dec eax
692B1E08 3B C8 cmp ecx,eax
692B1E0A 7E E5 jle CTestCtrl::DoSum+21h (692B1DF1h)
692B1E0C 3B 4C 24 20 cmp ecx,dword ptr [esp+20h]
692B1E10 7F 0B jg CTestCtrl::DoSum+4Dh (692B1E1Dh)
       {
              nSum += i;
692B1E12 8B C1 mov eax,ecx
692B1E14 99 cdq
692B1E15 89 44 24 10 mov dword ptr [esp+10h],eax
692B1E19 89 54 24 14 mov dword ptr [esp+14h],edx
       }

 

 

This (optimized) code sums 32 bit values to a 64 bit running sum, so you can see instructions like “ADC”, which is AddWithCarry.
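To show what the cdq/add/adc sequence accomplishes, here is a small C++ sketch that does the same 64 bit addition using only 32 bit pieces. It is an emulation written for illustration; `Pair64`, `AddSigned32`, `ToInt64` and `SumWithCarry` are hypothetical names, and the local variable names merely echo the registers in the listing.

```cpp
#include <cassert>
#include <cstdint>

// A 64 bit value held as two 32 bit halves, the way a 32 bit x86 build
// keeps nSum in a pair of registers.
struct Pair64
{
    std::uint32_t lo;
    std::uint32_t hi;
};

// Emulates "cdq; add lo,eax; adc hi,edx" for one signed 32 bit addend.
void AddSigned32(Pair64& sum, std::int32_t value)
{
    std::uint32_t eax = static_cast<std::uint32_t>(value);
    std::uint32_t edx = (value < 0) ? 0xFFFFFFFFu : 0u;  // cdq: spread the sign bit of eax across edx
    std::uint32_t oldLo = sum.lo;
    sum.lo += eax;                                       // add: low halves
    std::uint32_t carry = (sum.lo < oldLo) ? 1u : 0u;    // carry out of the low half
    sum.hi += edx + carry;                               // adc: high halves plus the carry
}

std::int64_t ToInt64(const Pair64& p)
{
    return static_cast<std::int64_t>((static_cast<std::uint64_t>(p.hi) << 32) | p.lo);
}

// Sums 1..n using only 32 bit operations on the two halves.
std::int64_t SumWithCarry(std::int32_t n)
{
    assert(n >= 0 && n < INT32_MAX);
    Pair64 sum = { 0u, 0u };
    for (std::int32_t i = 1; i <= n; ++i)
        AddSigned32(sum, i);
    return ToInt64(sum);
}
```

On a 64 bit build the compiler can use a single 64 bit add instead, which is why the loop becomes so much simpler there.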

 

As an exercise, on 64 bit, create code like DoSum that natively handles 64 bit ints (or modify this code to use just 32 bits). You’ll see that the loop is trivial.

Hint: make sure you have the 64 bit tools installed.