Add-In Performance: What can you expect as you cross an isolation boundary and how to make it better [Jesse Kaplan]


Aren’t AppDomains too Slow?


“Aren’t AppDomains too slow?” This is one of the most common questions we get when we recommend AppDomain or Process isolation to someone building an extensible application. For some applications the answer is yes, which is why we make it easy to activate add-ins without isolation as well. For many more applications, though, you’ll find that the performance hit you take by moving to AppDomain or Process isolation is well within the acceptable range for your requirements. In this article we’re going to talk about the level of performance you can expect in different scenarios and what you can do to your application, object model, and pipeline to speed things up where necessary. This article is all about optimizing communication once everything is up and running; the next one will discuss the performance of getting everything up and running at different levels of isolation.


About the Numbers


All of the numbers I’m providing here are only designed to give you a rough idea of what to expect. They were recorded in an unscientific manner (read: on my main box with lots of other applications running) and I typically saw a variation of ±10% between test runs. The application was compiled in VS in retail mode and was run without a debugger. Nothing was NGEN’d and only the contract assembly was installed into the GAC.


Baseline for Performance Expectations


Before we get into details on the guidance we should establish the baseline of what type of performance you can expect with different types of calls across an AppDomain boundary without any interesting optimizations. For simple calls that pass and return primitive types you should see between 2000 and 7000 calls per second across an AppDomain boundary. For calls that involve passing or returning a type derived from IContract across the boundary you should see about 1000 calls per second. These numbers decrease by about 30% when you switch from an AppDomain to a Process boundary.


This means that even without any interesting optimizations you should be pretty comfortable with crossing the boundary many times in response to any sort of user input and shouldn’t have to worry about the user noticing slow-down caused by the isolation boundary.
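To put those baseline figures in perspective, here is a rough back-of-the-envelope sketch that converts a call rate into a per-call cost and into the number of crossings that fit in a time budget. The rates are the approximate baseline numbers quoted above; the 100 ms budget and the names `BoundaryBudget`, `MillisecondsPerCall`, and `CallsWithinBudget` are assumptions chosen purely for illustration.

```csharp
using System;

public static class BoundaryBudget
{
    // Average cost of one boundary crossing, in milliseconds, at a given call rate.
    public static double MillisecondsPerCall(double callsPerSecond)
    {
        return 1000.0 / callsPerSecond;
    }

    // Number of crossings that fit into a time budget (in milliseconds).
    public static int CallsWithinBudget(double callsPerSecond, double budgetMilliseconds)
    {
        return (int)Math.Floor(callsPerSecond * budgetMilliseconds / 1000.0);
    }

    public static void Main()
    {
        double primitiveRate = 2000; // low end of the quoted primitive-call range
        double contractRate = 1000;  // quoted rate for calls involving an IContract
        double budgetMs = 100;       // assumed responsiveness budget, illustration only

        Console.WriteLine("Primitive call: {0:F2} ms each, {1} calls in {2} ms",
            MillisecondsPerCall(primitiveRate), CallsWithinBudget(primitiveRate, budgetMs), budgetMs);
        Console.WriteLine("IContract call: {0:F2} ms each, {1} calls in {2} ms",
            MillisecondsPerCall(contractRate), CallsWithinBudget(contractRate, budgetMs), budgetMs);
    }
}
```

Even at the slower IContract rate, on the order of a hundred crossings fit into a tenth of a second, which is why a handful of calls per user action is rarely noticeable.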


General Guidance on Performance across Isolation Boundaries


There are two basic strategies you should utilize in reducing the impact the isolation boundary has on your application:


1. Maximize the speed of crossing your particular boundary


2. Reduce the number of times you cross the boundary


Maximize the Speed of Crossing the Boundary


The key insight you need to understand this strategy is that when using System.AddIn to cross isolation boundaries there are three different remoting paths that you might be taking across those boundaries:


1. Cross-Process


2. Cross-Domain


3. Cross-Domain FastPath


If you are going across process boundaries then you don’t have much choice on your path, and your main goal, when you need to increase performance at the boundary, is to minimize the number of times you need to cross it. On the other hand, if you are only using AppDomain isolation then you have more options. There is a good chance that this is the first time you have heard of the “Cross-Domain FastPath”: the FastPath isn’t exposed as an API or documented extensively. The FastPath boils down to a highly optimized shortcut across AppDomains which bypasses as much of the remoting stack and bookkeeping as possible. The runtime uses this shortcut whenever it can quickly prove that a given call does not require any of those remoting facilities.


There are a few things you need to do to enable the FastPath for your application, but once you enable it the runtime will choose between it and the standard cross-domain path on a per transition basis depending on the type of call you are making. Therefore your goal, if you need to increase your performance crossing an AppDomain boundary, is to first enable the FastPath for your app and then to try and stay on it for as many of the performance-critical transitions as make sense.


To enable the fast path you need to make sure that the runtime shares your Contract assembly between your host and add-in AppDomains. To do this you need to mark the Main method of your executable with [LoaderOptimization(LoaderOptimization.MultiDomainHost)] and install your Contract assembly in the GAC. The “MultiDomainHost” loader optimization tells the runtime to share all assemblies that are installed in the GAC between AppDomains. If you have a native executable and are using the native hosting APIs to start up the runtime, you can still enable “MultiDomainHost”: see the native hosting documentation for exactly how to do it.
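As a minimal sketch of the attribute placement described above (for a managed .NET Framework host), the entry point could look like the following. The `Program` class and the `RequestedOptimization` helper are illustrative names, not part of the sample; the helper only reads the attribute back through reflection to show that it is attached to `Main`. Installing the contract assembly into the GAC is a separate deployment step that this code does not perform.

```csharp
using System;
using System.Reflection;

public static class Program
{
    // MultiDomainHost asks the runtime to share GAC-installed assemblies
    // (for example, the contract assembly) between AppDomains.
    [LoaderOptimization(LoaderOptimization.MultiDomainHost)]
    public static void Main()
    {
        // Host start-up (add-in discovery, activation, UI) would go here.
        Console.WriteLine("Requested loader optimization: {0}", RequestedOptimization());
    }

    // Illustrative helper: reads the attribute back from Main via reflection.
    public static LoaderOptimization RequestedOptimization()
    {
        MethodInfo main = typeof(Program).GetMethod("Main");
        LoaderOptimizationAttribute attribute = (LoaderOptimizationAttribute)
            Attribute.GetCustomAttribute(main, typeof(LoaderOptimizationAttribute));
        return attribute == null ? LoaderOptimization.NotSpecified : attribute.Value;
    }
}
```

The attribute only has an effect on the method the runtime uses as the process entry point, so it belongs on the host executable's `Main`, not on add-in code.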


Once the fast path is enabled you need to consider which of your calls will actually use it. Basically you will only stray from the fast path on calls that involve either passing or returning an IContract: IContracts take you off the FastPath because they rely on MarshalByRefObject, which does require a hefty chunk of the remoting bookkeeping. Nearly all calls involving only serializable types will stay on the fast path. The main restriction for keeping serializable objects on the FastPath is that they must be marked serializable using the [Serializable] attribute. If you instead implement ISerializable you will end up using the normal path. Serializable structs will also stay on the fast path and thus are a good option for passing large amounts of data across the boundary when by-value semantics are appropriate.
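The rules in the paragraph above can be restated as a small heuristic. This is only a sketch of the guidance as written here: the runtime's real decision is internal and not exposed as an API, and every type and method name below (`SamplePoint`, `CustomSerialized`, `RemoteService`, `FastPathRules.LooksFastPathFriendly`) is hypothetical.

```csharp
using System;
using System.Runtime.Serialization;

// Marked with [Serializable] only: per the guidance, a fast-path candidate.
[Serializable]
public struct SamplePoint
{
    public int X;
    public int Y;
}

// Implements ISerializable: per the guidance, expected to take the normal path.
[Serializable]
public sealed class CustomSerialized : ISerializable
{
    public int Value;

    public CustomSerialized() { }

    private CustomSerialized(SerializationInfo info, StreamingContext context)
    {
        Value = info.GetInt32("Value");
    }

    public void GetObjectData(SerializationInfo info, StreamingContext context)
    {
        info.AddValue("Value", Value);
    }
}

// Marshal-by-reference, like the objects behind an IContract: off the fast path.
public sealed class RemoteService : MarshalByRefObject { }

public static class FastPathRules
{
    // Heuristic restatement of the article's rules, not a runtime API.
    public static bool LooksFastPathFriendly(Type type)
    {
        if (typeof(MarshalByRefObject).IsAssignableFrom(type)) return false;
        if (typeof(ISerializable).IsAssignableFrom(type)) return false;
        return type.IsSerializable;
    }

    public static void Main()
    {
        Type[] candidates = { typeof(int), typeof(SamplePoint), typeof(CustomSerialized), typeof(RemoteService) };
        foreach (Type candidate in candidates)
        {
            Console.WriteLine("{0}: {1}", candidate.Name, LooksFastPathFriendly(candidate));
        }
    }
}
```

A check like this can be handy in a unit test over the parameter and return types of your contracts, to catch a type that quietly falls off the fast path after a refactoring.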


Reducing the Number of Times you Cross the Boundary


It is important to realize that the cost of crossing an AppDomain boundary does not grow substantially as the amount of data you pass across increases. For example, assuming no other optimizations, you can pass a byte[100000] in about twice the time it takes to pass a byte[10]; in this case a 10,000-fold increase in data only about doubles the time per call. If you want to look at a MB-per-second measurement, passing bytes across in byte[10] chunks leaves you with a rate of about 0.029 MB/second, and passing them in byte[10000000] chunks leaves you with a rate of about 406 MB/second. If you take advantage of the fast path those numbers change to about 0.75 MB/second with byte[10] and about 560 MB/second with byte[10000000].
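One way to see why chunk size matters so much is to convert the throughput figures above back into calls per second. The sketch below does that arithmetic; it assumes 1 MB = 1,048,576 bytes, since the article does not say which definition was used, and `ChunkMath` is an illustrative name.

```csharp
using System;

public static class ChunkMath
{
    // Assumption: 1 MB = 1,048,576 bytes (the article does not specify).
    public const double BytesPerMegabyte = 1024.0 * 1024.0;

    // Calls per second implied by a throughput figure and a payload size per call.
    public static double CallsPerSecond(double megabytesPerSecond, long bytesPerCall)
    {
        return megabytesPerSecond * BytesPerMegabyte / bytesPerCall;
    }

    public static void Main()
    {
        // Cross-domain (non-fast-path) byte[] figures quoted in the text.
        Console.WriteLine("byte[10]:       {0:F0} calls/second", CallsPerSecond(0.029, 10));
        Console.WriteLine("byte[10000000]: {0:F0} calls/second", CallsPerSecond(406, 10000000));
    }
}
```

Under that assumption the small-chunk case works out to roughly 3,000 calls per second and the large-chunk case to roughly 43, so a million-fold increase in payload costs only about 70 times as much per call. The ratio is the same whichever definition of a megabyte is used.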


Now, there are some cases where passing data in byte[] form is very natural (image processing comes to mind), but in many cases you’ll want something that is much easier to consume on the other side. In these cases passing data across in the form of structs can be very useful. Serializable structs can use the fast path across AppDomains. This, combined with the fact that they can contain large amounts of data and can even be passed across efficiently in arrays, means they are very useful for passing large amounts of strongly typed data (as opposed to byte arrays) across boundaries with a minimal impact on performance. The biggest downside to structs when compared to byte arrays is that if you are unable to enable the fast path then the performance of passing large numbers of structs degrades significantly: it is still much faster than using standard IContract calls, but it may not be appropriate for passing around very large amounts of data. In our sample you’ll see that we achieve 2.1 MB/second for a struct[10] and 134 MB/second for a struct[10000000] if the fast path is enabled. Falling off the fast path leaves us at 0.085 MB/second for struct[10], 0.297 MB/second for struct[100000], and a DNF (did not finish before I gave up on it) for struct[10000000].


Strings were the last method I used for passing data across the boundary and led to some of the more interesting results: with very large strings my test bed was reporting “infinity bytes per second.” Increasing the size of the string did not increase the time it took to pass it across the AppDomain boundary regardless of whether I enabled the fast path or not. Think for a second about why that may be. As it turns out, strings are special cased by the runtime and shared across all AppDomains in the process. Thus to pass one across the boundary all the runtime needs to do is perform a pointer copy, whose speed doesn’t depend on the size of the string being pointed to. This means that if you have a system where you need to pass large amounts of text around then you’ll typically not have to worry about the size of that text when considering the performance of your AppDomain isolation boundaries.


The Data


The data below was gathered using the sample. To run the tests for yourself you’ll just need to download the sample, build it, install the contract assembly into the GAC (gacutil -if [contract path]), and run the app outside of a debugger. There are comments in Program.cs on how to modify it to test the different scenarios below. In addition to the tests below there are additional methods in the sample that you can experiment with.


You can find the sample on our CodePlex site (www.codeplex.com/clraddins).
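If you would rather measure your own calls than run the sample, a timing loop along these lines gives comparable calls-per-second figures. This is a hedged sketch and not the code used in the sample: `CallRateMeter` and `MeasureCallsPerSecond` are made-up names, and the delegate in `Main` is an in-process stand-in for a call on an add-in proxy.

```csharp
using System;
using System.Diagnostics;

public static class CallRateMeter
{
    // Times 'iterations' invocations of 'call' and returns the observed rate.
    public static double MeasureCallsPerSecond(Action call, int iterations)
    {
        if (call == null) throw new ArgumentNullException("call");
        if (iterations <= 0) throw new ArgumentOutOfRangeException("iterations");

        call(); // warm-up so JIT and first-call setup costs are not timed

        Stopwatch watch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            call();
        }
        watch.Stop();

        double seconds = watch.Elapsed.TotalSeconds;
        // A zero elapsed time is reported as infinity instead of dividing by zero.
        return seconds > 0 ? iterations / seconds : double.PositiveInfinity;
    }

    public static void Main()
    {
        int counter = 0;
        // Replace this delegate with a call across your own isolation boundary.
        double rate = MeasureCallsPerSecond(delegate { counter++; }, 100000);
        Console.WriteLine("{0:N0} calls/second", rate);
    }
}
```

As with the numbers in this article, run it from a release build outside the debugger and expect some variation from run to run.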


Calls/Second

Operation                  Fast Path    Cross Domain    Cross Process
-----------------------    ---------    ------------    -------------
void DoStuff(void)           166,667           5,555            3,984
void DoStuff(int)            200,000           2,857            2,857
int GetInt()                 166,667           6,666            4,016
int GetInt(int)              166,667           4,166            2,617
void DoStuff(struct)          71,428           2,777            2,358
void DoStuff(IContract)        3,205           1,098              657


MB/Second

Data Type                   Size            Fast Path    Cross Domain    Cross Process
------------------------    ----------    ------------    ------------    -------------
struct[] (32-bit struct)            10           2.171           0.085            0.061
struct[] (32-bit struct)       100,000          85.048           0.297            0.265
struct[] (32-bit struct)    10,000,000         134.189             DNF              DNF
byte[]                              10           0.751           0.029            0.013
byte[]                         100,000         298.642         180.844           88.778
byte[]                      10,000,000         563.509         406.132           75.468
string                              10           1.221           0.046            0.025
string                         100,000          24.414         462.825           36.479
string                      10,000,000    2,441,406.25       33,216.51           46.588


Bringing it all Together


We’ve just gone into a lot of detail and discussed a lot of different options for increasing performance. Now that that’s out of the way, I think we can distil it into a simple cheat sheet:


DO: enable sharing of assemblies using the LoaderOptimization attribute.
This is the first step to enabling the fast path. Even if you do not install your contract assembly into the GAC (and thus do not fully enable the fast path), you will see a decrease in the activation time of your add-ins because this allows the Framework assemblies to be shared between domains.


DO: install the Contract assembly in the GAC when using AppDomain isolation.
This step along with the LoaderOptimization attribute will enable the fast path for your application and will provide dramatic performance increases for many of your calls across the AppDomain boundary. If you are exclusively using Process isolation this is not important.
Note: this step means that your installation program will require admin privileges. 


Consider: passing data across boundaries using by-value semantics (structs, byte[], strings, etc.).
If you are passing large amounts of data between your host and add-in you should consider passing it across in one chunk using a by-value data type rather than in pieces using a deep IContract-based hierarchy. This approach is very natural in cases where an add-in needs to process and possibly modify the data: just pass it out to the add-in and have the add-in return the modified set. It can be more awkward if the data is dynamic and can change continuously, but using a system of events to notify the data consumers of changes can help mitigate this.
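To make the "one chunk" advice concrete, the sketch below contrasts a chatty method that is called once per byte with a chunky one that takes the whole buffer and returns the modified set. Everything here is hypothetical (`IPixelFilter`, `InvertFilter`, the call counter), and the filter runs in-process as a stand-in; in a real host it would sit behind the add-in pipeline and each counted call would be a boundary crossing.

```csharp
using System;

// Hypothetical add-in surface, shown with both a chatty and a chunky shape.
public interface IPixelFilter
{
    byte InvertOne(byte value);      // chatty: one crossing per byte
    byte[] InvertAll(byte[] values); // chunky: one crossing per buffer
}

// Stand-in for an add-in; a real one would sit behind the pipeline's adapters.
public sealed class InvertFilter : MarshalByRefObject, IPixelFilter
{
    private int calls;

    // Number of method calls received, standing in for boundary crossings.
    public int Calls { get { return calls; } }

    public byte InvertOne(byte value)
    {
        calls++;
        return (byte)(255 - value);
    }

    public byte[] InvertAll(byte[] values)
    {
        calls++;
        byte[] result = new byte[values.Length];
        for (int i = 0; i < values.Length; i++)
        {
            result[i] = (byte)(255 - values[i]);
        }
        return result;
    }
}

public static class Program
{
    public static void Main()
    {
        byte[] pixels = new byte[1000];

        InvertFilter chatty = new InvertFilter();
        for (int i = 0; i < pixels.Length; i++)
        {
            pixels[i] = chatty.InvertOne(pixels[i]);
        }

        InvertFilter chunky = new InvertFilter();
        pixels = chunky.InvertAll(pixels);

        Console.WriteLine("Chatty calls: {0}, chunky calls: {1}", chatty.Calls, chunky.Calls);
    }
}
```

For a 1,000-byte buffer the chatty shape makes 1,000 calls and the chunky shape makes one, which is the difference that dominates once each call has to cross an isolation boundary.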


DO NOT: dismiss an isolation level without first giving it a try in the scenarios you care about.
Many times you’ll find that in a real application the performance hit you take by moving to an isolation boundary is smaller than you expect and perfectly acceptable for your situation. In the cases where it is not, there are a few simple things you can do to make the experience better.


One of the things I hope you get from this article is that for many extensible applications an AppDomain boundary, or even a process boundary, is just fine without any performance tweaking, and that for those applications that need (or want) better performance there is a range of options for providing dramatic performance increases depending on your needs.



Comments (7)

  1. Jesse,

    Very cool! Thank you for this information.

    I think your summary is very important. In many cases none of this matters because the AppDomain boundary is blazingly fast with System.AddIn. I had figured this out because overall my process is near the speed of writing the stuff I need to disk (a few other gems in 2008 are also blazingly fast).

    But I am seeing longer startup times for the first call across the AppDomain boundary. This appears to be more than discovery. I’m seeing something on the order of half a second, which totally dwarfs the actual speed of the call. Can you comment on first call performance? I know there are some specific issues around WPF performance, but I’m using a WinForms client that does have the LoaderOptimization attribute, but I’m not crazy about requiring the GAC although I may try it.

    I’ll later be calling the AddIn from a Workflow local service. Do you know if there are any perf implications of that scenario? I assume I just start my workflow host with the LoaderOptimization attribute and everything else works the same, but I think this is an extremely important scenario as it allows delivered workflows to have specific behaviors modified in the field with appdomain boundary security protection.

  2. Kevin Kerr says:

    Excellent post!

    Extremely useful information.

  3. Jesse,

    I keep coming back to this post.

    It has astounding implications for the design of add-ins. Where is the best place to have this conversation?

    Immediately, there’s the issue of strings. I have large strings – around 2Mb. I’m currently doing a setup step to cache these on the addin side. However, this is causing some very real architecture problems. Is a 2Mb string really free?

    I’m also working with architectures interesting from two directions. I’m using addins to plug into workflow because workflow does not have a discovery model.

    I am also doing additional assembly loading/selection on the AddIn side WITHOUT further use of addin (I’m passing an assembly and class name). I am doing this to avoid restrictions in the AddIn location, and because I am worried about communications between addins and performance implications. But this sets me up for versioning problems later. This interface between the two addins will be exceedingly chatty as it’s currently designed. I’m still hashing through these issues and interested in additional perf and architectural guidance as you have things to add.

    Kathleen

  4. Garry Trinder says:

    The cost of passing a string across an AppDomain boundary really is completely independent of the size of the string, so yes they are free.

    For general discussions about System.AddIn please start up new discussions on our codeplex site (www.codeplex.com/clraddins). It’s much better suited for having multiple discussions at once and has the benefit of much better spam filtering so I won’t have to approve each message before it’s posted ;-).

  5. Gil says:

    i’m trying to pass a CollectionViewSource between the addin and the host and i’m having some serialization issues, i tried several things and i think my only option is just to pass an array of the generic class of the collection with events that i supply manually. if you know some better way i would be very happy to hear about it..

    thanks Gil

  6. Swiss says:

    What about the static objects that are in the separate appdomains? Doesn’t tagging the assembly with the MultiDomainHost attribute change the behavior of this?
