This is a post I’ve been waiting to write for quite a while – but it had to wait until R3 became available.
|The SysExtension framework offers some great capabilities, but unfortunately it also comes with a performance penalty (in R2). A while ago I set out to understand where the time was spent, and hoping to optimize it. As an engineer this is a typical task with an expected outcome: A few days spent, an optimization of 10-20 percent is found. This time the task turned out to completely consume me for the better part of a week, I learned some important lessons, and the outcome exceeded my wildest imagination. This blog shares my findings.|
To be able to measure the impact of any changes I build a very simple test harness to exercise the SysExtension framework. A two level deep class hierarchy and one attribute that decorated the sub-class. I then compiled everything to IL, and wrote a small job to measure how many class instances I could spin up per second. This velocity measurement was around 3,400 classes/second.
Debugging through the code I quickly learned that a lot of things were going on. This included creating a key for the attribute for various caches. This key was constructed via reflection on the attribute class. I avoided using reflection by introducing a new interface (SysExtensionIAttribute) and I fixed a number of other minor issues. Now the velocity jumped to 40,000 classes/second.
First physical limit
Is this an acceptable velocity? Well, how fast can it possibly be? The logic is in essence just creating a class via reflection, so I did a measurement of DictClass.MakeObject(). This could give me 84,000 classes/second. Slightly about double of the my current implementation. After some investigation I discovered two expensive calls: “new DictClass()” and “dictClass.makeObject()”. Can you spot what they have in common? They both requires a call into the native AOS libraries. In other words an interop call. I tried various other calls into the AOS, such as “TTSBegin” (only the first is hitting the DB), and “CustParameters::Find()” (again, only the first one is hitting the DB). To my surprise the velocity of these calls where comparative to DictClass.MakeObject(). The interop overhead outweighs what the method is actually doing. In other words, there is a limit to how many native AOS methods you can call per second. Let us call this velocity: Speed-of-sound.
Ultimate physical limit
Being a bit intrigued I measured the fastest and rawest possible implementation: “new MyClass()”. This would run strictly in IL, no overhead of any kind, the result was a whooping 23,800,000 classes/second. Let us call this velocity: Speed-of-light. In the words of Barney Stinton: “Challenge accepted!”
To achieve this kind of velocity the code must run 100% as IL. No calls into native AOS code. Period. Naturally there are APIs in .NET allowing for dynamically creation of class instances – they are slower than a direct instantiation, but still much faster than calling native AOS code. One other challenge was that SysExtension also can execute as pCode, and then a call into IL would cause a similar slow interop – just in the opposite direction. After a few iterations I had an implementation that would not cause an interop calls, regardless of if the code runs as IL or pCode. Take a look at SysExtensionAppClassFactory.getClassFromSysExtAttribute in R3 for details. I was pleased with the velocity: 661,000 classes/second. Or about 200 times faster than R2. Or about 15 times faster than a call to CustParameters::Find().
Problem solved: The SysExtension framework no longer has performance issues.
For a long time we have been hunting for SQL and RPC calls when looking for performance. RPC calls are expensive, as communication between two components (Client and Server) occurs. Just like SQL calls are expensive as the Server communicates with SQL (and waits for the reply). We still need to hunt for unnecessary RPC and SQL calls! Nothing changed; except that a third culprit has been identified: Native AOS calls. Relatively speaking the cost of calls into native AOS code is insignificant when compared to RPC or SQL calls – in the absence of these, the impact is measurable and significant.
This is an ERP system, so there will always be SQL calls. So why be concerned with the performance of X++ code? Well, if you can minimize the time between SQL calls, then you will also limit the time SQL holds locks, and you will experience better overall performance and scalability. After all, do you want you code to run with the speed-of-sound or speed-of-light?
Here is the test harness I used: PrivateProject_SysExpProject.xpo