My experiment and observations on Service Fabric communication stacks


This post is provided by Senior App Dev Manager Vishal Saroopchand, who asks the question, “How do you decide which communication stack to use in your Service Fabric applications?”


How do you decide which communication stack (Remoting, WCF, or a custom implementation) to use in your Service Fabric applications? Do you know how each one performs? This post sheds some light on the performance characteristics I observed in a recent experiment.

The questions I was attempting to answer

Which communication stack should I choose for inter-service communication in a low-latency workload? What is the performance footprint of the out-of-the-box (OOtB) communication stacks? How does my custom implementation perform against the OOtB options?

Experiment setup

To answer these questions, I measured and visualized the time it takes to move a message of variable size from a Web Proxy Gateway through a series of Stateful services and back to the Web API. Each Stateful service listens on WCF, Remoting, a custom WebSocket, and a custom PubSub built on Service Bus Topics. I timestamp each visit and then plot the durations in a box plot.
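A minimal sketch of the timestamping idea, written in Python for brevity (the experiment itself is a .NET Service Fabric application): the message carries a list of (service, timestamp) stamps, and per-hop and round-trip durations are the differences between consecutive stamps. `TracedMessage` and its methods are hypothetical names, not taken from the experiment code.

```python
import time
from dataclasses import dataclass, field


@dataclass
class TracedMessage:
    """Hypothetical message envelope that records a timestamp at every hop."""
    payload: bytes
    # (service_name, timestamp in seconds) for each service that handled the message
    hops: list = field(default_factory=list)

    def stamp(self, service_name: str, clock=time.perf_counter) -> None:
        """Record that `service_name` handled the message at the current clock reading."""
        self.hops.append((service_name, clock()))

    def hop_durations_ms(self) -> list:
        """Milliseconds elapsed between consecutive hops, labeled 'a->b'."""
        return [
            (f"{a}->{b}", (t_b - t_a) * 1000.0)
            for (a, t_a), (b, t_b) in zip(self.hops, self.hops[1:])
        ]

    def round_trip_ms(self) -> float:
        """Milliseconds from the first stamp to the last (0.0 if fewer than two stamps)."""
        if len(self.hops) < 2:
            return 0.0
        return (self.hops[-1][1] - self.hops[0][1]) * 1000.0
```

One design caveat: in a real cluster the Stateful services run on different nodes, so per-hop differences mix clocks from different machines and are subject to clock skew. The round trip measured at the gateway, where the first and last stamps come from the same clock, is the more trustworthy number.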

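[Figure: experiment setup, a message traveling from the Web Proxy Gateway through a series of Stateful services and back]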

Results

Here is a snapshot of one test. Keep in mind that call durations fluctuate from test to test, but the overall performance pattern stays the same.

[Figure: box plot of call durations from one test for the WCF, Remoting, and WebSocket listeners]

PubSub is not shown in the chart above because its total duration was roughly 1.8 seconds, far outside the range of the other listeners. Here is a zoomed-out view showing all four communication listeners.

[Figure: zoomed-out box plot of call durations for all four listeners: WCF, Remoting, WebSocket, and PubSub]
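The box plots summarize the spread of durations for each listener. As a hedged sketch (Python, with made-up sample numbers), this is one way to compute the statistics behind a Tukey-style box plot. The charting tool used in the experiment may compute quartiles with a different method, so exact values can differ slightly.

```python
import statistics


def box_plot_summary(durations_ms):
    """Statistics a Tukey-style box plot draws for one listener's durations.

    The box spans Q1..Q3; whiskers reach the most extreme samples within
    1.5 * IQR of the box; samples beyond that are reported as outliers.
    Requires at least two samples.
    """
    data = sorted(durations_ms)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [x for x in data if low_fence <= x <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": [x for x in data if x < low_fence or x > high_fence],
    }
```

For example, with made-up durations of 10 through 18 ms plus one 90 ms spike, the quartiles come out as 12.25, 14.5, and 16.75 ms, the whiskers stop at 10 and 18 ms, and the 90 ms call is reported as an outlier. Summarizing this way keeps a single slow call from hiding the typical behavior of a listener.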

The bottom line is this: if you want simplicity and can live with latency of up to roughly 30 ms when sending messages between nodes, use a built-in communication stack (Remoting or WCF). If you want better performance, consider building your own ICommunicationListener and handling your own data serialization.
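To illustrate what handling your own serialization can mean, here is a minimal sketch of a compact, length-prefixed binary frame next to a JSON encoding of the same message. It is written in Python and uses no Service Fabric API; in a real service this logic would live inside your ICommunicationListener implementation and its matching client. The field layout and the names `HEADER`, `encode_frame`, `decode_frame`, and `encode_json` are hypothetical choices for illustration.

```python
import json
import struct

# Hypothetical wire format, network byte order, no padding:
# message id (uint64), send timestamp (float64), payload length (uint32), payload bytes.
HEADER = struct.Struct("!QdI")


def encode_frame(message_id: int, sent_at: float, payload: bytes) -> bytes:
    """Pack the fixed-size header followed by the raw payload."""
    return HEADER.pack(message_id, sent_at, len(payload)) + payload


def decode_frame(buf: bytes):
    """Decode one frame from the start of `buf`.

    Returns (message_id, sent_at, payload, bytes_consumed) so a caller reading
    a raw stream knows where the next frame begins.
    """
    if len(buf) < HEADER.size:
        raise ValueError("incomplete header")
    message_id, sent_at, length = HEADER.unpack_from(buf, 0)
    end = HEADER.size + length
    if len(buf) < end:
        raise ValueError("incomplete payload")
    return message_id, sent_at, buf[HEADER.size:end], end


def encode_json(message_id: int, sent_at: float, payload: bytes) -> bytes:
    """The same message as JSON; binary payloads must be text-encoded (hex here)."""
    doc = {"id": message_id, "sentAt": sent_at, "payload": payload.hex()}
    return json.dumps(doc).encode("utf-8")
```

With a 1,024-byte payload the binary frame is 1,044 bytes (a 20-byte header), while the JSON version is about twice that size because hex encoding alone doubles the payload. The length prefix also lets a receiver on a raw TCP stream find message boundaries without scanning for delimiters.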

In Conclusion

Plan your communication stack carefully. Spend some time up front to understand the characteristics of each option, and try improving performance by taking ownership of serialization, the communication stack, or both.

Consider building your own stack for low-latency communication and keeping one of the built-in stacks as a fallback. If you build a custom communication stack, remember that you must handle scenarios such as churn in your cluster, where services move from one node to another. Do not assume an endpoint will remain stationary. To test the soundness of your custom communication stack, use Chaos to simulate churn and see how your implementation performs.
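A minimal sketch of not assuming a stationary endpoint: the client tries its cached address first and forces a fresh lookup before every retry. It is written in Python for brevity. `resolve` and `send` are hypothetical hooks; in a real .NET service, `resolve` would wrap the platform's service-resolution API and `send` would be your custom transport.

```python
import time
from typing import Callable


def send_with_reresolve(
    resolve: Callable[[bool], str],
    send: Callable[[str, bytes], bytes],
    frame: bytes,
    max_attempts: int = 3,
    backoff_s: float = 0.05,
) -> bytes:
    """Send `frame`, re-resolving the target address after each connection failure.

    resolve(force_refresh) -> current address of the target replica (hypothetical hook)
    send(address, frame)   -> reply bytes, or raises ConnectionError (hypothetical hook)
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        # The cached address is fine on the first try; after a failure, bypass
        # the cache because the replica may have moved to another node.
        address = resolve(attempt > 0)
        try:
            return send(address, frame)
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last failure to the caller
            time.sleep(backoff_s * (attempt + 1))  # linear backoff while the cluster settles
```

Running the client under Chaos exercises exactly this path: a replica moves, the cached address fails, and the forced re-resolve finds the new node. Counting retries during such a run is a simple way to see how much churn costs your latency budget.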

Feel free to clone my experiment code here and add your own communication stacks.



Comments (7)

  1. MedAnd says:

    Any chance you could further comment on the perf of Azure ServiceBus vs Azure EventHub… how does EventHub compare? From the above it seems PubSub refers to Azure ServiceBus, which has an avg latency of 1.8 sec vs 0.2 sec for SF Remoting in your test… so Azure ServiceBus has 9x the latency of SF Remoting?

    1. Vishal Saroopchand says:

      I added EventHub to the experiment. It compares well against Service Bus Topics, for obvious reasons. I will update the chart to show a side-by-side comparison. In general, if you can live without Service Bus Topic features (such as dead lettering, ordering, and duplicate detection), you should consider EventHub.

  2. Would love to see “HTTP/JSON over Reverse Proxy” being added to this great comparison.

    1. MedAnd says:

      Just an FYI, I’ve added a request for Remoting V2 stack and gRPC comparison here: https://github.com/vsaroopchand/ServiceFabric-Performance-Test/issues/1

      1. Vishal Saroopchand says:

        Thank you. I will work on adding these to the experiment.

        1. MedAnd says:

          Look forward to reading the followup Vishal… this type of insight is exactly what the Service Fabric community needs!

          1. Vishal Saroopchand says:

            gRPC added; check the issue comments for links to the updated results. As noted in the comment, DotNetty seems to perform much better in the cloud when the address is not localhost. I need to investigate this further.
