How I Learned to Stop Worrying and Love the SharePoint Topology Service

Well, let's get the hard part out of the way first and admit that my earlier post explaining and detailing how to load balance the Topology Service on the publishing farm in a cross-farm federation scenario was just plain wrong. If it makes me feel better, other wise men were fooled as well; but at the end of the day, after Servé Hermans got the ball rolling (thank you Servé) it became clear that the process by which SharePoint 2010 load balances service applications was different than I originally understood, and this has been confirmed by the developer within Microsoft who actually owns the service.

So first things first: what this means for most SharePoint admins and architects is that you don't need to worry about load-balancing the Topology Service even in federated farm scenarios. SharePoint will handle that for you.

For many of you, that is all you will ever need to know, and you can ignore the rest of this discussion. For those of you who'd like to know what's really happening under the hood, I invite you to read further; buckle your seatbelts as this will be quite a ride!

SPLoadBalancer PowerShell Module

First, let me introduce you to a PowerShell module which I created and will utilize through the rest of this discussion. Download it here or via the link at the end of this post. I've named this module SPLoadBalancer; to load it into your PS session, on a SharePoint server drop the SPLoadBalancer folder into a directory called C:\SPModules and run the following commands in a PowerShell session:

$SPModulePath = "C:\SPModules" # <-- replace with your own directory
$env:PSModulePath += ";$SPModulePath"
Import-Module SPLoadBalancer

The main elements you get from this module are a new PowerShell function, Get-SPServiceLoadBalancer, and some extended members for the SPServiceLoadBalancer object returned by this function, which surface its non-public members. The Get-SPServiceLoadBalancer function takes an SPServiceApplicationProxy object (to be exact, an SPServiceApplicationProxyPipeBind object), and looks for and returns the SPServiceLoadBalancer that has been created and associated with that proxy, if one is available. I'll be explaining and utilizing this function and the objects and information it returns as we move along here.

How the SPServiceLoadBalancer Works

The first step in grasping the conceptual working of SharePoint's load-balancing system is to understand that it happens primarily on the consuming, or client, side. Unlike a classic load balancer, which sits on the publishing, or service, side of the communication and is invisible to the client, SharePoint implements the load balancer as a part of the client side of the communication, inside the service application proxies.

I'm going to use the BDC service application and service application proxy as an example of a federated service here, although all service applications can have associated load balancers, and at least five of them do. One of the reasons I like using the BDC service is it's easy to start and stop service instances for it.

The first BDC service application proxy in a farm can be retrieved in PowerShell using Get-SPServiceApplicationProxy and filtering the results, as follows:

$bdcProxy = Get-SPServiceApplicationProxy | Where-Object -FilterScript { $_.TypeName -eq "Business Data Connectivity Service Application Proxy" } | Select-Object -First 1

Frequently throughout, I'll use commands like the following:

Get-SPServiceApplicationProxy $bdcProxy.Id

The reason I don't just use $bdcProxy is to make sure we get a fresh copy of the object each time, retrieved again by its ID.

SPRoundRobinServiceLoadBalancer Construction

As part of the constructor for the BDC proxy, and for any Service Application Proxy utilizing SharePoint's built-in load balancer system, a load balancer for the individual proxy is constructed and stored in a persisted field within the proxy object. This line is copied from within the .ctor method of BdcServiceApplicationProxy in .NET Reflector:

this.loadBalancer = new SPRoundRobinServiceLoadBalancer(serviceApplicationAddress);

Note that the name of the persisted field where the load balancer object is stored varies from proxy to proxy, although the paradigm is the same. As a result, my Get-SPServiceLoadBalancer function looks for a field with FieldType=SPServiceLoadBalancer or FieldType.UnderlyingSystemType=SPServiceLoadBalancer, assuming there will be only one, instead of looking for a hard-coded field name. Relatedly, if a proxy doesn't have an associated load balancer, the function writes a verbose message to this effect and returns null.
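The general idea of finding a field by its type rather than by its name can be sketched as follows. This is not the module's actual source: Find-FieldOfType and the Demo* classes are hypothetical stand-ins so the snippet runs without SharePoint, and it uses IsAssignableFrom rather than the exact FieldType comparison described above. Also note that GetFields only returns non-public fields declared on the object's own type, not private fields of its base classes.

```powershell
# Stand-in types so the sketch runs anywhere. On a farm, the base type would be
# Microsoft.SharePoint.SPServiceLoadBalancer and the object a real proxy.
Add-Type -TypeDefinition @"
public class DemoLoadBalancer { public string Uri = "urn:demo"; }
public class DemoRoundRobinLoadBalancer : DemoLoadBalancer { }
public class DemoProxy {
    // The field name is deliberately arbitrary: the lookup must not depend on it.
    private DemoLoadBalancer m_SomeBalancerField = new DemoRoundRobinLoadBalancer();
    public string Describe() { return m_SomeBalancerField.Uri; }
}
"@

# Hypothetical helper: returns the value of the first instance field whose
# declared type is (or derives from) $FieldType, or $null if there is none.
function Find-FieldOfType {
    param([object]$InputObject, [type]$FieldType)

    $flags = [Reflection.BindingFlags]'Instance, Public, NonPublic'
    $field = $InputObject.GetType().GetFields($flags) |
        Where-Object { $FieldType.IsAssignableFrom($_.FieldType) } |
        Select-Object -First 1

    if ($field -eq $null) {
        Write-Verbose "No field of type $($FieldType.Name) found."
        return $null
    }
    return $field.GetValue($InputObject)
}

$found = Find-FieldOfType (New-Object DemoProxy) ([DemoLoadBalancer])
$found.GetType().Name
```

Searching by type keeps the lookup working even though each proxy class stores its load balancer under a different field name.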

In the constructor above, serviceApplicationAddress is the full URI also shown on the Publish dialog from the Manage Service Applications page in Central Administration. It ends up stored as an internal property Uri on the newly-created load balancer, at Microsoft.SharePoint.SPServiceLoadBalancer.Uri, and is utilized in the lifecycle of the load balancer. The custom Get-SPServiceLoadBalancer function in my module returns a load balancer object with this property revealed. To take a look at this, try the following command, building on the lines above:

PS C:\Users\josh> Get-SPServiceApplicationProxy $bdcProxy.Id | Get-SPServiceLoadBalancer

UpgradedPersistedProperties : {}
ApplicationAddresses        : Microsoft.SharePoint.SPLoadBalancedUriEndpoint
ExpireFailedAddressInterval : 00:10:00
EndpointAddresses           : https://server07:32844/bfa8307c159e4d6bb33b302303a0d5de/BdcService.svc/https
Uri                         : urn:schemas-microsoft-com:sharepoint:service:bfa8307c159e4d6bb33b302303a0d5de#authority=urn:uuid:09b79b7b275e46a3b72940ae302e6da4&authority=https://server05:32844/topology/topology.svc

Actually, when the LoadBalancer is initially constructed, the EndpointAddresses and ApplicationAddresses properties would still be null, and the Uri property would be the only one populated.

Proxy Provisioning

The next step in the lifecycle of the BDC proxy and other proxies is provisioning. The .Provision() method of the proxy is typically called immediately after it is constructed; within this method, the Provision() method of SPServiceLoadBalancer is called. Once everything has been provisioned, the .Update() method of the proxy is called, which leads to many of these elements being persisted back to the configuration database in association with the proxy object.

SPServiceLoadBalancer Provisioning

In SPServiceLoadBalancer.Provision(), the Uri property of the load balancer is parsed into its component parts and stored in the internal SPLoadBalancedUri object.
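To make the parsing step more concrete, here is a rough sketch of how the Uri shown in the earlier Get-SPServiceLoadBalancer output breaks down. This is not SharePoint's internal parser; ConvertFrom-ServiceUrn and its output property names are hypothetical, the layout is inferred only from that one sample value, and the function assumes both authority values are present.

```powershell
# Hypothetical helper: splits a service application Uri into its visible parts.
function ConvertFrom-ServiceUrn {
    param([string]$Urn)

    # Everything before '#' identifies the service application;
    # everything after it is a list of '&'-separated authority values.
    $parts = $Urn -split '#', 2
    $serviceId = ($parts[0] -split ':')[-1]

    $authorities = @()
    if ($parts.Count -gt 1) {
        $authorities = @($parts[1] -split '&' | ForEach-Object { $_ -replace '^authority=', '' })
    }

    # One authority carries the GUID of the Topology Service Application,
    # the other the address of its topology.svc endpoint.
    $topologyId  = $authorities | Where-Object { $_ -like 'urn:uuid:*' } | Select-Object -First 1
    $topologyUrl = $authorities | Where-Object { $_ -match '^https?://' } | Select-Object -First 1

    New-Object PSObject -Property @{
        ServiceApplicationId  = [guid]$serviceId
        TopologyApplicationId = [guid]($topologyId -replace '^urn:uuid:', '')
        TopologyEndpoint      = $topologyUrl
    }
}

$sample = 'urn:schemas-microsoft-com:sharepoint:service:bfa8307c159e4d6bb33b302303a0d5de' +
          '#authority=urn:uuid:09b79b7b275e46a3b72940ae302e6da4' +
          '&authority=https://server05:32844/topology/topology.svc'
$parsed = ConvertFrom-ServiceUrn $sample
$parsed | Format-List
```

The two GUIDs are worth keeping apart: the first identifies the BDC service application itself, while the second identifies the Topology Service Application that knows where its endpoints live.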

The Topology Service Application Proxy

For every farm which local service application proxies will connect to, including the local farm, the local farm needs an associated Topology Service Application Proxy (SPTopologyWebServiceApplicationProxy). This service application proxy is responsible for maintaining and updating lists of active endpoints for each local proxy's connected (i.e. associated) service application.

Within the provisioning of the load balancer (in our example, for the BDC service application proxy), the local farm is checked for a Topology Service Application Proxy associated with the farm that this (BDC) service application proxy intends to connect to. If the Topology proxy doesn't yet exist, it is created. In a single standalone (i.e. unfederated) farm, there will be a single Topology Service Application Proxy associated with the local farm's Topology Service Application. If the farm is attempting to connect to a remote publishing farm, an additional Topology Service Application Proxy will be located, or created if necessary, for each remote farm. The name for the created proxy is the same as the ID of the Topology Service Application to which the proxy will connect (and the GUID in the authority=urn:uuid:<GUID> element of the Uri). For emphasis, the proxy will have a different ID from the Service Application; but the Name property of the proxy will be the same as the ID property of the Service Application.

To retrieve a list of Topology Service Application Proxies created in the local farm, run the following PowerShell command:

(Get-SPTopologyServiceApplicationProxy).ApplicationProxies | Format-List *

Unfortunately, Get-SPTopologyServiceApplicationProxy actually returns an SPTopologyWebServiceProxy (look again: Application is missing from the type name), which is why we call the .ApplicationProxies property on the returned object to get the SPTopologyWebServiceApplicationProxy objects. Note the Name property of the returned ApplicationProxy objects, and that it is not the same as the ID. Compare the Name property here with the Id property returned by:

Get-SPTopologyServiceApplication | Format-List *

As you can see, the ID of the connected Topology Service Application is used as the Name for the Topology Service Application Proxy. As a result, to only get the Topology Service Application Proxy which is associated with the local farm, you could run this command:

(Get-SPTopologyServiceApplicationProxy).ApplicationProxies | ? { $_.Name -eq ((Get-SPTopologyServiceApplication).Id) }

There can only be one Topology Service Application per farm, so this works.

At this point in the load balancer provisioning process, the needed Topology Service Application Proxy has been found or created, and returned for further use.

SPConnectedServiceApplication

The next step in the load balancer provisioning process for the (BDC) service application proxy is to create a new SPConnectedServiceApplication for this (BDC) proxy and associate it with the appropriate Topology proxy. SPConnectedServiceApplication and SPConnectedServiceApplicationCollection are a class and collection class, respectively, entirely hidden from the public object model; SPConnectedServiceApplication is where the current list of available endpoints for a given service application is cached by the Topology proxy. Each proxy load balancer creates its own SPConnectedServiceApplication as a child of its associated Topology proxy and uses it as an authoritative source of available endpoints for the service application it intends to connect to.

As described, SPConnectedServiceApplication is a child persisted object of a Topology Service Application Proxy in the Hierarchical Object Store (i.e. the Objects table in the Configuration Database). One way to see a list of SPConnectedServiceApplication objects and view their persisted state is to run the following query against the store (here, SPS2010_ConfigDB is the name of my farm's configuration database):

SELECT [Id]
      ,[ClassId]
      ,[ParentId]
      ,[Name]
      ,[Status]
      ,[Version]
      ,CAST([Properties] AS XML) AS Properties
  FROM [SPS2010_ConfigDB].[dbo].[Objects] WITH (NOLOCK)
 WHERE [ClassId] LIKE 'FE65EF27-73F5-47C3-B23E-3D4FE5E10079'

If you wanted to determine the SPConnectedServiceApplication objects associated with a given Topology Service Application Proxy (i.e. their parent), you could determine the ID of the proxy (e.g. via (Get-SPTopologyServiceApplicationProxy).ApplicationProxies), and then run this query, replacing <ID from proxy> appropriately:

SELECT [Id]
      ,[ClassId]
      ,[ParentId]
      ,[Name]
      ,[Status]
      ,[Version]
      ,CAST([Properties] AS XML) AS Properties
  FROM [SPS2010_ConfigDB].[dbo].[Objects] WITH (NOLOCK)
 WHERE [ParentId] LIKE '<ID from proxy>'

With both of these queries, the Properties column will be a clickable XML document. Click it to open the serialized, persisted form of the object. For SPConnectedServiceApplication objects there will be a m_ApplicationAddresses field containing a list of endpoint Uris.

Once the new SPConnectedServiceApplication is created, the Topology Service Application Proxy is immediately called to get an initial list of endpoint addresses for the new object, to then be utilized by the new proxy (the BDC proxy in our examples). Following this initial population, the Topology Proxy's AddressesRefresh job (Type=Microsoft.SharePoint.SPConnectedServiceApplicationAddressesRefreshJob, Name=job-spconnectedserviceapplication-addressesrefresh) will run every 15 minutes and refresh the addresses in the SPConnectedServiceApplication object via the Topology Service Application on the farm where the connected service application resides.

If you'd like to watch what happens when a service instance is removed on the publishing farm, run the SQL query above to see the current state of the SPConnectedServiceApplication and its list of endpoints. Then, stop a service instance for the service application and run the AddressesRefresh job on the consuming farm (Start-SPTimerJob job-spconnectedserviceapplication-addressesrefresh). After waiting about 20 seconds for the job to complete, run the SQL query again; you will see that the SPConnectedServiceApplication's version has been updated and that the stopped service instance has been removed from its list of endpoints. Voilà! The Topology Service at work.
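If you'd rather not guess at the 20 seconds, one option is to poll until the job reports a new run. The helper below is a generic sketch; Wait-ForCondition is a hypothetical name, and the commented usage assumes that the job's LastRunTime advances once a run has finished, which I haven't verified in every scenario.

```powershell
# Hypothetical helper: re-evaluates a script block until it returns something
# truthy or the timeout passes. Returns $true on success, $false on timeout.
function Wait-ForCondition {
    param(
        [scriptblock]$Condition,
        [int]$TimeoutSeconds = 60,
        [int]$PollSeconds = 2
    )

    $deadline = (Get-Date).AddSeconds($TimeoutSeconds)
    while ((Get-Date) -lt $deadline) {
        if (& $Condition) { return $true }
        Start-Sleep -Seconds $PollSeconds
    }
    return $false
}

# Sketch of how it might be used on a SharePoint server (untested assumption):
# $job    = Get-SPTimerJob job-spconnectedserviceapplication-addressesrefresh
# $before = $job.LastRunTime
# Start-SPTimerJob $job
# Wait-ForCondition { (Get-SPTimerJob $job.Id).LastRunTime -gt $before }

# Trivial demonstration that runs anywhere:
$finished = Wait-ForCondition { $true } -TimeoutSeconds 5
$finished
```

Once the helper returns $true, rerun the SQL query to compare the Version column and the endpoint list against what you saw before.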

SPConnectedServiceApplication.ApplicationAddresses

SPConnectedServiceApplication.ApplicationAddresses returns the current list of endpoints for the associated service application (via the persisted m_ApplicationAddresses field). One interesting caveat which you must know to understand some quirky behavior described later on is that this property (ApplicationAddresses) behaves differently when the connected service application is in the same farm as the Topology Service Application Proxy (i.e. when they're both in the local farm). In that case, ApplicationAddresses does not check the persisted list of endpoints; instead it gets the endpoints directly from the live local service application. Only when the service application is remote will ApplicationAddresses rely entirely on the persisted list of endpoints.

In any case, the AddressesRefresh job will refresh the list in the database once it runs; at that point the endpoints returned by ApplicationAddresses will once again be in sync with the values in the database even for local service applications.

We have now reached the end of the provisioning stage for the SPServiceLoadBalancer! But we're not done yet. Let's explore how the load balancer actually utilizes the list of addresses to load balance activities.

SPServiceLoadBalancer.EndpointAddresses

EndpointAddresses is the connection point between SPServiceLoadBalancer and its associated SPConnectedServiceApplication. The EndpointAddresses property of SPServiceLoadBalancer iterates through the .ApplicationAddresses property of the SPConnectedServiceApplication, which (typically) returns the cached list of service application endpoints as retrieved by the last AddressesRefresh job. As mentioned above, sometimes the list returned by SPConnectedServiceApplication.ApplicationAddresses temporarily doesn't match the list in the database. In this case, SPServiceLoadBalancer.EndpointAddresses will reflect the ApplicationAddresses property, and not the list in the database.

Returning to my custom PowerShell module and function described and imported above, run this command again:

PS C:\> Get-SPServiceApplicationProxy $bdcProxy.Id | Get-SPServiceLoadBalancer

UpgradedPersistedProperties : {}
ApplicationAddresses        : Microsoft.SharePoint.SPLoadBalancedUriEndpoint
ExpireFailedAddressInterval : 00:10:00
EndpointAddresses           : https://server07:32844/bfa8307c159e4d6bb33b302303a0d5de/BdcService.svc/https
Uri                         : urn:schemas-microsoft-com:sharepoint:service:bfa8307c159e4d6bb33b302303a0d5de#authority=urn:uuid:09b79b7b275e46a3b72940ae302e6da4&authority=https://server05:32844/topology/topology.svc

The EndpointAddresses property is the SPServiceLoadBalancer.EndpointAddresses property and maps directly to the associated SPConnectedServiceApplication.ApplicationAddresses property (caveat: do not confuse this with the ApplicationAddresses property output here as well, which is in fact SPRoundRobinServiceLoadBalancer.ApplicationAddresses, to be discussed next).

Let's have some fun now and show what happens to this property as we turn service instances on and off in our local farm.

PS C:\Users\josh> Get-SPServiceInstance -Server SERVER05 | ? { $_.TypeName -eq "Business Data Connectivity Service" }

TypeName                         Status   Id
--------                         ------   --
Business Data Connectivity Se... Disabled 8bc476da-69e3-4097-b950-cf310aa67983

PS C:\Users\josh> Get-SPServiceInstance -Server SERVER05 | ? { $_.TypeName -eq "Business Data Connectivity Service" } | Start-SPServiceInstance

PS C:\Users\josh> Get-SPServiceApplicationProxy $bdcProxy.Id | Get-SPServiceLoadBalancer

UpgradedPersistedProperties : {}
ApplicationAddresses        : Microsoft.SharePoint.SPLoadBalancedUriEndpoint
ExpireFailedAddressInterval : 00:10:00
EndpointAddresses           : {https://server07:32844/bfa8307c159e4d6bb33b302303a0d5de/BdcService.svc/https, https://server05:32844/bfa8307c159e4d6bb33b302303a0d5de/BdcService.svc/https}
Uri                         : urn:schemas-microsoft-com:sharepoint:service:bfa8307c159e4d6bb33b302303a0d5de#authority=urn:uuid:09b79b7b275e46a3b72940ae302e6da4&authority=https://server05:32844/topology/topology.svc

Notice that as soon as the service instance came online, its endpoint is returned by EndpointAddresses (and by SPConnectedServiceApplication.ApplicationAddresses).

Now, before the AddressesRefresh job gets a chance to run (every 15 minutes by default), run this query in SQL and click open the XML for the object:

SELECT [Id]
      ,[ClassId]
      ,[ParentId]
      ,[Name]
      ,[Status]
      ,[Version]
      ,CAST([Properties] AS XML) AS Properties
  FROM [SPS2010_ConfigDB].[dbo].[Objects] WITH (NOLOCK)
 WHERE [ClassId] LIKE 'FE65EF27-73F5-47C3-B23E-3D4FE5E10079'
   AND Properties LIKE '%BdcService.svc%'

Note that even though EndpointAddresses already shows the newly provisioned ServiceInstance, the database doesn't yet reflect the change! It only shows the original ServiceInstance. Note the Version column for the SPConnectedServiceApplication objects returned by this query - it's going to change after we run the next command.

Now run this:

Start-SPTimerJob job-spconnectedserviceapplication-addressesrefresh

This will cause the Topology Service Application Proxy to update the persisted list of endpoints in the database. 15-20 seconds after starting the job, rerun the SQL query above. You will notice the version number has been raised, and if you open the XML, you will find the list of endpoints has been updated. All is now in sync.

I'll leave it as an exercise for you to try at home to run the PowerShell commands and SQL queries in a federated farm scenario, turning on and off ServiceInstances in the publishing farm. In this case SPServiceLoadBalancer.EndpointAddresses will not immediately reflect changes on the publishing farm side; remember, ApplicationAddresses relies only on the persisted list of endpoints unless the connected service application is in the local farm. The persisted list will not be updated until the AddressesRefresh job has run.

SPRoundRobinServiceLoadBalancer

Almost there, but not quite out of the woods yet… The load balancer doesn't use SPServiceLoadBalancer.EndpointAddresses directly when returning available endpoints for the proxy to connect to. Thinking about this logically, EndpointAddresses is a list of available endpoints, but that's not all we need in a load balancing system. We also must establish how we will actually balance load and deal with failures across the endpoints list. This is where SPRoundRobinServiceLoadBalancer steps in.

SPRoundRobinServiceLoadBalancer does depend on the EndpointAddresses property it inherits from its SPServiceLoadBalancer parent class. It will utilize the EndpointAddresses list to initialize its list of available endpoints for load-balancing. Before the SPRoundRobinServiceLoadBalancer can do its work, it must be initialized by calling its BeginOperation() method. If you look into the SharePointLoadBalancer.types.ps1xml file included in the SPLoadBalancer module, you'll see that before returning the SPRoundRobinServiceLoadBalancer.ApplicationAddresses property, I make sure that BeginOperation() has been called at least once, since if it hasn't, ApplicationAddresses will be null.

A key element of the BeginOperation() method is to populate the local ApplicationAddresses property with a collection of endpoints based on its inherited SPServiceLoadBalancer.EndpointAddresses property. The created ApplicationAddresses collection is different than EndpointAddresses in a couple ways: first, it associates a Status property (Failed or Succeeded) and related m_FailureExpirationDateTimeInTicks field with each endpoint. These additional members are used to track whether the endpoint has recently failed. (I won't discuss the process of marking an endpoint as failed in this article, but perhaps will in the future.) Second, the field backing the ApplicationAddresses property isn't persisted to the database. As a result, the ApplicationAddresses collection is volatile and is recreated each time the process is restarted. As mentioned, this happens in the BeginOperation() method for the load balancer.
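As a mental model only, the shape of that volatile collection can be mimicked in a few lines. Nothing below is SharePoint code: New-ModelEndpointList and Get-ModelNextEndpoint are hypothetical names, the plain in-order rotation is an assumption based on nothing more than the class name, and failure handling is deliberately left out.

```powershell
# Toy model: build the volatile endpoint list from a list of addresses,
# giving each entry the two extra members described above.
function New-ModelEndpointList {
    param([string[]]$EndpointAddresses)

    foreach ($address in $EndpointAddresses) {
        New-Object PSObject -Property @{
            Uri                                = $address
            Status                             = 'Succeeded'   # every endpoint starts out healthy
            m_FailureExpirationDateTimeInTicks = 0             # no failure recorded yet
        }
    }
}

# Assumed round-robin: hand out endpoints in order, wrapping at the end.
function Get-ModelNextEndpoint {
    param([object[]]$Endpoints, [ref]$Index)

    $endpoint = $Endpoints[$Index.Value % $Endpoints.Count]
    $Index.Value++
    return $endpoint
}

$list = @(New-ModelEndpointList 'https://server07:32844/bfa8307c159e4d6bb33b302303a0d5de/BdcService.svc/https',
                                'https://server05:32844/bfa8307c159e4d6bb33b302303a0d5de/BdcService.svc/https')
$i = 0
$picks = 1..3 | ForEach-Object { (Get-ModelNextEndpoint $list ([ref]$i)).Uri }
$picks
```

Because the list lives only in memory, a restarted process simply rebuilds it from EndpointAddresses with every entry back in its initial state.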

As you may have guessed, the ApplicationAddresses property returned by my custom PowerShell function corresponds to the SPRoundRobinServiceLoadBalancer.ApplicationAddresses property, and returns a collection of SPLoadBalancedUriEndpoint objects (as the original ApplicationAddresses property does as well). To better understand the lifecycle of these objects, let's take a look at what happens to this property as service instances are turned on and off in a farm.

First, take a look at what is contained in ApplicationAddresses when two service instances are on for the BDC service and the farm is in a stable, at-rest state:

PS C:\Users\josh> Get-SPServiceApplicationProxy $bdcProxy.Id | Get-SPServiceLoadBalancer | Select-Object -ExpandProperty ApplicationAddresses

   Status Uri                        m_FailureExpirationDateTimeInTicks
   ------ ---                        ----------------------------------
Succeeded https://server07:32844/bfa8...                              0
Succeeded https://server05:32844/bfa8...                              0

For now, we'll set aside the Status and m_FailureExpirationDateTimeInTicks members and focus only on the list of Uris. Let's move on to a different command for further review of the state of the farm at rest:

PS C:\Users\josh> Get-SPServiceApplicationProxy $bdcProxy.Id | Get-SPServiceLoadBalancer |
    Select-Object EndpointAddresses, @{
        Name="ApplicationAddresses";
        Expression={ $_.ApplicationAddresses |
            ForEach-Object -Begin { $uris = @() } -Process { $uris += $_.Uri } -End { $uris } } } |
                Format-List

EndpointAddresses    : {https://server07:32844/bfa8307c159e4d6bb33b302303a0d5de/BdcService.svc/https, https://server05:32844/bfa8307c159e4d6bb33b302303a0d5de/BdcService.svc/https}
ApplicationAddresses : {https://server07:32844/bfa8307c159e4d6bb33b302303a0d5de/BdcService.svc/https, https://server05:32844/bfa8307c159e4d6bb33b302303a0d5de/BdcService.svc/https}

I've fixed up the output via a dynamic property in Select-Object to just show a list of endpoints for ApplicationAddresses (instead of the more complex full SPLoadBalancedUriEndpoint objects). Note that at this point both lists are in sync and show all active endpoints for BDC service instances.

Next, we'll turn off a service instance and see how this is initially reflected in these properties:

PS C:\Users\josh> Get-SPServiceInstance -Server SERVER05 | ? { $_.TypeName -eq "Business Data Connectivity Service" } | Stop-SPServiceInstance

PS C:\Users\josh> Get-SPServiceApplicationProxy $bdcProxy.Id | Get-SPServiceLoadBalancer |
    Select-Object EndpointAddresses, @{
        Name="ApplicationAddresses";
        Expression={ $_.ApplicationAddresses |
            ForEach-Object -Begin { $uris = @() } -Process { $uris += $_.Uri } -End { $uris } } } |
                Format-List

EndpointAddresses    : https://server07:32844/bfa8307c159e4d6bb33b302303a0d5de/BdcService.svc/https
ApplicationAddresses : {https://server07:32844/bfa8307c159e4d6bb33b302303a0d5de/BdcService.svc/https, https://server05:32844/bfa8307c159e4d6bb33b302303a0d5de/BdcService.svc/https}

EndpointAddresses has been updated, but ApplicationAddresses still hasn't been. EndpointAddresses is updated because the connected service application is local to the proxy's farm, so the list of endpoints is pulled directly from the service application, not the persisted list in the database, which hasn't yet been updated. Remember that the persisted list will also be updated eventually when the AddressesRefresh job runs. Within 30 seconds of that job completing, ApplicationAddresses will also refresh with the updated list. We'll discuss what drives that process later on.

Now let's look at the same command run on a remote farm configured with a consuming service application proxy. This takes place at the same time as the command above was run - immediately after the SERVER05 service instance has been turned off in the publishing farm:

PS C:\Users\josh> Get-SPServiceApplicationProxy $bdcProxy.Id | Get-SPServiceLoadBalancer |
    Select-Object EndpointAddresses, @{
        Name="ApplicationAddresses";
        Expression={ $_.ApplicationAddresses |
            ForEach-Object -Begin { $uris = @() } -Process { $uris += $_.Uri } -End { $uris } } } |
                Format-List

EndpointAddresses    : {https://server07:32844/bfa8307c159e4d6bb33b302303a0d5de/BdcService.svc/https, https://server05:32844/bfa8307c159e4d6bb33b302303a0d5de/BdcService.svc/https}
ApplicationAddresses : {https://server07:32844/bfa8307c159e4d6bb33b302303a0d5de/BdcService.svc/https, https://server05:32844/bfa8307c159e4d6bb33b302303a0d5de/BdcService.svc/https}

Here in the consuming farm, EndpointAddresses hasn't been updated yet. Since the service application isn't local to the proxy's farm, the list of addresses is only pulled from the persisted list in the database. As a result, it won't be updated till the AddressesRefresh job runs. To demonstrate this, let's stay on the consuming farm and run the AddressesRefresh job:

PS C:\Users\josh> Start-SPTimerJob job-spconnectedserviceapplication-addressesrefresh

Wait about 20 seconds for the job to complete and run the earlier reporting command again:

PS C:\Users\josh> Get-SPServiceApplicationProxy $bdcProxy.Id | Get-SPServiceLoadBalancer |
    Select-Object EndpointAddresses, @{
        Name="ApplicationAddresses";
        Expression={ $_.ApplicationAddresses |
            ForEach-Object -Begin { $uris = @() } -Process { $uris += $_.Uri } -End { $uris } } } |
                Format-List

EndpointAddresses    : https://server07:32844/bfa8307c159e4d6bb33b302303a0d5de/BdcService.svc/https
ApplicationAddresses : {https://server07:32844/bfa8307c159e4d6bb33b302303a0d5de/BdcService.svc/https, https://server05:32844/bfa8307c159e4d6bb33b302303a0d5de/BdcService.svc/https}

Now, EndpointAddresses has been updated (as has been the list of endpoints in the database). You'll notice though that ApplicationAddresses still hasn't been updated. Wait another 20-30 seconds and try again:

PS C:\Users\josh> Get-SPServiceApplicationProxy $bdcProxy.Id | Get-SPServiceLoadBalancer |
    Select-Object EndpointAddresses, @{
        Name="ApplicationAddresses";
        Expression={ $_.ApplicationAddresses |
            ForEach-Object -Begin { $uris = @() } -Process { $uris += $_.Uri } -End { $uris } } } |
                Format-List

EndpointAddresses    : https://server07:32844/bfa8307c159e4d6bb33b302303a0d5de/BdcService.svc/https
ApplicationAddresses : https://server07:32844/bfa8307c159e4d6bb33b302303a0d5de/BdcService.svc/https

Now ApplicationAddresses has been updated as well. If we go back to the publishing farm and run the AddressesRefresh job (or just wait for the 15 minute interval to pass) and then wait another 30 seconds or so, we will see the ApplicationAddresses property there reflect the new list of endpoints as well. The next section will explain this behavior.

SPRoundRobinServiceLoadBalancer.ApplicationAddresses

I'll wrap up this post by discussing what causes the ApplicationAddresses property of the SPRoundRobinServiceLoadBalancer to refresh, as this is what finally makes the load balancer fully "merged" again. Remember that ApplicationAddresses is a volatile, non-persisted property which is created each time a load balancer is started; that is, each time the load balancer is deserialized from the database and BeginOperation() is called.

Take a look in the OnDeserialization() method in SPServiceLoadBalancer and you'll find the piece to complete the puzzle: in here, the load balancer registers itself with the static SPLoadBalancerRefreshTimer class. In the class constructor (.cctor) for this class, it registers a Timer-based task which calls SPServiceLoadBalancer.LoadBalancerRefreshTimerCallback() for each registered load balancer instance. The Timer task executes and calls these instance callbacks every 30 seconds.

Each individual callback checks if the version of the SPConnectedServiceApplication associated with the load balancer is greater than the load balancer's own stored copy of this version number. If it is, the virtual OnEndpointAddressesChanged() method is called, which is defined in SPRoundRobinServiceLoadBalancer. This method causes the load balancer's volatile cache to be updated from the EndpointAddresses property, which itself will have been updated from SPConnectedServiceApplication.ApplicationAddresses' new values. Since the Timer is configured to run every 30 seconds, it can take up to 30 seconds from the time the AddressesRefresh timer job completes and updates the SPConnectedServiceApplication before the ApplicationAddresses property gets updated.

Also, remember that the persisted SPConnectedServiceApplication object is only updated by the AddressesRefresh timer job, even though the EndpointAddresses property may already reflect changes in the proxy's local farm, where the persisted list is bypassed. This is why the ApplicationAddresses property can lag significantly behind the EndpointAddresses property; the EndpointAddresses property will immediately reflect changes to the local farm, but since the underlying SPConnectedServiceApplication won't be updated till the AddressesRefresh job executes, the LoadBalancerRefreshTimerCallback will still see that the current version of the SPConnectedServiceApplication matches the version from which it last refreshed, and therefore will not call OnEndpointAddressesChanged() to update the volatile ApplicationAddresses property. Only once the AddressesRefresh job runs and updates the SPConnectedServiceApplication object will the LoadBalancerRefreshTimerCallback realize that the list of endpoints has been updated and tell SPRoundRobinServiceLoadBalancer.ApplicationAddresses to update itself as well.
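The interplay between the timer job, the version number, and the 30-second callback is easier to follow in a toy model. The sketch below is not SharePoint code and every name in it is hypothetical: $connectedApp stands in for SPConnectedServiceApplication, $balancer for SPRoundRobinServiceLoadBalancer, and short server names stand in for full endpoint Uris.

```powershell
$live = @('server07', 'server05')          # what the service application really has right now

$connectedApp = New-Object PSObject -Property @{
    IsLocal   = $true                      # connected service application is in the local farm
    Persisted = @('server07', 'server05')  # list as stored in the configuration database
    Version   = 1
}
$balancer = New-Object PSObject -Property @{
    SeenVersion          = 1                        # version the cache was last built from
    ApplicationAddresses = @('server07', 'server05') # volatile cache
}

# EndpointAddresses: live list for a local application, persisted list otherwise.
function Get-ModelEndpointAddresses {
    if ($connectedApp.IsLocal) { $live } else { $connectedApp.Persisted }
}

# AddressesRefresh job: rewrite the persisted list and bump the version.
function Invoke-ModelAddressesRefresh {
    $connectedApp.Persisted = $live
    $connectedApp.Version++
}

# 30-second callback: rebuild the cache only if the version has moved on.
function Invoke-ModelRefreshCallback {
    if ($connectedApp.Version -gt $balancer.SeenVersion) {
        $balancer.ApplicationAddresses = @(Get-ModelEndpointAddresses)
        $balancer.SeenVersion = $connectedApp.Version
    }
}

$live = @('server07')                      # stop the service instance on server05
$endpointsRightAway = @(Get-ModelEndpointAddresses)   # already reflects the change

Invoke-ModelRefreshCallback                # version unchanged, so the cache is left alone
$afterCallbackOnly = $balancer.ApplicationAddresses   # still both servers

Invoke-ModelAddressesRefresh               # timer job runs and bumps the version
Invoke-ModelRefreshCallback                # now the cache is rebuilt
$afterJobAndCallback = $balancer.ApplicationAddresses # server07 only
```

For a local service application, the worst-case lag for ApplicationAddresses is therefore roughly the 15-minute job interval plus the 30-second callback interval.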

Conclusion

If you've made it this far then congratulations… you surely deserve some sort of prize! At this point we've covered all the salient points of how the SharePoint service load balancer works and all of its different components, jobs, and callbacks; it's been quite a journey. You may have noticed that I didn't explain how the Topology Service Application Proxy load balances itself for connection to a remote farm. This was, after all, the main concern which started this discussion to begin with! The fact is that the Topology Service Application Proxy constructs and provisions its load balancer in the same way as other proxies. I've left out an in-depth discussion of this process for now because it would definitely make the current discussion too cluttered. You can take a look at the implementation of SPTopologyWebServiceApplicationProxy.Provision() in .NET Reflector for the full details on the Topology proxy's own load balancer. Perhaps in the future I'll use the Topology proxy as a case study in describing how the load balancing system works, similar to how I used the BDC proxy in this post.

I also hope at some point to demonstrate how the proxies utilize the load balancer within their Execute*() methods to generate addresses to work with. For the BDC proxy, take a look at the Execute<T> method. Perhaps more useful for research purposes is the Microsoft.Office.Server.Search.Administration.SearchServiceApplicationProxy.Execute() method and the associated DoWebServiceBackedOperation<T> method in the same class.

Finally, it remains to discuss what happens when an endpoint isn't responsive for whatever reason and is marked as failed.

Thanks again for your time and attention. Comments and feedback are more than welcome - after all, they're what helped drive this information to begin with!

SPLoadBalancer.zip