SharePoint 2013 + Distributed Cache (AppFabric) Troubleshooting

Two caching messages you may have seen if you’ve administered SharePoint 2013 in any way are “This Distributed Cache host may cause cache reliability problems” and/or “cacheHostInfo is null” from PowerShell. This article is about how to fix those errors, and caching reliability problems in general, for SharePoint 2013.

Update: see a simplified version of this article here if you’re not sure how AppFabric works with SharePoint.

Cache reliability warnings are fairly common in SharePoint 2013 installations of any complexity. They come from how SharePoint interacts with the distributed cache cluster, which is powered by AppFabric and used for all sorts of caching needs in 2013: caching user tokens (with a fall-back option if it fails), security trimming search results (also with fall-back on failure), and the social news-feed (with no fall-back; social just doesn’t work without a healthy cache cluster). For the most part a cache failure just means less than optimal performance, but not always.

Therefore if you see this message in SharePoint you should pay attention to it. Here’s an example health error:


This message can come up for several reasons but in short, one or more servers that SharePoint thinks should be hosting the cache cluster aren’t, for one reason or another. This guide will hopefully show how to fix this rather broad issue, but the fix depends on what the problem is, so to start you need to pick the scenario that describes your own…

Scenario 1 – SharePoint and AppFabric Don’t Agree Which Servers are in the Cluster

As already mentioned, SharePoint uses AppFabric for caching under the hood, which is an entirely standalone product in its own right. This means that AppFabric keeps its own list of which machines make up the cluster, in parallel to SharePoint’s. Normally the two lists coincide perfectly, so nobody notices AppFabric is even a thing until there’s a problem. Any mismatch in server-info between the two products can cause some pretty ugly problems, and is often the root cause of the infamous “cacheHostInfo is null” error. The two server-lists need to be identical (and healthy) so let’s check both…

Query AppFabric for Caching Servers/Statuses

To get the list of servers AppFabric thinks there should be, run “Get-CacheHost” (use “Use-CacheCluster” first if necessary). This command gives us more than just the servers; it also shows each server’s service status as far as AppFabric’s concerned.

Query SharePoint for Caching Servers/Statuses

To do the same for SharePoint, run:

Get-SPServiceInstance | ? {($_.service.tostring()) -eq "SPDistributedCacheService Name=AppFabricCachingService"} | select Server, Status

This will give you the same kind of data but from SharePoint’s POV instead. Make sure all statuses say “Online” but, more importantly, that both SP & AF list the same server names. As mentioned before, if you’re seeing “cacheHostInfo is null” somewhere then it’s quite likely there’s a mismatch here.
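To make the comparison less error-prone, the two lists can be diffed in a script. What follows is a minimal sketch rather than a definitive check: the helper names Get-ShortHostName and Compare-CacheHostList are made up for illustration, and comparing only the short name (the part before the first dot, case-insensitively) is an assumption, made because AppFabric often reports a fully-qualified name where SharePoint reports a short one.

```powershell
# Hypothetical helper: reduce "DEV.domain.local" or "dev" to "DEV".
# Assumption: a short-name match is good enough for spotting missing servers.
function Get-ShortHostName {
    param([string]$Name)
    ($Name -split '\.')[0].ToUpperInvariant()
}

# Hypothetical helper: report servers known to one product but not the other.
function Compare-CacheHostList {
    param(
        [string[]]$AppFabricHosts = @(),
        [string[]]$SharePointHosts = @()
    )
    $af = @($AppFabricHosts | ForEach-Object { Get-ShortHostName $_ })
    $sp = @($SharePointHosts | ForEach-Object { Get-ShortHostName $_ })
    New-Object psobject -Property @{
        OnlyInAppFabric  = @($af | Where-Object { $sp -notcontains $_ })
        OnlyInSharePoint = @($sp | Where-Object { $af -notcontains $_ })
    }
}

# On a farm server, feed in the real lists (skipped if the cmdlets are absent).
if (Get-Command Get-CacheHost -ErrorAction SilentlyContinue) {
    Use-CacheCluster
    $afHosts = @(Get-CacheHost | ForEach-Object { $_.HostName })
    $spHosts = @(Get-SPServiceInstance |
        Where-Object { $_.Service.ToString() -eq 'SPDistributedCacheService Name=AppFabricCachingService' } |
        ForEach-Object { $_.Server.Name })
    Compare-CacheHostList -AppFabricHosts $afHosts -SharePointHosts $spHosts
}
```

Two empty lists in the result mean both products name the same servers. It says nothing about whether those servers are healthy, so the statuses still need checking by eye.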

Oh No! AppFabric and SharePoint Server Lists Don’t Match!

Maybe AF thinks there are more servers caching than SharePoint does; maybe the server names don’t coincide. Here’s an example of a server-name mismatch:


Even if the names matched by the way, this particular example would also fail because the service-instance is disabled but for now let’s just focus on the name mismatch, which will indeed cause all sorts of cache reliability problems too.

It’s probably going to be AppFabric that’s got a server that SharePoint doesn’t think is caching anything, possibly because said server isn’t in the farm anymore, or at least the name of the server isn’t. Renaming a server with Rename-SPServer won’t, at the time of writing, update the name in AppFabric too, which causes this type of mismatch. A small “feature” if you will.

In any case, AppFabric and SharePoint need a coinciding list and SharePoint needs the service-instance to be “online” (not “disabled”).

How to Remove Zombie AppFabric Service Instances from SharePoint Topology

If, as is more common, you also need to remove AppFabric instances from SharePoint, say because the service-instance is disabled, you can do it with these commands:

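$instanceName = "SPDistributedCacheService Name=AppFabricCachingService"
$serviceInstance = Get-SPServiceInstance | ? {($_.service.tostring()) -eq $instanceName -and ($_.server.name) -eq $env:computername}
$serviceInstance.Unprovision()   # may fail; carry on to the delete regardless
$serviceInstance.Delete()        # removes the entry from the SharePoint configuration database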


This PowerShell snippet (tries to) un-provision the service on the server (which might fail), then removes the service-instance from the SharePoint configuration database. The query only picks out the service-instance whose server matches this machine’s name, so there’s no danger of it touching anything else as long as it’s run in a PowerShell console on the right machine.

You can do this from any machine for any other machine if you change the last where clause that passes in “this computer name”. For my example above I’ll change the computer-name to “sfb-sp15-wfe1” as that’s the server that has the bad service-endpoint.
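As a rough sketch of that variation, the same removal can be wrapped in a function that takes the server name as a parameter. The function name Remove-ZombieCacheServiceInstance is invented for this example, and wrapping Unprovision() in a try/catch is my own assumption about how to deal with the "which might fail" part; treat it as a starting point and check what the query returns before letting it delete anything.

```powershell
# Hypothetical wrapper around the removal steps above; run it from a
# SharePoint 2013 Management Shell on any server in the farm.
function Remove-ZombieCacheServiceInstance {
    param([Parameter(Mandatory = $true)][string]$ServerName)

    $instanceName = 'SPDistributedCacheService Name=AppFabricCachingService'
    $instances = @(Get-SPServiceInstance | Where-Object {
        $_.Service.ToString() -eq $instanceName -and $_.Server.Name -eq $ServerName
    })

    foreach ($instance in $instances) {
        try {
            # May fail if the server is gone or the service is already broken.
            $instance.Unprovision()
        } catch {
            Write-Warning "Unprovision failed on ${ServerName}: $($_.Exception.Message)"
        }
        # Removes the entry from the SharePoint configuration database.
        $instance.Delete()
    }

    # Number of service-instances removed.
    $instances.Count
}
```

For the example above the call would be Remove-ZombieCacheServiceInstance -ServerName sfb-sp15-wfe1. A result of 0 means no matching service-instance was found, which usually points at a typo in the server name.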

How to Remove Ghost Servers from AppFabric

We need to remove any server that just doesn’t exist in the farm in any way. However if there’s a server in AppFabric that is in the farm but just shouldn’t be hosting AF do not use this method; try running “Remove-SPDistributedCacheServiceInstance” on the farm server in question first.

If, on the other hand, manually ripping the host out of the AppFabric cluster is the last resort, this is how. From a machine that is still working in the cluster (if possible), run Unregister-CacheHost, passing in the name of the server to remove, the SharePoint provider and the “connection-string”, like so:

Unregister-CacheHost -HostName [machine] -ProviderType SPDistributedCacheClusterProvider -ConnectionString \\[machine]

Replace [machine] with the name of the machine you want to evict, written the way AppFabric lists it in Get-CacheHost. In my example that’s the fully-qualified name, so it would be:

Unregister-CacheHost -HostName sfb-sp15-wfe1.sfb-testnet.local -ProviderType SPDistributedCacheClusterProvider -ConnectionString \\sfb-sp15-wfe1.sfb-testnet.local

Once all phantom hosts have been eliminated from AppFabric, all being well we should have a healthy-if-slimmed-down cluster we can re-add other nodes to in the normal way with Add-SPDistributedCacheServiceInstance – which adds to AppFabric and SharePoint both, as the good SPLord intended. Before doing so, verify one more time that both SharePoint and AppFabric have the same server-list and that AppFabric says the server is “up” and SharePoint says the service-instance is “online”.

One More Time: Verify Service End-Points and AppFabric cluster Agree on Servers

All servers need to be in the AppFabric cluster and host an AppFabric service-instance in the farm, and be online:


Having cleaned out the rogue entries, I’ve gone back and added the other servers too with Add-SPDistributedCacheServiceInstance which sorts out both the SP and AF configuration at once.

Until you achieve this exact parity do not continue. The AppFabric hosts don’t necessarily need to be “up” at this time but the names have to coincide and SharePoint needs to have the service-instances online.

At this point your caching woes may even be over! In Central Administration get SharePoint to recheck any health-warnings about distributed cache.

Scenario 2 – No Server Mismatch but One or More AppFabric Service Instances are Disabled

At this stage we’ve verified the server lists between SP and AF match-up. Run this PowerShell command to find out if we have zombie endpoints in SharePoint:

Get-SPServiceInstance | ? {($_.service.tostring()) -eq "SPDistributedCacheService Name=AppFabricCachingService"} | select Server, Status

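If any status says “Disabled” then you have a problem. You need to remove the disabled service-instance from SharePoint (see “How to Remove Zombie AppFabric Service Instances from SharePoint Topology” above), then add it back with Add-SPDistributedCacheServiceInstance.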

If for some reason Add-SPDistributedCacheServiceInstance doesn’t give you a healthy endpoint, try running Remove-SPDistributedCacheServiceInstance then Add-SPDistributedCacheServiceInstance on the server in question. If you still can’t get a healthy endpoint after all that you’ll probably need to contact Premier support.
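The remove-then-add fallback can be sketched as a small function to run on the server in question. The name Reset-LocalCacheServiceInstance is hypothetical; the three cmdlets it calls are the ones already used in this article, and the final query just re-runs the status check so you can see what SharePoint thinks afterwards.

```powershell
# Hypothetical helper; run in an elevated SharePoint 2013 Management Shell
# on the affected server, as both cmdlets act on the local machine.
function Reset-LocalCacheServiceInstance {
    $instanceName = 'SPDistributedCacheService Name=AppFabricCachingService'

    # Take this server out of the cache cluster, then put it back.
    Remove-SPDistributedCacheServiceInstance
    Add-SPDistributedCacheServiceInstance

    # Show SharePoint's view of every cache service-instance afterwards.
    Get-SPServiceInstance |
        Where-Object { $_.Service.ToString() -eq $instanceName } |
        Select-Object Server, Status
}
```

Call it with no arguments. Every row that comes back should say “Online”; anything else means this scenario still applies.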

Scenario 3 – AppFabric & SharePoint Agree on Cache Servers but Some Servers are Down

In this scenario both products are on the same page about who should be caching but one or more nodes just aren’t for some reason or other.

Problem: Servers use Dynamic or Shared Memory

AppFabric is particularly sensitive to dynamic/shared memory. It can work on it but Microsoft doesn’t support it and if you wanted our help with an AppFabric cluster we wouldn’t do much unless each server had a fixed amount of memory, always.

Now the disclaimer’s done: I’ve had it working just fine on test VMs with dynamic memory, using around 16 GB. I tend to find that if memory usage expands suddenly and the host OS can’t provide the guest OS with memory quickly enough, AppFabric will just give up and you’ll have to re-provision it all over again. The moral of the story here is, don’t be cheap on memory and expect AppFabric to work. Really, don’t, especially for anything that’s not your dev-box.

Problem: AppFabric Server Configuration State is Corrupt

First of all let’s see if the failing node even knows about the cluster. I’ve had a couple of occasions where the configuration has just died for various reasons and has just had to be reset. Run a check by getting the local cluster status with “Get-CacheHost” (use “Use-CacheCluster” if necessary).


This would suggest the cluster configuration on this failing node has died for reasons we don’t know, nor particularly care about, assuming it’s not a regular occurrence. Cache clusters are trivial to set up, so let’s just jump to the solution: run Remove-SPDistributedCacheServiceInstance and then Add-SPDistributedCacheServiceInstance on the failing node.

If you see the “cacheHostInfo is null” message during any of those, remove the service instances from SharePoint and the host from the AppFabric cluster as shown above, then repeat the remove/add commands.

Problem: AppFabric Service not Started

You’ll get reliability problems if the service isn’t started.


This is bad. This however is good:


If the service won’t start for some reason then I’d try removing & re-adding the server with Remove-SPDistributedCacheServiceInstance and Add-SPDistributedCacheServiceInstance.

Problem: Firewall Interference

Firewalls are a consideration for AppFabric. You should be able to see lots of chatter on port 22234, which is the internal cluster-chatter port. You should also see some activity on 22233, which is the port SharePoint uses to talk to the cluster; just make sure you don’t see TCP resend packets being sent consistently.


Each cluster node needs these ports open to every other node, and network tracing skills come in pretty handy here if you’re not sure whether the ports are open.
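If a full network trace is overkill, a quick TCP connect test will at least show whether a port is reachable at all. The sketch below uses the .NET TcpClient class directly; Test-TcpPort is a made-up helper name and the two-second default timeout is an arbitrary assumption. A successful connect only proves the port is open and something is listening on it, not that the cluster traffic itself is healthy.

```powershell
# Hypothetical helper: returns $true if a TCP connection to the given
# host and port succeeds within the timeout, otherwise $false.
function Test-TcpPort {
    param(
        [Parameter(Mandatory = $true)][string]$ComputerName,
        [Parameter(Mandatory = $true)][int]$Port,
        [int]$TimeoutMs = 2000
    )
    $client = New-Object System.Net.Sockets.TcpClient
    try {
        $pending = $client.BeginConnect($ComputerName, $Port, $null, $null)
        if (-not $pending.AsyncWaitHandle.WaitOne($TimeoutMs)) {
            return $false          # timed out: blocked or host unreachable
        }
        $client.EndConnect($pending)   # throws if the connection was refused
        return $true
    } catch {
        return $false
    } finally {
        $client.Close()
    }
}
```

For example, Test-TcpPort -ComputerName sfb-sp15-wfe1 -Port 22234 run from another cache node checks the cluster port, and the same call with -Port 22233 run from a web front-end checks the port SharePoint uses. Repeat it for each pair of nodes in the cluster.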


Edit: my good colleague Filip Bosmans has written a nice script which checks the general health of the cache-cluster all round, so can make some of these checks more automatic. If you’re having AppFabric issues, try his script out here.

Wrapping Up

Getting this right isn’t as easy as you might think. For the most part the caching and AppFabric just take care of themselves, but there’s clearly a need to get your hands dirty now & then. Many people don’t even realize that SharePoint just drives AppFabric, or how that setup works; fixing these issues is mainly about understanding how to troubleshoot these two products as one.

Let me know in the comments if there are any scenarios I haven’t covered; this is something I’d like to add to over time if needed. Thanks to my colleagues Filip Bosmans, Vlad Mihat, and others for helping out with this.


// Sam Betts

Comments (47)

  1. Nice summary! Thanks Sam!

  2. jvijayw says:

    Awesome work Sam!

  3. Sam, wondering if you also investigated these errors which consistently show up on all of my WFEs. DCache is running on 2 separate servers and no errors there. MaxBufferSize is updated to 33554432 from default. Other settings for Logon Token Cache are MaxOutputDelay: 2, ReceiveTimeout : 60000, ChannelOpenTimeOut : 3000 RequestTimeout : 3000

    ULS Errors:

    Unexpected – DistributedCache – ALL WFEs

    Unexpected error occurred in method 'GetObject' , usage 'Distributed Logon Token Cache' – Exception 'Microsoft.ApplicationServer.Caching.DataCacheException: ErrorCode<ERRCA0016>:SubStatus<ES0001>:The connection was terminated, possibly due to server or network problems or serialized Object size is greater than MaxBufferSize on server. Result of the request is unknown. —> System.TimeoutException: The socket was aborted because an asynchronous receive from the socket did not complete within the allotted timeout of 00:01:00. The time allotted to this operation may have been a portion of a longer timeout. —> System.IO.IOException: The read operation failed, see inner exception. —> System.TimeoutException: The socket was aborted because an asynchronous receive from the socket did not comp…

    Monitorable – DistributedCache – All WFEs

    Token Cache: Failed to get token from distributed cache for '0).w|s-1-5-21-1111111111111111111'.(This is expected during the process warm up or if data cache Initialization is getting done by some other thread).

  4. Hmm that looks like a timeout of some kind – try increasing the client-side cache settings to increase this value on the WFE in question with Set-SPDistributedCacheClientSetting (…/jj219593.aspx) – the ReceiveTimeout is the one you're after if memory serves.

  5. Jesse says:

    That is a very helpful article.  I am seeing something in my farm that shows the status of the server is 'Provisioning' and the services on server shows it is stuck on starting.  Which approach would work best to resolve this?

  6. Try a Remove/Add-SPDistributedCacheServiceInstance on the affected server and see if that makes a difference. That solves a multitude of sins, most of the time.

  7. Rick Allford says:

    This is the best article on Distributed Cache and AppFabric I have seen. You fixed my farm for me. Thanks!

  8. rdm says:

    Hey Samuel, great article! You combined all I've been reading in other places today. But unfortunately nothing has helped to resolve my issue yet or I might be doing something wrong. I tried to add Distributed Cache back to one of the servers, but it got stuck. Its status is "Starting" and I get the same errors in the screenshot you have for "Problem: AppFabric Server Configuration State is Corrupt". I tried to restart "AppFabric Caching Service", but it didn't help. Any suggestions on how to fix this? Thank you!

  9. Try removing all instances of the AF cluster on all nodes to try and clean-out the config store, then adding the nodes again. Make sure SharePoint doesn't think there's any SPServiceInstances with that role. Failing that, the config store in the SharePoint Config DB is probably corrupt (which would be very strange) so you may need a new Config DB (farm rebuild, basically).

  10. thomson says:

    Hi Samuel,

    i do have a question on Scenario 1 –SharePoint and AppFabric Don’t Agree Which Servers are in the Cluster

    When i run the Query from Sharepoint POV i get

    Server Name : DEV status online

    When i run from App Fabric POV

    HostName                                 Service Status

    DEV.domain.local :22233         Running

    Both shows online and running

    One is showing the Short computer name and the other showing the FQDN

    Is this a mismatch?


  11. Not necessarily; is SharePoint reporting a problem in CA? Can you see SharePoint ULS giving cache errors?

  12. Matthias Kusters says:

    Thanks for this excellent write-up! It helped me a great deal in wrapping my head around -and solving some issues with- the distributed cache.

  13. James DeLisa says:

    I tried to remove the host from How to Remove Ghost Servers from AppFabric..  This threw a error on the screen..  

    Unregister-CacheHost : ErrorCode<UnspecifiedErrorCode>:SubStatus<ES0001>:No such host is known

    At line:1 char:1

    + Unregister-CacheHost -HostName V-VWC2PSRCAPP2.ihs.internal.corp -ProviderType SP …

    + ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

       + CategoryInfo          : NotSpecified: (:) [Unregister-AFCacheHost], DataCacheException

       + FullyQualifiedErrorId : UnspecifiedErrorCode,Microsoft.ApplicationServer.Caching.Configuration.Commands.Unregis


  14. John says:

    Excellent, a life saver, thanks

  15. Rich says:

    Hi Sam,

    Thanks for putting this together. It takes a lot of worry off my mind as I've read this is a finicky service but your post shows there's resolutions to the issues when they come up.

    I'm looking at having a dedicated Distributed Cache Service that runs Windows 2012 R2 Standard and SP 2013 Foundation. My farm is Enterprise license though, can I get away with running foundation on the DCS boxes? I'm also wondering about what to do when I need to patch those servers though. In the event that I need to reboot after applying a patch should I stop the service before rebooting or before applying the patch? If I use the -Graceful parameter it should preserve the cached contents. I read in a forum that you also need to run the Remove-SPDistributedCacheServiceInstance command when shutting down. I don't see why that would be needed though. Can you provide some insight into the patching process of the dedicated cache host and the cache hosts that run on farm elements (wfe or app boxes)? I don't think I'm the only one with concerns about this but it wouldn't be the first time if I were 😉


  16. Hi Rich,

    Just in case I've misunderstood, we don't support mixing SharePoint editions in the same farm; a farm should be all enterprise/standard/foundation but not a mixture.

    For when an AppFabric machine is going to be rebooted, this article should be followed to avoid objects in the cache being lost on reboot –…/jj219613.aspx

  17. Eric says:

    Thanks for the article Sam.

    I just noticed that on our dev 2013 SP1 farm the distributed cache process seems to have a memory leak or some other issue as RAM usage just keeps going up – it's now at over 4.3 GB. This is on a dev farm that has only a couple folks hitting it to perform testing – nothing. I initially had dist. cache running on 3 servers with a total of 40 GB RAM. I've since removed one server (dedicated running search) from the cache cluster, bringing the total RAM for the remaining two servers in the cluster to 28 GB.


    1. First of all, if we are not planning on using any of the OOTB feeds/microblogging in SP2013 do we even need to run the dist cache? We are using Yammer for social.

    2. If we don't care about the cache but need to leave it running because it's required, it seems that we wouldn't also care about following the whole graceful shutdown process and keeping the cache when doing our monthly patching and reboot cycle.

    Appreciate any info/guidance.

    Thank you!

  18. Indul Hassan says:

    Hi Samuel

    Thanks for contributing this blog. I resolved my query with the help of this.

  19. Indul Hassan says:


    And one more thing, at Scenario 3 which are servers may get down. I basically didn't understand that part.

  20. Hey Eric,

    1. Yep, I'd still run it – AppFabric caches a bunch of stuff; not just social. Some things require AppFabric to work properly – login tokens for example (if you don't want users to continually have to log-in, seemingly at random).

    2. Correct; graceful shutdowns won't be so necessary if you're not running social but see answer #1 ^.

    Indul – What's not clear? Basically, all AF servers need to be able to communicate to each other (services started & network comms not blocked).

  21. Jeff says:

    Great article on what Distributed Cache is; however, we are not really sure why this it is on. We have a single SharePoint 2013 server that was implemented by a vendor. From what we have read this function is not on by default and is turned on. I am guessing this was not configured properly or the issue described her occurs over a period of time. Should we apply the fix described here or should we consider shutting off distributed cache? Thanks for any advice!

  22. Distributed cache should be on, and is on by default for SPServers. If it's not working, I'd highly recommend activating/fixing it 🙂

    Edit: it's mainly used for log-in token + view-state caching, some other internal stuff, and the social capabilities.

  23. Jeff says:

    Thank you Samuel! We will reread the article and look to make the necessary changes..

  24. Ripon Kundu says:

    Thank you, helpful blog.

  25. Ashish says:

    I am getting this error in logs

    Unexpected Exception in SPDistributedCachePointerWrapper::InitializeDataCacheFactory for usage 'DistributedLogonTokenCache' – Exception 'Microsoft.ApplicationServer.Caching.DataCacheException: ErrorCode<ERRCA0009>:SubStatus<ES0001>:Cache referred to does not exist.

    and I have checked the instances using Get-CacheHost and Get-SPServiceInstance. Both matches.

    I am still getting the error in logs and Newsfeeds – "We're still collecting the latest news. You may see more if you try again a little later"

  26. Hi Ashish,

    Have you tried granting access to the UPA account? Look for "Grant-CacheAllowedClientAccount" in my other post –…/troubleshooting-appfabric-reliability-issues-for-sharepoint.aspx

    // Sam

  27. Ankur says:

    Thanks great article. It worked like charm.

  28. Halwagy says:

    I have a question, I started the appfabric over two servers , but one of the servers is able to see the status of the nodes using  "get-cachehost" , and the other node is not able to see, with error " cannot read the connection String, please add them manually"

    the instances online over both servers

  29. Have you tried Use-CacheCluster 1st?

  30. BlueSky2010 says:

    Good one – thanks for putting this together Samuel!

  31. BlueSky2010 says:

    Hi Samuel – I was able to sort out most of my issues with Distributed Cache Host with your guide except one. Hoping you can shed some light. Health Analyzer failing on a non-existent server. "Unregister-CacheHost" fails with "No such host is known". Any suggestion how can I remove this. Tried doing multiple "ReAnalyze".

    Host names match through Get-ServiceInstance and Get-CacheHost commands. That non-existent server does not show up in these lists.Not sure where Health Analyzer picking that name from. That server used to exist at one point but decommissioned long time ago. 🙁

    Appreciate any feedback!

  32. Hey BlueSky. If you export the cache configuration to an XML file, do the host names match up? Another possibility is that the rule result is out of date – delete the error and see if it comes back.

  33. BlueSky2010 says:

    Looks like I just needed to wait long enough for the HA to pick this up after my cleanup. Came back Monday and not seeing the problem one anymore. Thanks for your feedback Samuel!

  34. Cleopatrakent says:

    Hi Samuel,

    I am trying to Add-SPDistributedCacheServiceInstance but the console says do not load the aseembly Microsft.ApplicationServer.Caching.Configuration version=, I have installed AppFabric 1.1 and the version of this assembly is 1.0.4632.0

    How i can solucionate this?


  35. CleopatraKent says:

    I have my AppFabric Service with status UP, the name of service match with my SPService, but this is disabled

    I do not Remove the service-instance because i get an error the assembly of my dll is version (windows fabric)

    and y my server 2012 i have installed the version 1.0.4632.0.

    Can i resolve this error? I need help!! I make all types of things but I do not change the version of assembly

  36. CleopatraKent says:

    Hi Samuel,

    Now I have AppFabric Service UP and my sharepoint caching ONLINE but my Distributed Cache in sharepoint do not work well .

    I try to Stop and Start the service "Distributed Cache" in Sharepoint but it tell me "cacheHostInfo is null"

    Can i resolve this error? I need help!! Please!!

    1000 Thanks Samuel

  37. Ben McInerney says:

    Hey Sam,

    What a fantastic, clear and concise article! A somewhat rare event. It made getting my second WFE back to playing nice a walk in the park.

    Especially when I just read the TechNet article that states:

    The Distributed Cache service can end up in a non-functioning or unrecoverable state if you do not follow the procedures that are listed in this article. ***** In extreme scenarios, you might have to rebuild the server farm. *****…/jj219613.aspx

    Thanks mate!


  38. Hey Ben, thanks for the comments – glad it helped!

  39. Gerald S. says:

    Any ideas on how to remove a "ghost" cache host that doesn't appear in in the cache cluster (get-cachehost)), nor is it part of the Farm.

    Only indication the Farm even "knows" about this server is from the health warnings that I see generated daily.

    The Unregister-CacheHost tells me that that "No such host is known".

  40. Simon says:

    Hi Samuel,

    thanks for this great article. I have referred to it many times already and it really helped to get things fixed.

    I wonder though, whether you might extend this article to an problem I have faced already many times:

    I observe the event id 6398 in application event log stating:

    "The execution method of job definition Microsoft.Office.Server.UserProfiles.LMTRepopulationJob (ID…) threw an exception. More information is included below.

    Unexpected exception in FeedCacheService.IsRepopulationNeeded: Cache cluster is down, restart the cache cluster and Retry."

    Of course, my first action was to compare the Get-CacheHost results with the Get-SPServiceInstance results but there is no deviation. Both show the same servers and the same state of each server.

    Last time I faced this issue in a LAB I simply unprovisioned the cache service and then re-provisioned it. But now I am facing it in a production environment. I wonder whether there is a better way to fix it without needing to re-provision the cache service.

    Thanks in advance!


  41. John says:


    i have one dedicated server in my farm for Distribution cache service, we are not using mysites or newsfeed so far. We need to shut down all SharePoint servers (Due to some internal outages) and bring it back, so in this scenario do we need to stop Distribution Cache service gracefully before doing server shutdown?



  42. Hi John,

    If there's just one server then a graceful shutdown by definition won't do anything as there's nothing to offload cache objects to. No worries though – the worst that'll happen is logon tokens will be lost & users will have to reauthenticate.


  43. Simon,

    If SP reports the cluster is down, if I'm not mistaken that could mean a network issue amongst other things (not necessarily a problem, but maybe saturation for example). I'd set your max connections to 1 for all the cache-containers & see if that helps – maybe it's being overloaded, especially if you're using claims logins which will generate a lot of connections. Unfortunately though that's just one potential cause amongst many; I assume the timer-job works normally & this is a one-off now & then?

    // Sam

  44. Gerald S,

    When was the health warning generated? It's possible it's not been refreshed in a while maybe? You can always export the cache config & see if it appears anywhere in the XML file.

    // Sam

  45. Peter says:

    Thanks for the post really helped get my dev farm back in action