SharePoint 2013: Crawl does not happen with error m_DocumentFeeder->Init failed with 0x80131537


I worked with a customer on a SharePoint 2013 farm crawling SharePoint 2010 sites as well as content from OpenText. They had some database connectivity issues over the weekend. After this, none of the crawls progressed: the status was "Crawling Full" but the success/error count stayed at 0.

A crawl on any of the content sources would run for roughly an hour without a single success.

Symptoms:

In a topology with 7 servers, 4 content processing components, and an existing index of, say, 7 million items, there is a clear need for freshness of searchable items. The issue here is that the existing items remain searchable without problems, but further crawls make no progress.

 

 

[Image: content sources showing status "Crawling Full" with zero success/error counts]

Crawl rate: the foremost indication is the crawl rate remaining at zero items throughout the crawl. This should immediately catch your attention when a crawl is started on any content source.

[Image: crawl rate graph flat at zero items per second]

The second symptom involves a services restart: most admins' next step is to stop the SharePoint Search Host Controller and OSearch services and then start OSearch (which brings the host controller back up as well). This doesn't make the crawl successful either.

The third check is to create a new NTLM-based web application, add a content source for it, and test the behavior. Since the content access account is a domain account, the crawler should be able to fetch content from the new web application. If the crawl fails on the new web application and content source as well, the inference is that the issue lies not with the SSA configuration itself but with the components of its topology.

A quick run of the following command lists the current server locations of those components.

(Get-SPEnterpriseSearchServiceApplication | Get-SPEnterpriseSearchTopology).GetComponents() | select Name, servername

Name                            ServerName
----                            ----------
CrawlComponent0                 Server3
CrawlComponent1                 Server4
CrawlComponent2                 Server5
CrawlComponent3                 Server6
IndexComponent2                 Server7
IndexComponent1                 Server1
IndexComponent2                 Server2
AdminComponent1                 Server3
AdminComponent2                 Server4
ContentProcessingComponent1     Server3
ContentProcessingComponent2     Server4
ContentProcessingComponent3     Server5
ContentProcessingComponent4     Server6
AnalyticsProcessingComponent0   Server3
AnalyticsProcessingComponent1   Server4
AnalyticsProcessingComponent2   Server5
AnalyticsProcessingComponent3   Server6
QueryProcessingComponent0       Server1
QueryProcessingComponent1       Server2
QueryProcessingComponent2       Server7

In the ULS logs, CSSFeedersManager initializes the content processing components after verifying the CGatherer object with a PingCrawl; this happens on thread 0x57D0. In the same 0x57D0 sequence we see the message m_DocumentFeeder->Init failed with 0x80131537:

08/18/2015 19:52:31.77      mssearch.exe (0xBFDC)     0x57D0   SharePoint Server Search  Crawler:Gatherer Plugin   e5bn     Verbose  CGatherer::PingCrawl 2, component 938dfd17-de71-4c33-8c25-c54075b96d00-crawl-1 [gatherobj.cxx:2492] search\native\gather\server\gatherobj.cxx   

08/18/2015 19:52:31.77      mssearch.exe (0xBFDC)     0x57D0   SharePoint Server Search  Crawler:Content Plugin    af7x6    High     CSSFeedersManager::Init: addresses = net.tcp://Server3/AF3440/ContentProcessingComponent1/ContentSubmissionServices/content,
net.tcp://Server4/AF3440/ContentProcessingComponent2/ContentSubmissionServices/content,
net.tcp:///AF3440/ContentProcessingComponent3/ContentSubmissionServices/content,
net.tcp:///AF3440/ContentProcessingComponent4/ContentSubmissionServices/content

08/18/2015 19:52:31.77      mssearch.exe (0xBFDC)     0x57D0   SharePoint Server Search  Crawler:Content Plugin    ab3jl    High     m_DocumentFeeder->Init failed with 0x80131537                                   [contentpiobj.cxx:386] search\native\gather\plugins\contentpi\contentpiobj.cxx         

In our case, the list of content processing component addresses looks peculiar. From the topology listing above we know we should see ContentProcessingComponent1 and 2 on Server3 and Server4, and ContentProcessingComponent3 and 4 on Server5 and Server6. But the server names Server5 and Server6 are absent from the net.tcp addresses.

For instance, we see net.tcp:///AF3440/ContentProcessingComponent3 instead of net.tcp://Server5/AF3440/ContentProcessingComponent3.

This is the litmus test: it tells us that the server names for these components in the active topology are not being populated for some reason.
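As a quick illustration of this litmus test, here is a minimal Python sketch (a hypothetical helper, not part of any SharePoint tooling) that flags the URIs in a ContentDistributor value whose host name is empty:

```python
from urllib.parse import urlparse

def missing_host_components(content_distributor):
    """Return the URIs in a comma-separated ContentDistributor value
    whose host name is empty, i.e. net.tcp:///... instead of net.tcp://host/..."""
    bad = []
    for uri in content_distributor.split(","):
        uri = uri.strip()
        # urlparse handles any scheme://host/path URI; a missing host
        # shows up as an empty netloc.
        if uri and not urlparse(uri).netloc:
            bad.append(uri)
    return bad
```

Any URI this returns points to a component whose server name failed to populate in the active topology.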

We can also confirm this with a Process Monitor trace filtered on mssearch.exe. The RegQueryValue happens against the following registry key:

HKLM\SOFTWARE\Microsoft\Office Server\15.0\Search\Applications\<SearchServiceAppGuid>\CatalogNames\FastConnector:ContentDistributor

6:48:27.6165187 AM    mssearch.exe    49116   RegCloseKey   HKLM\SOFTWARE\Microsoft\Office Server\15.0\Search\Applications\938dfd17-de71-4c33-8c25-c54075b96d00    SUCCESS  

6:48:27.6165515 AM    mssearch.exe    49116   RegCloseKey   HKLM\SOFTWARE\Microsoft\Office Server\15.0\Search\Applications\938dfd17-de71-4c33-8c25-c54075b96d00    SUCCESS  

6:48:27.6165843 AM    mssearch.exe    49116   RegCloseKey   HKLM\SOFTWARE\Microsoft\Office Server\15.0\Search\Applications    SUCCESS  

6:48:27.6166170 AM    mssearch.exe    49116   RegQueryValue   HKLM\SOFTWARE\Microsoft\Office Server\15.0\Search\Applications\938dfd17-de71-4c33-8c25-

6:48:27.6167305 AM    mssearch.exe    49116   RegQueryValue   HKLM\SOFTWARE\Microsoft\Office Server\15.0\Search\Applications\938dfd17-de71-4c33-8c25-c54075b96d00\CatalogNames\FastConnector:ContentDistributor    SUCCESS   Type: REG_SZ, Length: 668, Data:

 

net.tcp://Server3/AF3440/ContentProcessingComponent1/ContentSubmissionServices/content,

net.tcp://Server4/AF3440/ContentProcessingComponent2/ContentSubmissionServices/content,

net.tcp:///AF3440/ContentProcessingComponent3/ContentSubmissionServices/content,

net.tcp:///AF3440/ContentProcessingComponent4/ContentSubmissionServices/content

 

As can be seen, the server names are missing here as well.

 

Solution

In SharePoint 2013, this value comes from the MSSConfiguration table in the Search Admin database. We can look for the rows containing net.tcp:

SELECT [Name]
      ,[Value]
      ,[LastModified]
  FROM [SSA_AdminDB].[dbo].[MSSConfiguration]
 WHERE CONVERT(nvarchar(max), [Value]) LIKE '%net.tcp%'

It is expected that we get two rows, one for the content processing components and one for the query components; these values should coincide with the registry entries mentioned above.

 

Name:  938dfd17-de71-4c33-8c25-c54075b96d00\CatalogNames\FastConnector:ContentDistributor
Value: net.tcp://Server3/AF3440/ContentProcessingComponent1/ContentSubmissionServices/content,
       net.tcp://Server4/AF3440/ContentProcessingComponent2/ContentSubmissionServices/content,
       net.tcp:///AF3440/ContentProcessingComponent3/ContentSubmissionServices/content,
       net.tcp:///AF3440/ContentProcessingComponent4/ContentSubmissionServices/content

Name:  ImsQueryInternalUri
Value: net.tcp://Server1/AF3440/QueryProcessingComponent1/ImsQueryInternal;
       net.tcp://Server2/AF3440/QueryProcessingComponent2/ImsQueryInternal;
       net.tcp://Server7/AF3440/QueryProcessingComponent3/ImsQueryInternal;
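Comparing the SQL-side value with the registry-side value is what decides which of the two routes below applies. A minimal, hypothetical Python sketch of that decision logic (function name and return strings are illustrative, not any real tool):

```python
def diagnose(sql_value, registry_value):
    """Classify the failure by checking whether every net.tcp URI in the
    SQL-side and registry-side values carries a host name."""
    def hosts_ok(value):
        # ContentDistributor entries are comma-separated; query URIs use ';'
        parts = [p.strip() for p in value.replace(";", ",").split(",") if p.strip()]
        return all(not p.startswith("net.tcp:///") for p in parts)

    if hosts_ok(sql_value) and hosts_ok(registry_value):
        return "healthy"
    if hosts_ok(sql_value):
        return "scenario 1"   # SQL is fine, registry stale: timer job did not propagate
    return "scenario 2"       # SQL itself is broken: rebuild the components
```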

 

Here you can take one of two routes.

Scenario 1:

In the first case, the values are fine on the SQL side but the registry keys are still wrong. Then we know that the polling Application Server Administration Service timer job, which executes once every minute, did not update them. The next step is to check in the ULS logs whether this job ran on the affected servers.

Time             Process                EventID  Level   Message                                              Correlation
8/18/2015 19:53  OWSTIMER.EXE (0x7484)  xmnv     Medium  Name=Timer Job job-application-server-admin-service  8256259d-c875-7067-44f2-e6e7a47a74fa
8/18/2015 19:54  OWSTIMER.EXE (0x7484)  xmnv     Medium  Name=Timer Job job-application-server-admin-service  9156259d-8828-7067-44f2-ec6295432737
8/18/2015 19:55  OWSTIMER.EXE (0x7484)  xmnv     Medium  Name=Timer Job job-application-server-admin-service  9f56259d-28b8-7067-44f2-e03e8ba63a84
8/18/2015 19:56  OWSTIMER.EXE (0x7484)  xmnv     Medium  Name=Timer Job job-application-server-admin-service  ae56259d-e86a-7067-44f2-e54caca2ccd5
8/18/2015 19:57  OWSTIMER.EXE (0x7484)  xmnv     Medium  Name=Timer Job job-application-server-admin-service  bd56259d-a81b-7067-44f2-ef63668b17b7
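The check above (did the once-a-minute job leave a trace on each server?) can be sketched as a small, hypothetical Python helper that scans per-server ULS excerpts for the job name; server names and log snippets are illustrative:

```python
# Marker string as it appears in the ULS entries for this timer job
JOB_MARKER = "job-application-server-admin-service"

def servers_missing_timer_job(uls_by_server):
    """Given {server_name: [uls_line, ...]}, return the servers whose
    excerpt never mentions the admin-service timer job."""
    return sorted(
        server for server, lines in uls_by_server.items()
        if not any(JOB_MARKER in line for line in lines)
    )
```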

 

Alternatively, we can take several corrective measures, such as checking whether the timer service instances are running on these servers via (Get-SPFarm).TimerService.Instances.

We can also try restarting the timer service on these boxes, or clearing the configuration cache as described at http://blogs.msdn.com/b/josrod/archive/2007/12/12/clear-the-sharepoint-configuration-cache-for-timer-job-and-psconfig-errors.aspx, and take it from there. Often, simply getting that job to run again fixes things.

Scenario 2:

In the second case, the values in the SQL table are also wrong. These components are then never usable, and the crawl will never run, since the ping to these components always fails with the message m_DocumentFeeder->Init failed with 0x80131537.

This means we have to modify the topology: remove and then recreate the components on the same machines, and activate the new topology.

Please note that modifying the topology is a very CPU-intensive operation that causes downtime for crawl and query. Also, here we cloned the topology onto exactly the same computers. If a topology is activated with index components moved to other servers, there is a good chance the index copy will either take a very long time or fail and corrupt the index.

Although cloning the topology and moving components to other servers is doable, we don't suggest it. In our case the content processing component has the least impact when removed and added back. If the issue had been with an index component, this might not be the way out.

Modifying the current topology, removing the components, and adding them back can be achieved with the following script:

$ssa = Get-SPEnterpriseSearchServiceApplication
$active = $ssa.ActiveTopology
$clone = $active.Clone()

# Get handles to the existing (broken) content processing components in the clone
$cpc4_old = $clone.GetComponents() | ?{$_.Name -match 'ContentProcessingComponent4'}
$cpc3_old = $clone.GetComponents() | ?{$_.Name -match 'ContentProcessingComponent3'}
$cpc2_old = $clone.GetComponents() | ?{$_.Name -match 'ContentProcessingComponent2'}
$cpc1_old = $clone.GetComponents() | ?{$_.Name -match 'ContentProcessingComponent1'}

# Search service instances on the servers that will host the recreated components
$Server4 = Get-SPEnterpriseSearchServiceInstance -Identity Server4
$Server3 = Get-SPEnterpriseSearchServiceInstance -Identity Server3
$Server2 = Get-SPEnterpriseSearchServiceInstance -Identity Server2
$Server1 = Get-SPEnterpriseSearchServiceInstance -Identity Server1

# Remove the broken components from the clone
$clone.RemoveComponent($cpc4_old)
$clone.RemoveComponent($cpc3_old)
$clone.RemoveComponent($cpc2_old)
$clone.RemoveComponent($cpc1_old)
$clone.GetComponents()

# Recreate the content processing components on those service instances
$cpc4_new = New-SPEnterpriseSearchContentProcessingComponent -SearchTopology $clone -SearchServiceInstance $Server4
$cpc3_new = New-SPEnterpriseSearchContentProcessingComponent -SearchTopology $clone -SearchServiceInstance $Server3
$cpc2_new = New-SPEnterpriseSearchContentProcessingComponent -SearchTopology $clone -SearchServiceInstance $Server2
$cpc1_new = New-SPEnterpriseSearchContentProcessingComponent -SearchTopology $clone -SearchServiceInstance $Server1

# Verify, then activate the modified topology
$clone.GetComponents() | ?{$_.Name -match 'ContentProcessingComponent'}
$clone.Activate()

Once we have modified the topology, we can go back and verify the current values in the MSSConfiguration table again. In our case the server names were back:

Name:  938dfd17-de71-4c33-8c25-c54075b96d00\CatalogNames\FastConnector:ContentDistributor
Value: net.tcp://Server4/AF3440/ContentProcessingComponent5/ContentSubmissionServices/content,
       net.tcp://Server3/AF3440/ContentProcessingComponent6/ContentSubmissionServices/content,
       net.tcp://Server2/AF3440/ContentProcessingComponent7/ContentSubmissionServices/content,
       net.tcp://Server1/AF3440/ContentProcessingComponent8/ContentSubmissionServices/content

Name:  ImsQueryInternalUri
Value: net.tcp://Server1/AF3440/QueryProcessingComponent1/ImsQueryInternal;
       net.tcp://Server2/AF3440/QueryProcessingComponent2/ImsQueryInternal;
       net.tcp://Server7/AF3440/QueryProcessingComponent3/ImsQueryInternal;

We should see the same values at the registry level as well. We can re-provision the SSA using (Get-SPEnterpriseSearchServiceApplication).Provision() for the changes to take effect.

Crawls can then be restarted, and they will now progress with many successes.

Posted by: Ramanathan Rajamani [MSFT]


Comments (4)

  1. Tomasz Stasiuk says:

    Very interested but I em now so I got plenty mistakes

  2. Mohammed Sarfraaz says:

    Very helpful. We had a similar issue after installing the March 2016 CU.

  3. Trobo says:

    Same scenario when we swap servers in the farm. Scenario 2 working like a charm in that case, very helpful.

  4. Krishna says:

    Very useful. I had similar situation, full crawl ran for 20 hours and continues without a single success/failure/error. And “x” error was showing under “Content Processing” component in search topology section. Restarting the “SharePoint Search Host Controller” and “SharePoint Server Search 15” services from “Services” console made things back running up. The error icon disappeared and “tick” mark appeared (after 1-2 minutes). I then started full crawl, items started crawling. Thank you.
