Beware crawling the non-Default zone for a SharePoint 2013 Web Application


Update: I’ve now published another post “Problems Crawling the non-Default zone *Explained” that goes on to explain the underlying behaviors that I warned about and described in this post…

—————————————

After playing for a while with SharePoint 2013 Search, I thought we were out of the woods regarding crawls of the non-Default Alternate Access Mapping (AAM) zone for a SharePoint Web Application. Crawling a non-Default zone caused all sorts of problems in earlier versions of SharePoint (primarily busted contextual scopes, broken social tagging, and workflow emails linking to the incorrect zone) because other components throughout SharePoint have a built-in assumption that the Default zone is being crawled.

I’m still working to fully nail down the impacts for SP2013, but from my initial testing, when you crawl a non-Default URL, all search results are returned relative to the URL that was crawled rather than the URL from which you query, meaning you will get unexpected URLs in your results. I also suspect it is going to break scoping rules for queries.

Update: I want to seriously caution against using Server Name Mappings, particularly in SharePoint 2013. First, although Server Name Mappings admittedly did appear to provide a workaround with SharePoint 2010, they were definitely not designed for this particular scenario.

Second, in SharePoint 2013, I know for certain that some managed properties (e.g. SPSiteUrl and ParentUrl, to name two) in the index absolutely do not get updated by Server Name Mappings, so adding them will only make the problem worse! In other words, you’ll have some URL-based managed properties (MPs) that are relative to the crawled URL and others that are relative to the mapped URL…

And because Server Name Mappings were not intended for this scenario, I would not expect them to work in all cases.
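To make the mixed-URL problem described in this update concrete, here is a minimal Python sketch. It is only an illustration, not anything SharePoint ships. It groups the URL-valued properties of a single search result by scheme and host. The sample row is hypothetical: it assumes a Server Name Mapping rewrote the result's Path from http://bargainclownmart:88 to http://sp-foo:88 while SPSiteUrl and ParentUrl kept the crawled URL. Using Path as the property that holds the result URL is also an assumption of this sketch.

```python
from urllib.parse import urlsplit


def url_roots(result):
    """Group the URL-valued properties of one result row by scheme://host[:port]."""
    roots = {}
    for name, value in result.items():
        if not isinstance(value, str):
            continue  # only string values can be URLs
        parts = urlsplit(value)
        if parts.scheme and parts.netloc:
            root = f"{parts.scheme}://{parts.netloc}"
            roots.setdefault(root, []).append(name)
    return roots


# Hypothetical result row (values invented for illustration):
# Path was rewritten by a Server Name Mapping, the other two were not.
row = {
    "Title": "result1",
    "Path": "http://sp-foo:88/sites/myTeam/result1.aspx",
    "SPSiteUrl": "http://bargainclownmart:88/sites/myTeam",
    "ParentUrl": "http://bargainclownmart:88/sites/myTeam",
}

roots = url_roots(row)
if len(roots) > 1:
    print("URL properties disagree about the host:", roots)
```

More than one key in the returned dictionary means the properties of a single result point at different URLs, which is exactly the inconsistency to watch for.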

For example, if I issued a query from some site in the Web Application http://initech, then I should expect all results from this Web Application to be returned relative to http://initech (as in http://initech/result1.aspx and http://initech/result2.aspx). However, if I were crawling the URL of a non-Default zone, then my results will all be returned relative to this non-Default URL (such as: http://bargainclownmart:88/sites/myTeam/result1.aspx and http://bargainclownmart:88/sites/myTeam/result2.aspx ).

Update: I recently published “Alternate Access Mappings (AAMs) *Explained” to provide more insights on AAMs and to better illustrate its often misunderstood concepts.

In the scenario below, I have two Web Applications with the following Alternate Access Mappings (as a side note, I believe host-named site collections are now the preferred method over AAMs, but I wanted to demonstrate this as an example):

Internal URL                                Zone      Public URL for Zone
http://sp-foo:88                            Default   http://sp-foo:88
http://testingfoo:88                        Intranet  http://testingfoo:88
http://bargainclownmart:88                  Internet  http://bargainclownmart:88
http://bargainclownmart.officespace.lab:88  Extranet  http://bargainclownmart.officespace.lab:88

http://faceman                              Default   http://faceman
http://initech                              Intranet  http://initech
http://initech.officespace.lab              Internet  http://initech.officespace.lab

 

Observed behaviors when crawling the Default URLs…

In my content source, I specify http://faceman and http://sp-foo:88 as the start addresses and then perform a full crawl.

As expected, the URL for results is relative to the URL from which the query is performed. For example, notice the URL in the browser’s address navigation bar shows http://sp-foo:88 and the results for this Web Application are also displayed relative to this same http://sp-foo:88 URL:

Results related to another Web App would also be relative to this zone (which to my knowledge is new to SP2013). For example, if I query from the http://initech URL (in other words, from the Intranet zone), then all results related to this Web App would be relative to the http://initech URL (such as http://initech/result1.aspx, http://initech/result2.aspx, etc…) as seen in the last two results in the screen shot below…

 

For comparison, observed behaviors when crawling the non-Default URLs…

In my content source, I then specify http://faceman and the Internet zone http://bargainclownmart:88 as the start addresses and then perform a full crawl.

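For my queries from any zone of any Web App, the search results related to the http://sp-foo:88 Web App are always returned relative to the URL that was crawled, in this case http://bargainclownmart:88. In other words, whether I query from http://sp-foo:88 or from http://initech, those results come back with URLs such as http://bargainclownmart:88/sites/myTeam/result1.aspx.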
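The two behaviors observed above can be summarized in a small Python model. This is only a sketch of what the tests in this post showed, not how SharePoint is implemented. The ZONES dictionary copies the AAM table from earlier in the post, while the labels "A" and "B" and the function names are made up for illustration. Falling back to the Default zone when a Web Application has no URL for the querying zone is an assumption of the sketch.

```python
from urllib.parse import urlsplit

# Public URLs per zone, copied from the AAM table in this post.
# "A" and "B" are illustrative labels for the two Web Applications.
ZONES = {
    "A": {
        "Default": "http://sp-foo:88",
        "Intranet": "http://testingfoo:88",
        "Internet": "http://bargainclownmart:88",
        "Extranet": "http://bargainclownmart.officespace.lab:88",
    },
    "B": {
        "Default": "http://faceman",
        "Intranet": "http://initech",
        "Internet": "http://initech.officespace.lab",
    },
}


def locate(url):
    """Return (web_app, zone) for the zone whose public URL matches url's root."""
    parts = urlsplit(url)
    root = f"{parts.scheme}://{parts.netloc}"
    for app, zones in ZONES.items():
        for zone, public_url in zones.items():
            if public_url == root:
                return app, zone
    raise ValueError(f"{url} does not match any zone in the table")


def displayed_url(indexed_url, query_url):
    """Model the result URL a user sees for an item indexed under indexed_url."""
    app, crawled_zone = locate(indexed_url)
    if crawled_zone != "Default":
        # Observed: a non-Default crawl URL is returned unchanged for every query.
        return indexed_url
    _, query_zone = locate(query_url)
    # Assumption: use the Default URL when this Web App lacks the querying zone.
    target = ZONES[app].get(query_zone, ZONES[app]["Default"])
    return target + indexed_url[len(ZONES[app]["Default"]):]


# Default zone crawled: the result follows the zone of the querying user.
print(displayed_url("http://faceman/result1.aspx", "http://initech"))
# Non-Default zone crawled: the result stays on the crawled URL.
print(displayed_url("http://bargainclownmart:88/sites/myTeam/result1.aspx", "http://initech"))
```

In this model, only items indexed under a Default zone URL can be rebased to the zone of the person querying, which is why crawling http://bargainclownmart:88 pins every result to that host.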

 

The moral to this story…

Always crawl the default URL (*the URL being crawled must be a Windows Authenticated zone) unless there is a REALLY good reason otherwise.

 

Comments (8)

  1. Andrew says:

    I also saw that results returned from a REST search query always return the default zone URL, even if you are in the same web app (but a different zone). This affects Content Search Web Part results.

  2. Matt S. says:

    You can also use server name mappings in Central Administration, although it is not advisable to use those in conjunction with alternate access mappings, which may be what this article covers.

    I had a situation where my results were http://<server name>:8847 and I wanted https://site.domain.com to appear in results and didn't need zone-specific paths.  Changing the server name mapping to https://site.domain.com worked for me.  Results may vary depending on each environment.  

    Thanks for the post, very informative.

  3. Beat Nideröst says:

    Hello bspender,

    Thank you for your blog article. Are you aware of the "Server Name Mappings" settings in "Search Administration" in SharePoint "Central Admin"? The description of the settings says:

    "Create server name mappings to override how URLs are shown in search results. Server name mappings are typically needed when the URLs used by the crawler to access content are different than the URLs which users use to navigate to the same files."

    See the following URL for an example of how to configure SharePoint on a non-default zone and configure the Server Name Mappings to prevent erroneous URLs in the search results:

    sharepointobservations.wordpress.com/…/sharepoint-2013-configuring-search-to-crawl-web-applications-using-claims-and-adfs-2-0

    Might this solve the issues you experienced?

    Regards, Beat Nideröst

  4. bspender says:

    I want to seriously caution against using Server Name Mappings, particularly in SharePoint 2013.

    First, although Server Name Mappings admittedly did appear to provide a workaround with SharePoint 2010, they were definitely not designed for this particular scenario.

    Second, in SharePoint 2013, I know for certain that some managed properties (e.g. SPSiteUrl and ParentUrl, to name two) in the index absolutely do not get updated by Server Name Mappings, so adding them will only make the problem worse! In other words, you'll have some URL-based managed properties (MPs) that are relative to the crawled URL and others that are relative to the mapped URL…

    And because Server Name Mappings were not intended for this scenario, I would not expect them to work in all cases.

  5. Bart Kapitein says:

    When following the SharePoint 2013 Design Samples (technet.microsoft.com/…/cc261995.aspx) the extranet sample doesn't crawl on the default zone.

    We have a similar configuration. The default zone is the "default" zone. All users access that zone and it's configured for SAML authentication. We have an intranet zone with NTLM for crawling. We cannot switch the zones because administrative emails sent from SharePoint would contain the URL of the wrong zone (as described in the design samples). We use server name mappings to fix the URLs in the search index.

    How can we configure the zones correctly for search and keep the correct URLs in administrative emails?

  6. bspender says:

    Hi Bart, I'm just now seeing this comment, so apologies for the delayed response.

    This is admittedly a scenario for which I don't have a blanket "do this" answer. If you don't crawl the default zone, I am certain that aspects of Search won't function as expected, as noted above. I've also previously reached out to the content owners of the TechNet article that you referenced and noted my concerns.

    Without deep diving here, you'd generally want to configure both Authentication providers in the same zone (e.g. both SAML and Windows NTLM). Then, in Central Administration -> Web Application -> Authentication providers, set the “Sign In Page Url” to a custom login page (e.g. the relative URL used as the default login page for FBA like /_forms/default.aspx) …then verify the crawl can access the default URL (using NTLM).

    …For full disclosure, I haven't actually implemented this specifically as a workaround, but have heard others report that this works. That being said, I would test and verify before just trusting me 🙂

  7. Omar S says:

    Have you ever experienced a problem where a Search Application was crawling the wrong web-app entirely due to AAM ?

    I have a public site web-app and an internal site web-app as well as corresponding Search Application / Content Sources for each.

    The Internal Site Search Application is crawling the Site Columns ( Properties ) from the Public Site. In the Public Site Search Application, the crawled properties are completely unavailable.

  8. Riz says:

    Hi bspender,

    Is crawling https sites in the default zone fully recommended? I am going to create one web application with https, which by default will be placed under the 'Default zone'. Will I see any issues with crawling? Please suggest.

    Best Regards,

    Riz