Problems Crawling the non-Default zone *Explained

Article
07/08/2014

In a nutshell, there is an undocumented assumption baked into SharePoint Search that the Default Public URL of a Web Application will be crawled. If you want everything to work auto-magically, crawl the Web Application's Public URL for the Default zone (*Note: the crawler requires Windows Authentication [NTLM or Kerberos] in whatever zone your crawl …meaning your Default zone should include Windows Authentication). Otherwise, when crawling a non-Default zone, expect things to break such as contextual queries like "this site" and "this list" as well as inconsistent URLs returned in query results as demonstrated in my first "Beware crawling the non-Default zone…" post.

For reference, please see the following where I have written extensively on this topic:

Alternate Access Mappings (AAMs) *Explained
Beware crawling the non-Default zone for a SharePoint 2013 Web Application
And discussed at SPC14 [#SPC375] (at the 58 minute mark)

Explaining the Assumption of the Default zone

If configured correctly (e.g. crawling the Default zone), we would expect all search results from the same Web Application to be relative to URL from which they were queried. For example, assume the following Web Application and Alternate Access Mappings (AAMs):

In this case, a query from the Default zone would render results relative to the Public URL of this zone https://usersUrl, such as:

Or, if making the same query from the Extranet zone, the same results would now be relative to https://externalUsers such as:

…and so on for any of the five zones for this Web Application.

This behavior above occurs because the Query Processing component uses the Alternate Access Mappings to map the URLs in Search results to match the zone from which the query is made. For this behavior to work, it predicates that the QPC assume URL-based properties in the Search Index correspond with the Default Public URLs from each Web Application.

If the URLs in the Search Index match the Default Public URL for a Web Application, the URLs in results can be properly re-mapped to the current zone as above
However, if the URLs in the Search Index DO NOT match the Default Public URL for any Web Application in the AAMs, then no re-mapping occurs (spoiler alert: this explains the behavior in my "Beware crawling the non-Default zone…" post, which I'll explain further below)

Crawl the Public URL of the Default Zone

There is an oft referenced TechNet for "Plan for user authentication…", which notes "If NTLM authentication is not configured on the default zone, the crawl component can use a different zone that is configured to use NTLM authentication". Although it isn't technically unsupported to crawl a non-Default zone, it might as well should be – when doing so, Search won't behave as expected as noted above and further explained below. Being said, in my opinion, this statement from the Plan for user authentication article is misguided at best and will only lead you to frustration.

In its place, we helped get this written (by the Search folks rather than the Authentication folks), so I would defer to this guidance instead:

Best practices for crawling in SharePoint Server 2013

Particularly, the section: Crawl the default zone of SharePoint web applications

The most common reasoning I hear for crawling the non-Default zone typically (and reasonably) stems from a business need to use a custom claims provider (e.g. SAML/ADFS) as the Authentication method used in the Default zone, but also needing to make Windows Authentication available in another zone to facilitate the SharePoint Search crawl component.

If you are unable to configure in a way where the Default zone uses Windows Authentication and the extended zone uses your custom claims provider, then the only viable option entails:

Configure BOTH a custom claims provider and Windows Authentication (e.g. NTLM) in the Default zone of the Web Application
And implement a custom login form such as the one by Russ Maxwell here (SP2013) or the one by Steve Peschka here (SP2010)
- Without the custom login form, the crawler won't know how to pick which authentication method to use and crawls will fail
- Depending on your implementation, you may also need to change to a "Custom Sign In Page" for the Web Application

Updated (July 2016) Avoid Heads up when using "/_trust/default.aspx?trust=name-of-your-trusted-identity-provider" as the Custom Sign in Page as this will break Office integration and users will not be redirected properly for authentication when opening client application.* I originally said "Avoid using..." and updated this to "heads up when using..." because it will break the Office integration. However, there is a VERY easy workaround of simply logging in with the browser before attempting to open files with the Office clients. Alternatively, in the comments, someone else noted the use of a registry key setting (e.g. "RefreshFormBasedAuthCookie") as another potential workaround (but that one is outside of my wheelhouse, so I'll defer to the KB for any further details).

Impacts of crawling the non-Default zone

For example, assume the same Web Application and AAMs as above. If the SharePoint Search Content Source has https://non-default-url as the start address (which is not advised, but assumed for this example to prove a point), then URL-based Managed Properties (MPs) in the index will now all [incorrectly] be relative to this https://non-default-url as illustrated below (spoiler alert: this is why contextual scope queries break… but I'll illustrate that further down)

As noted above, the QPC assumes URLs correlate with the Default Public URLs - if the URL-based MPs in the Search Index DO NOT match the Default Public URL for any Web Application in the AAMs, then no re-mapping can occur. In this new scenario (*assuming a full crawl occurred since changing the Content Source with this new start address), we can demonstrate this behavior when the user now browses https://externalUsers and issues the same query as before.

For this, the URL-based MPs in the Search Index all contain values relative to https://non-default-url, which does not match the Default Public URL. Therefore, the QPC cannot correctly re-mapped search results with the URL of the current zone (e.g. https://externalUsers). Instead, the results will all remain relative to the URL that was crawled (which is the value in the Index) as below:

In my "Beware crawling the non-Default zone…" post, I provide these scenarios in further detail. The takeaway should simply be that the results show the incorrect URL root because that is the URL that is in the Search Index when crawling a non-Default zone.

This goes back to the undocumented assumption baked into SharePoint Search that the Default Public URL of a Web Application will be crawled. Because, in this example scenario, the non-Default URL was crawled, this built-in assumption is not valid and thus the results cannot be mapped back to the appropriate zone when rendering of query results occurs.

For similar reasons, this also begins to explain why contextually scoped queries such as "this site" or "this list" return no results if crawling the non-Default zone.

Avoid Server Name Mappings

Another misconception I hear is to simply use Server Name Mappings to overcome the problems that manifest by crawling a non-Default zone. However, Server Name Mappings are not intended for SharePoint content and using them will cause you more grief (e.g. not all Managed Properties adhere to this mapping, so you'll end up with inconsistencies in URL-based MPs, which also breaks OWA), especially in SP2013 …it isn't technically unsupported to use them, but it too might as well be (e.g. it's going to break things in an expected manner and you'll likely never get a fix for any of these because Server Name Mappings are not intended for this purpose).

However, when Server Name Mappings get introduced, the problem only gets worse (particularly for SP2013). Because not all Managed Properties *adhere to this mapping, you will end up with inconsistencies in URL-based MPs such as below (*Server Name Mappings aren't intended for SharePoint content, so these MPs do not adhere to the mappings effectively by design)

It's worth pointing out that the ServerRedirectedUrl Managed Property is one of these MPs that do not get re-mapped. With both FAST Search for SP2010 and SharePoint 2013, this MP holds the "URL used in association with [Office Web Apps (OWA)]. This enables you to provide thumbnail previews in the result. " By impacting this MP, expect the document previews in the search results to break.

Similarly, the SPSiteURL, which holds the URL of the SharePoint site where the matching item is located – and commonly used for refinement, does not get re-mapped. For example, assume the user browses to Default zone and attempts to refine by the Site Collection. In this case, there will be no matches because https://usersUrl (the Default Public URL) does not match https://nonDefaultUrl (the value in the SPSiteURL MP in the Search Index).

As a side note:
The SharePoint 2013 Search Query Tool on CodePlex is a great way to see the inconsistency of URL-based properties when Server Name Mappings have been configured. To knowledge, this impacts at least the following managed properties such as OriginalPath, Path, SitePath, ParentLink, ServerRedirectedUrl, and SPSiteUrl...

Why this Breaks Contextual Queries

Let's continue to assume the earlier [ill-advised] configuration with https://non-default-url as the start address in the SharePoint Search Content Source (and, for simplicity, assume no Server Name Mappings are configured – although it would not make much difference for the examples below). Now, the user navigates to https://externalUsers and issues a query using "This Site" (which is the default Result Source for a SharePoint Team Site search box).

In this case, the request is submitted to https://externalUsers/_layouts/15/osssearchresults.aspx?u= http%3A%2F%2Fexternalusers&k= bowling%20pins, which fires the query. In this URL, we see two noteworthy parameters (which I've URL decoded for an easier read):

u=https://externalusers (which provides a Url context for the "This Site" part of the query)
k=bowling pins (which defines the Keyword being queried)

With contextually scoped queries (e.g. "this site" or "this list"), the Query Processing component uses the AAMs to re-map the URL in the u parameter BEFORE submitting the query to the Index. Here again, we have evidence of the baked in assumption that the Default URL is expected in the Search Index because the QPC specifically maps this u parameter value …wait for it… to the Default Public URL.

To continue this example, the u parameter value gets remapped from https://externalusers to https://usersurl (e.g. the Default zone) BEFORE submitting the query to the Search Index as below:

However, because the Search Index only has MPs with https://non-default-url, this contextually scoped query will find zero results because nothing matches https://usersurl (which is the result of the QPC converting the URL passed in with the u parameter at query time into the Default Public URL). Whereas, by simply crawling the Default Public URL, the underlying assumption (e.g. that the Default zone got crawled) remains valid and allows for the contextual query to properly limit results by the appropriate site URL regardless of the originating zone from which the query is performed.

Avoid Manually Modifying Public URLs in AAMs

Another pitfall I see is direct modification of the Public URLs in the Alternate Access Mappings (*Note: You can pretty much add or remove Internal URLs all day long – the potential trouble occurs when modifying the Public URLs for a Web Application).

When extending a web application, two things occur:

The new AAM for this new zone is created
…and, more importantly, a corresponding IIS Site is created for this new zone

However, when just creating the AAM by only using the "Mange AAMs" page in Central Admin, the corresponding IIS Site doesn't get created. Keep in mind that the web.config and site bindings live in the IIS site, so it's actually creating an ambiguous scenario for SharePoint and more typically, for any customizations [that won't be properly] registered with the existing web.config. This is why SharePoint reports a warning message in ULS if you create the new zone by simply modifying Public URLs rather than extending the Web Application.

Being said, you technically CAN create a new zone by modifying the AAMs from Central Admin, but I advise against it in almost all cases (personally, I wish a giant warning could be added to the "Edit Public URLs" page from Central Admin if not removed outright). If you do change Public URLs directly, just make sure you handle all the other manual pieces (e.g. web.config and IIS bindings) and account for this when enabling the Web Application service on another server in the Farm.

Summary

I probably sound like a broken record at this point, but I have to reiterate this again. There is an undocumented assumption baked into SharePoint Search that the Default Public URL of a Web Application will be crawled. By not crawling the Default URL, you will see the unexpected behaviors described below:

Contextual scopes (like "this site" and "this list") will break and not return any results
Cross-Web App queries will not map to the correct zone (this is what's being described in the post below stating "when I visit the search center from intranet zone, the items returned in search result page have URL for intranet zone corresponding to their specific web application also"), and I have described this same behavior here [SPC] and here [blog].
Similarly, if you use Server Name Mappings with SharePoint content (particularly in SharePoint 2013), the results will be further impacted in a negative way:
- Server Name Mappings were not intended/designed for SharePoint content, so they are not expected to work
- Server Name Mappings do not re-map all URL based Managed Properties in the Search Index (see the first bullet point ...again, not intended for this), so using Server Name Mappings will actually result in an inconsistency in your Search Index
- Office Web Apps will break because the ServerRedirectedUrl MP does not get re-mapped, so the iFrame used to render the OWA part will actually reference the URL root that was crawled rather than the default URL

I hope this helps…