Configuring SharePoint 2010 search for crawling host-name site collection tenants


 

I’ve been playing with SharePoint 2010 multi-tenancy for the past 2 weeks and I had some issues setting up search so that it crawls tenants properly.  The setup is the following:

  • Services farm with Search, Managed Metadata Services, and User Profile
  • Hosting farm with tenants for:
    • Contoso
    • Adventure Works
    • Woodgrove
  • Active Directory:
    • Customers OU
    • an OU for each tenant under the Customers OU
  • Single web application hosting all tenants with host-named site collections.  Update: the web application was configured with Windows (NTLM) authentication.
    • Note: With 2010, managed paths are now more flexible.  You can have an explicit path for /admin, /cthub, and /mysites/personal for each tenant with only 4 managed paths (total!)
  • URLs (okay, I know, they could use some work for real world!!):
  • Tested with SharePoint 2010 RTM

 

To learn about multi-tenancy and how to set it up, which I hope you have done if you are reading this blog post, read Spence Harbar’s excellent post series here: http://www.harbar.net/archive/2010/09/14/rational-guide-to-multi-tenancy-with-sharepoint-2010-part-six.aspx.

 

Getting back to this post’s subject, you typically have a single search crawl account that has (automatic) access to all web applications through a user policy providing Full Read access.  This ensures simple and efficient access to all site collections of all web applications.  Now, since I have 2 different farms, I had to manually give my search crawl account (contoso\sp_search) access to my web application on the hosting farm.
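For reference, granting that Full Read user policy can also be scripted instead of done through Central Administration.  The following is a minimal sketch, not the exact steps I ran: the web application URL is a placeholder I made up for illustration, and it assumes you run it in the SharePoint 2010 Management Shell on the hosting farm.

```powershell
# Sketch only. "http://hosting.contoso.local" is a hypothetical URL;
# replace it with the URL of your tenants' web application.
$webApp = Get-SPWebApplication "http://hosting.contoso.local"

# Create a web application policy entry for the crawl account
$policy = $webApp.Policies.Add("contoso\sp_search", "Search crawl account")

# Bind the built-in Full Read policy role to that entry
$fullRead = $webApp.PolicyRoles.GetSpecialRole([Microsoft.SharePoint.Administration.SPPolicyRoleType]::FullRead)
$policy.PolicyRoleBindings.Add($fullRead)

# Persist the change to the farm configuration
$webApp.Update()
```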

 

The problem can first be seen when you try to log in to one of the tenants, say http://adventureworks.contoso.local.  If you try to log in as the search crawl account, you’ll get an access denied.  The reason for this lies in the commands we run when we create the site and connect it to UPS:

# $sub is the tenant's site subscription from the provisioning script; replace <<ProxyName>> with your UPA proxy's display name
$upaProxy = Get-SPServiceApplicationProxy | Where-Object { $_.DisplayName -eq "<<ProxyName>>" }
Add-SPSiteSubscriptionProfileConfig -id $sub -SynchronizationOU "AdventureWorks" `
    -MySiteHostLocation "http://adventureworks.contoso.local/mysites" -MySiteManagedPath "/mysites/personal" `
    -SiteNamingConflictResolution "None" -ProfileServiceApplicationProxy $upaProxy

Edit: the command that actually restricts the account is the earlier one, run when you provision the tenant (in Spence’s blog): Set-SPSiteSubscriptionConfig -id $sub -FeaturePack $customerFeatures -UserAccountDirectoryPath "OU=$customerName,OU=Customers,DC=contoso,DC=local".  Thanks Spence 😉

The UserAccountDirectoryPath property specifies where the users of this site reside in Active Directory, effectively enforcing who has access to the site, with the exception of the farm administrators.  (SynchronizationOU, by contrast, takes an OU name rather than its distinguishedName, as it arguably should.)  This creates a partition when assigning permissions, so you cannot give a Woodgrove account access to an AdventureWorks site, for example; they won’t even see them.

 

Since giving Full Read to a single account doesn’t work, the only fix that I found is the following (UPDATE: SEE NEXT SECTION)

  • Active Directory:
    • Create an Active Directory group for “Tenant Search Crawls”
    • Create a search crawl account for each tenant in their respective OU
    • Add each of these search crawl accounts to the Active Directory group
  • Services farm:
    • Create a user web application policy providing Full Read to the new Active Directory group
    • In search administration, create a crawl rule for each tenant similar to:
      • http://adventureworks.contoso.local, include in crawl, specify account: <<its respective search crawl account>>
      • Note: you can use the PowerShell command New-SPEnterpriseSearchCrawlRule to create this rule through script when adding a tenant – however, you’ll either have to run a remote command in PowerShell, or execute this portion on the services farm.  If your tenants are in the same farm as the services, then you can automate this easily.
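The crawl rule step above could be scripted roughly as follows.  This is a sketch under assumptions rather than a tested script: the per-tenant account name is hypothetical, the trailing /* on the path is my choice for matching everything under the tenant's host name, and it assumes a single Search service application and that the code runs on the services farm.

```powershell
# Sketch only: run on the farm that hosts the Search service application.
# Assumes the farm has exactly one Search service application.
$ssa = Get-SPEnterpriseSearchServiceApplication

# Hypothetical per-tenant crawl account; use the account you created in the tenant's OU
$account  = "contoso\sp_search_adventureworks"
$password = Read-Host "Password for $account" -AsSecureString

# Inclusion rule that crawls this tenant's host name with its own account
New-SPEnterpriseSearchCrawlRule -SearchApplication $ssa `
    -Path "http://adventureworks.contoso.local/*" `
    -Type InclusionRule `
    -AuthenticationType NTLMAccountRuleAccess `
    -AccountName $account `
    -AccountPassword $password
```

Prompting for the password with Read-Host keeps it out of the script; for unattended tenant provisioning you would need another way to supply a SecureString.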

Update and new/better solution: I was told that this was working for Office 365, so I started digging deeper into the differences we might have between the two.  The 2 that came to mind were that I have a remote farm (not really typical for a tenant scenario) and that I was using NTLM for the tenants’ web application.  As it turns out, switching the web application to claims (read: http://blogs.technet.com/b/speschka/archive/2010/07/20/migrating-from-windows-classic-auth-to-windows-claims-auth-in-sharepoint-2010-part-2.aspx) fixes the issue, and the Web Application policies now work properly, i.e. the search crawl account, which has Full Read from a Web Application User Policy, can access the tenant sites.
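As a rough idea of what the switch to claims involves, here is a minimal sketch along the lines of the approach described in the linked post.  The web application URL is a placeholder, and this is not a complete procedure: the linked post covers the remaining details, such as policy entries for claims-encoded accounts.  Try it on a test farm first.

```powershell
# Sketch only. "http://hosting.contoso.local" is a hypothetical URL;
# replace it with the URL of your tenants' web application.
$webApp = Get-SPWebApplication "http://hosting.contoso.local"

# Switch the web application from classic Windows auth to claims
$webApp.UseClaimsAuthentication = $true
$webApp.Update()

# Convert existing Windows accounts in the content databases to their claims form
$webApp.MigrateUsers($true)
```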

If you absolutely need classic Windows authentication (I don’t know why you would, and you’ll hit another issue there; see my upcoming blog post), then you’ll need the first solution.

 

Crawling will now work properly.

Comments (4)

  1. Hi Maxime,

    I have a multi-tenant environment (SharePoint 2010 ). Service farm and application farm. Claims authentication. Different tenants are hosted by one web app.

    I would like to crawl each tenant independently. I just create a new content source and add the host URL of one tenant.

    After having started a full crawl I get 0 success and 1 warning : “This URL is part of a host header SharePoint deployment and the search application is not configured to crawl individual host header sites. This will be crawled as a part of the host header Web application if configured as a start address.”

    If I set up a content source to crawl the whole shared environment, the crawl completes successfully. (URL added in the content source settings = URL of the web app hosting the tenants / “Crawl everything under the hostname for each start address” was selected when creating this content source.)

    Is there a way to set up a crawl/content source per tenant?

    Thanks in advance

    Regards

    Baldo

  2. Hi Baldo,

    Not to my knowledge; you set it up to crawl the whole web application.  In your situation, do you need a different schedule for a specific tenant?

    Maxime

  3. Hi Maxime,

    Yes, we need to have different schedule for a specific tenant.

    Baldo

  4. It's by design as far as I know, but I haven't played with this in a while.  Otherwise, you'd have to use the other parameter and create a content source that explicitly details all the addresses it needs to crawl (i.e. 1 by 1, not the root address), and create another one for your special tenant that will have a different schedule.  I don't have a multi-tenant setup close by to test it though.
