Cloud Search Service Application: Removing items from the Office 365 Search Index.

Microsoft announced the Preview of Cloud Hybrid Search to an audience at the Microsoft Ignite conference in Chicago in May 2015; the Channel 9 recording can be found here. After the announcement, many Office 365 customers deployed cloud hybrid search into test and proof-of-concept platforms. When SharePoint Server 2016 was released to market in March 2016, the feature went into Mainstream Support and the number of customers deploying in real Production platforms started to rise.

Feedback for the feature on the whole was very good and the process of indexing content from on premises into the Office 365 search index was robust and reasonably well understood by the customers. One area was still considered a little murky however, and that was how to get items out of the Office 365 search index again. The aim of this post is to provide some clarity on that and to draw attention to a feature update added in the April 2016 CU for SharePoint 2013 that will also ship in the June 2016 update for SharePoint 2016.

In SharePoint 2013 and SharePoint 2016, items are deleted from the on-premises search index when they are deleted from the content that is being indexed, when the admin removes a start address from a content source, or when the admin removes a content source completely. The deletion happens in different ways, and SharePoint uses crawl policies to dictate this process. Documentation on crawl policies can be found here.

·         When a SharePoint item is flagged in the change log as deleted then the crawler will signal that deletion during a crawl and that ultimately leads to the item being removed from the search index.

·         When a non-SharePoint item is deleted, for example an item in a file share, this is picked up as an item not found by the next crawl of that content and eventually removed from the search index.

·         When a start address is removed from a content source or an entire content source is removed this triggers a different process, a delete crawl. The delete crawl will systematically remove all items from the search index that fall under the start address(es) being removed.

·         Finally, an index reset can remove items from the search index, but this approach is non-selective and results in a complete purge of the indexed items and, importantly, also of the crawl history in the crawl databases.
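
The third case above, removing a start address, can be sketched in PowerShell. This is a minimal sketch rather than a definitive procedure: it assumes the SharePoint Management Shell on a farm server, and the function names Get-RemainingStartAddress and Remove-CrawlStartAddress and their parameters are hypothetical names introduced for illustration. Only the SharePoint cmdlets it calls are part of the product.

```powershell
# Hypothetical helper: returns every start address except the one being retired.
# Trailing slashes are ignored when comparing.
function Get-RemainingStartAddress {
    param(
        [string[]] $StartAddress,
        [string] $AddressToRemove
    )
    $target = $AddressToRemove.TrimEnd('/')
    $StartAddress | Where-Object { $_.TrimEnd('/') -ne $target }
}

# Hypothetical wrapper: rewrites the start address list of one content source.
# Per the description above, removing a start address is what leads to a delete crawl.
function Remove-CrawlStartAddress {
    param(
        [Parameter(Mandatory=$true)] [string] $SearchApplicationName,
        [Parameter(Mandatory=$true)] [string] $ContentSourceName,
        [Parameter(Mandatory=$true)] [string] $AddressToRemove
    )
    Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

    $ssa = Get-SPEnterpriseSearchServiceApplication -Identity $SearchApplicationName
    $cs = Get-SPEnterpriseSearchCrawlContentSource -SearchApplication $ssa -Identity $ContentSourceName

    $current = @($cs.StartAddresses | ForEach-Object { $_.AbsoluteUri })
    $remaining = @(Get-RemainingStartAddress -StartAddress $current -AddressToRemove $AddressToRemove)

    if ($remaining.Count -eq 0) {
        # Nothing would be left: remove the whole content source instead, for example with
        # Remove-SPEnterpriseSearchCrawlContentSource -Identity $cs -SearchApplication $ssa
        throw "No start addresses would remain in '$ContentSourceName'."
    }

    # -StartAddresses takes a comma-separated list of URLs.
    Set-SPEnterpriseSearchCrawlContentSource -Identity $cs -SearchApplication $ssa -StartAddresses ($remaining -join ",")
}
```

A call might look like Remove-CrawlStartAddress -SearchApplicationName "Cloud SSA" -ContentSourceName "File Shares" -AddressToRemove "file://fileserver/archive", where all three values are placeholders for your own names.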

Just like an on-premises-only Search Service Application, the Cloud Search Service Application will send signals to the Office 365 search index to remove items from this index. The fourth process above, index reset, is however a very different animal in the Cloud Search Service Application. If an admin selects index reset in the Cloud Search Service Application, the crawl history is purged from the crawl databases but no signal is sent to Office 365 to purge the items from the Office 365 search index. This results in orphaned indexed items with no effective means of removal, or at least it did until the April 2016 cumulative update for SharePoint Server 2013.

When we say there was no effective way of removing the orphaned search items, there were in fact two ways to accomplish this, neither of them efficient. The first is to re-index everything on premises and, after the indexing completes, delete the content sources to trigger a delete crawl. Of course, re-indexing everything takes time, and if items have been deleted from the on-premises content you still run the risk of leaving orphans in the Office 365 search index. The other option is to call Microsoft Office 365 support and raise a ticket asking for an index purge, which also takes time and is again inefficient for the task at hand.

The message here from Microsoft is please, please do not ever click index reset on a cloud SSA. In fact, a new warning has been added to the index reset function for this exact reason.

[Screenshot: hybridcloudssa]

So, what has changed in the April 2016 CU to make us happier and give us control over this capability? Well, first, a new method has been added to the PushTenantManager, a component of the Cloud Search Service Application. The new method is DeleteAllCloudHybridSearchContent which, when you think about it, speaks for itself.

Microsoft has helped us out even further, though: not only have they implemented the method, they also provide a convenient script to help us use it.

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
<#
.SYNOPSIS
    Issue a call to SPO to delete all external content indexed through Cloud hybrid search. This operation is asynchronous.
.PARAMETER PortalUrl
    SharePoint Online portal URL, for example 'https://contoso.sharepoint.com'.
.PARAMETER Credential
    Logon credential for tenant admin. Will prompt for credential if not specified.
#>
param(
    [Parameter(Mandatory=$true, HelpMessage="SharePoint Online portal URL, for example https://contoso.sharepoint.com.")]
    [ValidateNotNullOrEmpty()]
    [String] $PortalUrl,
    [Parameter(Mandatory=$false, HelpMessage="Logon credential for tenant admin. Will be prompted if not specified.")]
    [PSCredential] $Credential
)
# Detect the installed SharePoint version from the registry (15 = SharePoint 2013, 16 = SharePoint 2016).
$SP_VERSION = "15"
$regKey = Get-ItemProperty -Path "HKLM:\SOFTWARE\Microsoft\Office Server\15.0\Search" -ErrorAction SilentlyContinue
if ($null -eq $regKey) {
    $regKey = Get-ItemProperty -Path "HKLM:\SOFTWARE\Microsoft\Office Server\16.0\Search" -ErrorAction SilentlyContinue
    if ($null -eq $regKey) {
        throw "Unable to detect SharePoint installation."
    }
    $SP_VERSION = "16"
}
# Load the client-side object model (CSOM) assemblies that match the detected version.
Add-Type -AssemblyName ("Microsoft.SharePoint.Client, Version=$SP_VERSION.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c")
Add-Type -AssemblyName ("Microsoft.SharePoint.Client.Search, Version=$SP_VERSION.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c")
Add-Type -AssemblyName ("Microsoft.SharePoint.Client.Runtime, Version=$SP_VERSION.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c")
# Prompt for the tenant admin credential if none was passed in.
if ($null -eq $Credential)
{
    $Credential = Get-Credential -Message "SPO tenant admin credential"
}
# Connect to SharePoint Online with the supplied credential.
$context = New-Object Microsoft.SharePoint.Client.ClientContext($PortalUrl)
$spocred = New-Object Microsoft.SharePoint.Client.SharePointOnlineCredentials($Credential.UserName, $Credential.Password)
$context.Credentials = $spocred
# Queue the delete request; nothing is sent to SharePoint Online until ExecuteQuery() runs.
$manager = New-Object Microsoft.SharePoint.Client.Search.ContentPush.PushTenantManager $context
$task = $manager.DeleteAllCloudHybridSearchContent()
$context.ExecuteQuery()
Write-Host "Started delete task (id=$($task.Value))"

---------------------------------------------------------------------------------

So, what happens when we run this script?

In this case we are running the script and supplying the portal URL on the command line. If the portal URL is omitted, we will be prompted for it. The credential can also be supplied or, as here, the script will prompt for an Office 365 SharePoint Online Global Admin account.
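
As a rough illustration of such a call, the sketch below assumes the script above was saved as DeleteAllCloudHybridSearchContent.ps1 in the current folder. That file name is an assumption, and the URL is the example value from the script's own help text. Because no credential is passed, the script prompts for one.

```powershell
# Hypothetical file name: use whatever name the script was saved under.
$scriptPath = ".\DeleteAllCloudHybridSearchContent.ps1"

# Example tenant URL taken from the script's help text; replace it with your own.
$deleteArgs = @{ PortalUrl = "https://contoso.sharepoint.com" }

if (Test-Path $scriptPath) {
    # Splat the arguments; -Credential is omitted, so the script prompts for it.
    & $scriptPath @deleteArgs
} else {
    Write-Warning "Script not found at $scriptPath"
}
```

To skip the prompt, add a Credential entry to the hashtable, for example one obtained earlier with Get-Credential.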

[Screenshot: Deletecontact]

After a valid credential is supplied then the script responds with a simple message.

[Screenshot: Deletecontact1]

Record this task ID, as you may need it when calling Microsoft Support should the process fail for any reason. The task is asynchronous; that is, you can leave it running in the Office 365 search farm and it will eventually complete.

After this final step you will get no more feedback, but you can track the effect of the task by running a search query for the managed property IsExternalContent=1. The screenshots below were taken just before the purge and then a short time later, and you can see the reduction in the estimated item count for the same query.

[Screenshot: Query1]

After some time, the same query revealed a different number of estimated items. The number is only an estimate, but when you see the estimate steadily falling over time you can rest assured that the deletion is underway, ultimately resulting in no items to show in the Office 365 search index.
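
The same check can be scripted instead of run by hand in the search box. The sketch below is one possible approach, not part of the Microsoft script: the function name Get-ExternalContentEstimate and its parameters are hypothetical, and it assumes the CSOM assemblies have already been loaded with Add-Type as in the delete script. It runs the query through the CSOM search classes and returns the estimated total for the relevant results.

```powershell
# Hypothetical helper: returns the estimated number of items matching the query.
function Get-ExternalContentEstimate {
    param(
        [Parameter(Mandatory=$true)] [string] $PortalUrl,
        [Parameter(Mandatory=$true)] [PSCredential] $Credential,
        # Query text as given in the post; adjust if your tenant needs a different form.
        [string] $QueryText = "IsExternalContent=1"
    )
    $context = New-Object Microsoft.SharePoint.Client.ClientContext($PortalUrl)
    $context.Credentials = New-Object Microsoft.SharePoint.Client.SharePointOnlineCredentials($Credential.UserName, $Credential.Password)

    # Only the total is needed, so ask for a single row.
    $query = New-Object Microsoft.SharePoint.Client.Search.Query.KeywordQuery($context)
    $query.QueryText = $QueryText
    $query.RowLimit = 1

    $executor = New-Object Microsoft.SharePoint.Client.Search.Query.SearchExecutor($context)
    $results = $executor.ExecuteQuery($query)
    $context.ExecuteQuery()

    # TotalRows is an estimate, exactly like the count shown in the search UI.
    $table = $results.Value | Where-Object { $_.TableType -eq "RelevantResults" } | Select-Object -First 1
    return $table.TotalRows
}
```

Calling the function every few minutes with the same portal URL and credential, and logging the returned number, should show the estimate falling as the purge progresses.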

We have deliberately not tried to provide estimates or predictions for the time taken to purge a specific number of items from the Office 365 index, because this will depend on a number of factors. Needless to say, it will take as long as it takes.

Summary

So, great news for people wanting end-to-end control over their Office 365 search experiences. We can not only crawl what we want and include it in the Office 365 search index, but now we can remove items in a controlled manner too.

POST BY : Neil Hodgkinson (MSFT) and Manas Biswas (MSFT)