How does MOSS Search/Index work?

09/18/2010

This post is Part One of an upcoming series that describes the high-level architecture of the search query mechanism in MOSS 2007. Its purpose is to explain what really happens in the background when a user performs a SharePoint search.

The diagram below gives a pictorial view of the high-level communication between IIS, MSSearch.exe, the Search DB and the Full-Text Index.

Below is a breakdown of the six stages.

Stage 1:

The first thing you need is the search query. The Web Part or the Web Service builds the search query and then passes it to the Query Object Model (OM).

This step happens between the client machine that submits the query and the web front end (WFE). A network trace shows the following GET request and response:

Request:

GET /searchcenter/Pages/Results.aspx?k=sharepoint&s=All%20Sites HTTP/1.1
Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, */*
Accept-Language: en-us
UA-CPU: x86
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SV1; .NET CLR 1.1.4322)
Host: wfe
Connection: Keep-Alive
Authorization: NTLM TlRMTVNTUAADAAAAGAAYAHQAAAAYABgAjAAAAA4ADgBIAAAAGgAaAFYAAAAEAAQAcAAA
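To make the request easier to read, here is a minimal Python sketch (not part of the original capture) that pulls apart the request target and peeks at the start of the NTLM token. Reading `k` as the keyword and `s` as the search scope is an interpretation of the query string, not something the trace states. The NTLM check relies only on the fact that NTLM messages begin with the `NTLMSSP` signature followed by a little-endian message type.

```python
import base64
import struct
from urllib.parse import urlsplit, parse_qs

# Request target copied from the GET line in the capture above.
target = "/searchcenter/Pages/Results.aspx?k=sharepoint&s=All%20Sites"

parts = urlsplit(target)
params = parse_qs(parts.query)   # percent-encoded values are decoded here

page = parts.path                # the results page that receives the query
keyword = params["k"][0]         # assumed meaning: the search keyword
scope = params["s"][0]           # assumed meaning: the search scope

# Only the first 16 Base64 characters of the Authorization token are needed
# to read the NTLM signature and message type.
ntlm_prefix = base64.b64decode("TlRMTVNTUAADAAAA")
signature = ntlm_prefix[:8]
(message_type,) = struct.unpack("<I", ntlm_prefix[8:12])

print("page:", page)
print("keyword:", keyword)
print("scope:", scope)
print("NTLM signature:", signature, "message type:", message_type)
```

A message type of 3 is the NTLM Authenticate message, that is, the last leg of the NTLM handshake, which fits a request that is answered with 200 OK.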

Response:

HTTP/1.1 200 OK
Cache-Control: private, max-age=0
Content-Length: 76654
Content-Type: text/html; charset=utf-8
Expires: Tue, 08 Dec 2009 22:33:34 GMT
Last-Modified: Wed, 23 Dec 2009 22:33:34 GMT
Server: Microsoft-IIS/6.0
X-Powered-By: ASP.NET
MicrosoftSharePointTeamServices: 12.0.0.6318
X-AspNet-Version: 2.0.50727
Set-Cookie: WSS_KeepSessionAuthenticated=80; path=/
Set-Cookie: MSOWebPartPage_AnonymousAccessCookie=80; expires=Wed, 23-Dec-2009 23:03:34 GMT; path=/
Set-Cookie: http%3A%2F%2Fwfe%2FSearchCenter%2FDiscovery=WorkspaceSiteName=U2VhcmNo&WorkspaceSiteUrl=aHR0cDovL3dmZS9TZWFyY2hDZW50ZXI=&WorkspaceSiteTime=MjAwOS0xMi0yM1QyMjozMzozNQ==; expires=Fri, 22-Jan-2010 22:33:35 GMT; path=/_vti_bin/Discovery.asmx
Date: Wed, 23 Dec 2009 22:33:34 GMT
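The Discovery cookie in the response carries three Base64-encoded fields. As a small illustration (a sketch for reading the trace, not something SharePoint itself runs), the values can be decoded like this; the variable names are made up for the example.

```python
import base64

# Value of the Discovery cookie from the Set-Cookie header above
# (the part between the cookie name and the first ";").
cookie_value = (
    "WorkspaceSiteName=U2VhcmNo"
    "&WorkspaceSiteUrl=aHR0cDovL3dmZS9TZWFyY2hDZW50ZXI="
    "&WorkspaceSiteTime=MjAwOS0xMi0yM1QyMjozMzozNQ=="
)

# Split on the first "=" only, because Base64 padding also uses "=".
fields = dict(pair.split("=", 1) for pair in cookie_value.split("&"))

decoded = {
    name: base64.b64decode(value).decode("utf-8")
    for name, value in fields.items()
}

for name, value in decoded.items():
    print(f"{name}: {value}")
```

The decoded values are the site name, the site URL and a timestamp that matches the time of the response.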

Stage 2:

The Query Object Model in turn calls the Query Processor. The role of the Query Processor is to join results from the Full-Text Index with the Search DB.

On the SQL Server, two databases are queried:

· SSP database.

· SSP Search database.

On the SSP database, proc_MSS_GetKeywordInformation is run to get the keywords for displaying on the site if Office SharePoint Search is being used.

On the search database, proc_MSSGetMultipleResults is run to get the properties from the MSSDocProps table. Both stored procedures, and more details about them, can be viewed via a SQL trace.

Stage 3:

The Query Processor then opens a query pipe to the query machine. The link between the Web Front End and the query server is a query pipe provided only by MSSearch.exe. Examining the query pipe's network traffic reveals it to be SMB traffic. Network Monitor 3 identifies this traffic as the CIS protocol.

A high level overview of what we can see via a Network Monitor capture is below.

The initial communication from the WFE to the query server originates from one of the WFE's random ports (1024 to ~65000). The WFE contacts ports 139 and 445 on the query server and finally establishes the connection over SMB.
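The "random port" behaviour is ordinary TCP: the client does not bind a source port, so the operating system picks one. The sketch below shows this with a local listener standing in for the query server. It is only an illustration of the source/destination port pattern seen in the capture; it does not speak SMB, and the listener uses an OS-chosen port rather than 445 so that it runs without special privileges.

```python
import socket

# Stand-in for the query server: a local listener on an OS-chosen port.
# In the real capture the destination is port 445 (or 139) on the query server.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)
dest_host, dest_port = listener.getsockname()

# Stand-in for the WFE: connect without binding, so the OS assigns the source port.
client = socket.create_connection((dest_host, dest_port), timeout=5)
src_host, src_port = client.getsockname()

# The server side sees the same source port as the peer address.
server_side, peer = listener.accept()

print(f"{src_host}:{src_port} -> {dest_host}:{dest_port}")

server_side.close()
client.close()
listener.close()
```

Running it several times shows a different source port on each run, which is the same pattern a network capture shows on the WFE side.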

In the network capture you will also notice a 'Tree Connect AndX Request' for the path \\QueryServer\IPC$ from the WFE's random port to the query machine's port 445. Port 139 is the NetBIOS Session Service endpoint, which carries SMB over NetBIOS.

Once the tree connection is established, the path is changed to \OSearch in order to query the flat files inside the query server's file system.

Note, however, that the default 'searchindexpropagation' share, which is used to propagate the index from the index server to the query server, is not used at this point to retrieve the query responses.

Stage 4:

The Indexer plug-in (refer to the diagram) on the query machine retrieves the results from the index. The Indexer plug-in is the only part of Query that accesses the Full-Text Index. We will go into more detail about this in later posts of this series.

Stage 5:

Now let’s get to the part where the results are returned. The results come back from the Full-Text Index as DocIds. This task occurs based on the data that the web front end receives back from the Query and SQL servers.

Stage 6:

The end result is then produced by the Query Processor (refer to the diagram above). The Query Processor joins results from the Search DB and the Full-Text Index: it takes the DocIds and joins them to the Search DB to access the document properties (Title, Display URL, doc format, size, etc.). This task occurs based on the data that the web front end receives back from the Query and SQL servers.
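Conceptually, this join is a lookup of each DocId against a table of document properties. The following Python sketch shows the idea only; `DocProps`, `join_results` and all sample values are hypothetical and are not the actual Query Processor code or Search DB schema.

```python
from dataclasses import dataclass


@dataclass
class DocProps:
    # Hypothetical stand-in for a row of document properties in the Search DB.
    title: str
    display_url: str
    doc_format: str
    size: int


def join_results(doc_ids, props_by_doc_id):
    """Pair each DocId with its properties, preserving the DocId order.

    DocIds with no matching row are skipped (an assumption of this sketch).
    """
    return [
        (doc_id, props_by_doc_id[doc_id])
        for doc_id in doc_ids
        if doc_id in props_by_doc_id
    ]


# Made-up sample data, for illustration only.
doc_ids_from_index = [42, 7, 19]
search_db_rows = {
    7: DocProps("Team Site", "http://wfe/sites/team", "aspx", 20480),
    42: DocProps("SharePoint Overview", "http://wfe/docs/overview.docx", "docx", 51200),
}

results = join_results(doc_ids_from_index, search_db_rows)
for doc_id, props in results:
    print(doc_id, props.title, props.display_url, props.doc_format, props.size)
```

The Full-Text Index supplies only the identifiers, so everything the user sees on the results page comes from the property side of the join.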

Related article on how Index/Query works:

https://technet.microsoft.com/en-us/magazine/2007.01.search.aspx
