Following on from the blog post by Martin Kearn, I wanted to expand on some of the mysteries of the communication that SharePoint uses for enterprise search. While we were putting together the material for the TechEd talk, this was by far the most interesting communication section to work on.
Almost all administration communication within SharePoint is conducted over web services (HTTP/HTTPS traffic). By and large, Enterprise Search is the same, with the unsurprising exception of the search index propagation, and the rather surprising exception of search queries.
Note: Search in this article refers to the Microsoft Office SharePoint Search service, which is distinct from the Windows SharePoint Services search service.
Administration of the search service takes place over the Search Administration web service. The service is located in an IIS Web site called “Office Server Web Services” on each server that is part of a SharePoint farm. The site holds entries for each service such as Search or Excel Services, for each Shared Service Provider present on the farm.
The web site is configured by default to run on port 56737, or 56738 if SSL is being used. This can be changed with the STSADM command:
stsadm -o setsspport -httpport <HTTP port number> -httpsport <HTTPS port number>
The Search administration web service is specified in the file SearchAdmin.asmx. The full path to the search admin web service is therefore (for http traffic):
The administration service provides all the methods necessary to control the Search service, such as starting content source index crawls, updating scopes, etc. The web service is available to be called by custom applications as well as by the system.
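As a rough illustration of how the default ports and the SearchAdmin.asmx file name fit together, the following Python sketch composes an endpoint URL. The function name and the site_path parameter are assumptions made for this example: the folder structure between the host and the .asmx file is left for the caller to supply rather than guessed here.

```python
from urllib.parse import urlunsplit

# Default ports quoted in the text; both can be changed with
# stsadm -o setsspport.
DEFAULT_HTTP_PORT = 56737
DEFAULT_HTTPS_PORT = 56738


def search_admin_url(server, site_path, use_ssl=False, port=None):
    """Compose a URL for SearchAdmin.asmx on the Office Server Web Services site.

    site_path is whatever sits between the host and the .asmx file on
    your farm. It is supplied by the caller, not a documented value.
    """
    scheme = "https" if use_ssl else "http"
    if port is None:
        port = DEFAULT_HTTPS_PORT if use_ssl else DEFAULT_HTTP_PORT
    # Drop empty segments so leading/trailing slashes do not double up.
    segments = [part for part in site_path.split("/") if part]
    path = "/" + "/".join(segments + ["SearchAdmin.asmx"])
    return urlunsplit((scheme, f"{server}:{port}", path, "", ""))
```

If the ports have been changed with stsadm, pass the new value through the port argument instead of relying on the defaults.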
The protocols that are used during search crawling depend on the content source that is being crawled. The protocol used for a given source is determined by a protocol handler, an object responsible for fetching the content to be indexed. By default, SharePoint comes with protocol handlers for the following protocols (from the TechNet article Plan to crawl content (Search Server 2008), http://technet.microsoft.com/en-us/library/cc280343.aspx):
Used to crawl
Web sites over Secure Sockets Layer (SSL)
Lotus Notes databases
Exchange public folders
Exchange public folders over SSL
People profiles from Windows SharePoint Services 2.0 server farms
People profile crawls of Windows SharePoint Services 3.0 server farms only
People profile crawls from Windows SharePoint Services 3.0 server farms only over SSL
People profile import
People profile import from Windows SharePoint Services 2.0 server farms over SSL
Windows SharePoint Services 3.0 root URLs (internal protocol)
Windows SharePoint Services 2.0 sites
Windows SharePoint Services 2.0 sites over SSL
Windows SharePoint Services 3.0 sites
Windows SharePoint Services 3.0 sites over SSL
Custom protocol handlers can be written to fetch content from disparate sources. For more information, refer to Creating a Protocol Handler (http://msdn.microsoft.com/en-us/library/ms947581.aspx). Each protocol handler is free to use whichever communication protocol it needs. For accessing external data, searching Business Data Catalog information is in most cases the preferred solution. For more details, refer to Enabling Business Data Search (http://msdn.microsoft.com/en-us/library/ms492695.aspx).
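The idea of choosing a handler from the start address can be modelled in a few lines of Python. This is only a conceptual sketch of scheme-based dispatch, not how a real protocol handler is implemented or registered with the search service. The handler functions are made up for illustration, and "exampledb" is a hypothetical custom scheme.

```python
from urllib.parse import urlsplit

# Registry mapping a URL scheme to the function that fetches that content.
_HANDLERS = {}


def register(scheme):
    """Decorator that registers a fetch function for one URL scheme."""
    def wrap(func):
        _HANDLERS[scheme.lower()] = func
        return func
    return wrap


@register("http")
@register("https")
def fetch_web(url):
    # Stand-in for a real fetch; just reports what would be crawled.
    return f"web fetch of {url}"


@register("file")
def fetch_file_share(url):
    return f"file share fetch of {url}"


@register("exampledb")  # hypothetical custom scheme, not a SharePoint one
def fetch_custom(url):
    return f"custom fetch of {url}"


def crawl(url):
    """Pick the handler from the URL scheme and let it fetch the content."""
    scheme = urlsplit(url).scheme.lower()
    handler = _HANDLERS.get(scheme)
    if handler is None:
        raise ValueError(f"no protocol handler registered for {scheme!r}")
    return handler(url)
```

Each handler is free to talk to its source however it likes, which mirrors the point above: the crawler only needs to know which handler owns the scheme.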
Indexing and querying both make use of the Server Message Block (SMB) protocol to transfer data.
The SMB protocol was originally invented at IBM with the intention of rendering network file access available with the same ease as local file access. Around 1990, Microsoft merged the protocol with the LanManager product, and continued to develop it as a means for sharing files and folders, printers and miscellaneous other communication.
The SMB protocol was originally intended to run over NetBIOS, but from Windows 2000 was modified to run over TCP port 445, which it currently uses. With Windows Vista, Microsoft released SMB 2.0, which has several enhancements over the original protocol.
Given that SMB was designed for file and folder sharing, it comes as no surprise that the index propagation is done over SMB, and consists of partial file copies to the search index shared folder location.
This is a shared folder created on each Query server in a SharePoint farm, and although configurable when the Search Query role is activated on a server, is usually configured as \\<servername>\searchindexpropagation. By default, this location usually shares the folder at C:\Program Files\Microsoft Office Servers\12.0\Data\Applications\<shared service provider GUID>.
Search propagation is a co-ordinated effort between the Search Service on the Index server, the Search Service on the Query server, the database, and the file system, using the SMB protocol. The following diagram, taken from the public document describing the Search Index Propagation protocol [MS-CIPROP]: Index Propagation Protocol Specification (http://msdn.microsoft.com/en-us/library/cc313077.aspx), describes the interaction.
In this diagram, the top-right block refers to the SMB propagation of index files, which takes place as a standard file share copy.
Perhaps the biggest surprise is that search queries are issued from the Web Front-End (WFE) to the Search Query server using the SMB protocol. It would seem that this is a prime candidate for a web service query, and the fact that SMB is used has implications for extranet server topology design.
For example, if you design a SharePoint infrastructure architecture where the WFEs are located in a separate segment of the perimeter network, and the rest of the servers in the farm are located within a more secure segment of the network (a form of the back-to-back perimeter topology), the SMB protocol will need to be opened in the firewall between the two network segments.
In the above diagram, Router A will need SQL Server ports and SMB ports to be allowed through. This means essentially that file-share access is enabled through Router A.
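When troubleshooting a topology like this, a quick reachability check run from a WFE can confirm whether the firewall is passing the traffic. The Python sketch below is one minimal way to do that. Port 445 comes from the SMB discussion above. Port 1433 is an assumption (the SQL Server default instance port), so adjust it for named instances or custom configurations.

```python
import socket

# 445 is the SMB port discussed in the text. 1433 is an assumption:
# it is the default for a SQL Server default instance only.
REQUIRED_PORTS = {
    "SMB (index propagation and queries)": 445,
    "SQL Server (default instance, assumed)": 1433,
}


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_firewall(host, ports=None):
    """Map each named port to whether it is reachable on host."""
    if ports is None:
        ports = REQUIRED_PORTS
    return {name: port_is_open(host, port) for name, port in ports.items()}
```

For example, calling check_firewall with the name of a Query server from a WFE returns a dictionary showing which of the listed ports accepted a connection. A successful TCP connection only shows that the port is open, not that the service behind it is healthy.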
So why would the Search service use SMB?
The answer is performance: it turns out that SMB is used as the transport-level protocol for Named Pipes. Named Pipes is a Microsoft Inter-Process Communication (IPC) mechanism which is binary and fast. For a long time it was the de facto communication mechanism in and across Windows Servers, and the default communication mechanism for SQL Server, where it is still available as a protocol. By using SMB as the transport layer, Microsoft provided Named Pipes as an IPC that was fast and efficient.
Perhaps some clue can be gathered from the Win32 API: to open an IO device, the CreateFile function is called. This API call is responsible for opening files, directories, and physical volumes, as well as IO devices such as tape writers, parallel ports, and pipes.
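The file-like nature of pipes shows up in how they are addressed. The following Python sketch builds the Win32 path for a named pipe and opens it like an ordinary file. The function names are made up for this example and the pipe name is hypothetical, since the name of the pipe the Search service actually uses is not covered here.

```python
import os


def pipe_path(pipe_name, server="."):
    """Return the Win32 path of a named pipe.

    server="." means the local machine. Any other value gives a
    UNC-style path to a pipe on a remote machine.
    """
    return "\\\\{}\\pipe\\{}".format(server, pipe_name)


def open_pipe(pipe_name, server="."):
    """Open a named pipe like a file; only meaningful on Windows."""
    if os.name != "nt":
        raise OSError("named pipe paths of this form exist only on Windows")
    # On Windows, Python's open() goes through the C runtime, which in
    # turn opens the path with CreateFile, as it would for a file.
    return open(pipe_path(pipe_name, server), "r+b", buffering=0)
```

A remote pipe path has the same shape as a path to a file share, for example \\queryserver\pipe\examplepipe, which fits with the same SMB transport carrying both.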
Inter-server communication is something which almost always turns out to be slightly more complex than it first seems, and this is absolutely the case for Enterprise Search within MOSS. Enterprise search involves several processes and communication mechanisms. This has impact on all aspects of server farm design and maintenance, and is crucial to understand when troubleshooting search problems.
The first port of call to understand all of this should be the SharePoint Back-end protocol documents (http://msdn.microsoft.com/en-us/library/cc339473.aspx), which detail each of the processes and interactions, as well as communication mechanisms.
Microsoft Consulting Services UK