SharePoint 2013/2016 Search: When standard architecture won't work for you

Customers often ask, "Will this search architecture work for me?"  What then follows is a screen grab from a TechNet search architecture article that provides examples of architectures for small, medium, large, and extra-large search farms.  The answer to their question is always, "It depends."  What it depends on is whether you have "standard" or "non-standard" requirements.

The example architectures in TechNet will work for most people, but some requirements or conditions may demand modifications to them.  The following question invariably arises: "What are non-standard search architecture requirements?"

These are examples of non-standard requirements:

  • Above-average DPS
  • Above-average QPS
  • Above-average number of start addresses or having HNSC enabled
  • Bare-minimum hardware or over-subscribed hosts
  • Above-average indexed item size
  • Large Search Schema (Site Columns)


Above-average DPS

If you have requirements for an above-average Documents Per Second (DPS) or content-processing rate, you may need to add resources or dedicate servers to Content Processing Components (CPCs).  CPCs need a lot of memory and CPU to push items through their processing pipelines.
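
As a rough illustration (not official sizing guidance), here is a back-of-envelope estimate of how many CPCs a target feed rate might require.  The per-CPC throughput figure below is purely an assumption; measure your own rate in a test crawl before committing to a topology.

    import math

    # Back-of-envelope estimate only: the per-CPC throughput is an assumption,
    # not published guidance. Measure your own rate before planning a topology.
    target_dps = 100       # desired feed/processing rate, documents per second
    dps_per_cpc = 25       # assumed sustained throughput of one CPC on dedicated cores

    cpcs_needed = math.ceil(target_dps / dps_per_cpc)
    print(f"Estimated CPCs for {target_dps} DPS: {cpcs_needed}")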

Above-average QPS

If you have requirements for an above-average Queries Per Second (QPS) rate, you may need to add resources or dedicate servers to Query Processing and Index Components (QPC/IC).  QPCs and ICs both need physical cores to maintain QPS.  If other search components (or system processes) are competing for those cores, resource scarcity may cause QPS to be lower or latency to be higher.
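
A similarly hedged sketch for query capacity: the per-replica QPS number below is an assumption for illustration only, since real throughput depends on physical cores, query complexity, and index size.

    import math

    # Illustrative only: the per-replica QPS figure is an assumption; real
    # throughput depends on physical cores, query complexity, and index size.
    target_qps = 50        # assumed peak queries per second
    qps_per_replica = 10   # assumed sustained QPS one index replica can serve

    replicas_needed = math.ceil(target_qps / qps_per_replica)
    print(f"Estimated index replicas for {target_qps} QPS: {replicas_needed}")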

Above-average number of start addresses or having HNSC enabled

If you have a high number of start addresses or have Host-Name Site Collections (HNSC) enabled, the crawler will spawn more threads to crawl each unique host name it encounters.  Crawler threads run at a higher priority than other search components' threads.  When many unique host names are being crawled, the number of threads the crawler uses may leave other search components resource-starved.  Please refer to Brian Pendergrass's SharePoint Strategery blogs for all things crawl (and many other wonderful search nuggets).
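
To get a feel for how unique host names multiply crawler threads, here is a purely illustrative estimate.  The per-host thread count and the overall cap are assumptions, not documented crawler defaults; actual behavior is governed by crawler impact rules and the crawl component's configuration.

    # Illustrative only: threads_per_host and the cap are assumptions, not
    # documented crawler defaults; crawler impact rules govern actual behavior.
    unique_hosts = 400     # e.g. many host-name site collections (HNSC)
    threads_per_host = 2   # assumed simultaneous requests per host name
    thread_cap = 256       # assumed upper bound on crawler threads

    estimated_threads = min(unique_hosts * threads_per_host, thread_cap)
    print(f"Estimated concurrent crawler threads: {estimated_threads}")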

Bare-minimum hardware or over-subscribed hosts

CPCs, QPCs, and ICs all rely heavily on CPUs and their cores.  If you are running multiple components on servers with the minimum number of CPUs/cores, your search components will compete with each other for resources under heavy load (high DPS or QPS) and may suffer performance degradation.  ICs require server-grade hard drives that can support the IC's IOPS requirements.  Host server hardware must adequately support the demands of the VMs it hosts.  In general, you should treat search components, especially indexers, like SQL servers.  Here is an overview of virtualization for SharePoint.
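
One quick sanity check for over-subscribed hosts is the vCPU-to-physical-core ratio of the search VMs placed on a host.  All of the numbers below are assumptions used only to illustrate the check.

    # Illustrative over-subscription check for a virtualization host running
    # search VMs; all inputs are assumptions for this example.
    physical_cores = 16
    vm_vcpus = [8, 8, 4, 4]    # vCPUs assigned to each search VM on this host

    ratio = sum(vm_vcpus) / physical_cores
    status = "over-subscribed" if ratio > 1.0 else "not over-subscribed"
    print(f"vCPU:pCPU ratio = {ratio:.2f} ({status})")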

Above-average indexed item size

Standard guidance for an IC hard drive is 500 GB.  For SP 2013, that guidance is based on an average indexed item size of 20 KB (not the original document size, but the size of the indexed item on disk after processing).  In 2013, as you approach the maximum number of items per index partition (10 million), an average above 20 KB puts you at risk of running out of disk space when the indexer attempts to perform a master merge.  In 2016, running out of disk space is much less of a concern because the indexers have a greatly reduced disk footprint and a more efficient master merge.
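
To make the math concrete, here is a small worked example using the numbers above.  The 2x master-merge headroom factor is an assumption used for illustration; the point is that a higher average indexed item size can push peak disk usage past the 500 GB guidance.

    # Worked example of the 20 KB guidance; the 2x master-merge headroom factor
    # is an assumption used for illustration.
    items_per_partition = 10_000_000   # max items per index partition (SP 2013)
    avg_indexed_item_kb = 30           # your measured average indexed item size
    merge_headroom_factor = 2.0        # assumed peak-space multiplier during merge
    disk_budget_gb = 500               # standard guidance for an IC drive

    index_gb = items_per_partition * avg_indexed_item_kb / (1024 * 1024)
    peak_gb = index_gb * merge_headroom_factor
    print(f"Steady-state index: ~{index_gb:.0f} GB; peak during master merge: ~{peak_gb:.0f} GB")
    print("Within the 500 GB budget" if peak_gb <= disk_budget_gb
          else "Risk of running out of disk during a master merge")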

Large Search Schema (Site Columns)

A Site Column will auto-generate Managed Properties (MPs) in the Search Schema.  Whether auto-generated or created through customization, additional MPs increase the size of the Search Schema, which in turn increases an IC's index footprint on disk and in memory.  How much depends on which attributes are selected on the MP (e.g., Searchable, Queryable, Refinable, Sortable).  With each MP, the number of mappings that occur during content processing and the size of the index increase, raising the computational load on CPCs.  If you require (or will generate) a large schema, you may need to dedicate additional resources to CPCs and ICs.
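
If you want a rough sense of where a large schema's cost comes from, a tally like the sketch below can help.  The input format and the emphasis on Refinable/Sortable properties are assumptions for illustration only; the real overhead of each attribute varies with your data.

    # Hypothetical tally of managed property attributes; the input format and
    # the per-attribute weighting are assumptions for illustration only.
    managed_properties = [
        {"name": "ProjectCode", "searchable": True, "queryable": True,
         "refinable": True, "sortable": False},
        {"name": "CostCenter", "searchable": True, "queryable": True,
         "refinable": True, "sortable": True},
        # ... a large schema can contain hundreds of auto-generated MPs
    ]

    costly = [mp["name"] for mp in managed_properties
              if mp["refinable"] or mp["sortable"]]
    print(f"{len(managed_properties)} MPs defined; "
          f"{len(costly)} are Refinable/Sortable and add extra index structures: {costly}")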


If you ever have questions on architecture requirements that may push your search system into the "non-standard" category, please contact your TAM and reach out to us at SearchEngineers@microsoft.com.