If you've ever worked with a client that has tens of thousands of users and millions of files to index, putting your neck on the line and recommending a SharePoint 2010 infrastructure that will scale to meet their requirements can be daunting at first. However, with some useful articles on TechNet and tools such as the HP Sizer for SharePoint 2010, it's possible to make a good start. Here are some useful pointers to get you started with Enterprise Search in SharePoint Server 2010 (for articles on SharePoint Foundation or FAST, please see here):
- SERVICE APPLICATIONS - There are fundamental changes in the way services are designed compared with SharePoint Server 2007. As noted in a previous article, SSPs are long gone and Service Applications have taken their place. As a result, the way you design the Application tier may differ significantly. For example, if your client has lots of content to index, you can provision multiple servers with the Search Service Application and crawl components (indexing), something that wasn't possible in SharePoint 2007 (you could only have one Indexer). There are some great articles on SharePoint 2010 topologies on TechNet that have only recently been published. Lots of diagrams are also available here. For those interested in Search specifically, see this article.
- INDEX PARTITIONS & QUERY PROCESSOR - The way in which SharePoint allows users to query for content has also changed for the better. In short, query times are improved due to the way in which the query processor works. Index Partitions, which are subsets of the index placed on Query servers, also help with this. The net result is that when using tools such as the HP Sizer, you'll notice that fewer Web/Query servers are recommended compared with SharePoint 2007.
- INDEX PARTITIONS & NUMBER OF SERVERS - One useful tip when planning Index Partitions is to think about the number of items SharePoint will index and the impact on your infrastructure. This is best explained using an example. Let's say your client wants to index 50 million items with an average file size of 0.2 MB. Knowing that an Index Partition can contain up to 10 million items, rather than having 5 "active" partitions of 10 million items each (which leaves every partition at its limit), it would be better to have 6 active partitions of roughly 8.33 million items each. This allows for growth (i.e. indexing more content).
The next question is how many Query servers are needed to support these partitions. This ultimately depends on your availability requirements and budget. Assuming we want some level of availability, we need 6 "active" partitions and 6 "mirrored" partitions (12 in total). The simplest solution would be to have 6 Web/Query servers, each hosting one "active" partition and one mirrored partition (the mirror of a partition whose active copy sits on a different server). This means that if one Query server dies, another Query server is used to return results to users. However, one could reduce the number of servers (and possibly cost) by placing three partitions on each of four Query servers (still totalling 12 partitions). In this model you'd need to make sure your servers are beefy enough to handle the increased load. A general rule of thumb is to allow 2 CPU cores per partition, so in this example each server would need 6 cores plus a buffer for the OS etc. There are of course pros and cons to each approach (fewer servers is arguably more risky). The diagram below compares these options:
- STORAGE - If you're going to have mirrored index partitions, make sure you calculate the storage appropriately. Using the above example, 50 million items × 0.2 MB = 10 TB. However, as you're mirroring this, it's actually 20 TB split across the Query servers.
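The arithmetic in the pointers above can be captured in a few lines of Python. This is only a rough sketch of the back-of-the-envelope sums used in this article, not a substitute for the HP Sizer or the TechNet guidance. The function name `plan_search_topology` and its parameters are made up for illustration; the 10 million item limit, the 2 cores per partition rule of thumb and the "items × file size" storage estimate are taken from the text above.

```python
import math

MAX_ITEMS_PER_PARTITION = 10_000_000  # per-partition limit quoted in the article
CORES_PER_PARTITION = 2               # rule of thumb quoted in the article


def plan_search_topology(total_items, avg_item_mb, headroom_partitions=1,
                         mirrored=True, query_servers=None):
    """Rough sizing estimate (hypothetical helper, mirrors the article's sums)."""
    # Fewest partitions that keep each one at or under the item limit.
    min_partitions = math.ceil(total_items / MAX_ITEMS_PER_PARTITION)
    # Add spare partitions so each one starts below the limit and can grow.
    active = min_partitions + headroom_partitions
    # Each active partition gets one mirror when availability is required.
    total_partitions = active * 2 if mirrored else active
    # Default layout: one Query server per active partition.
    servers = query_servers if query_servers is not None else active
    partitions_per_server = math.ceil(total_partitions / servers)
    # Article's simplification: storage = items x average file size.
    storage_tb = total_items * avg_item_mb / 1_000_000  # 1 TB = 1,000,000 MB
    if mirrored:
        storage_tb *= 2
    return {
        "active_partitions": active,
        "items_per_partition": total_items / active,
        "total_partitions": total_partitions,
        "query_servers": servers,
        "partitions_per_server": partitions_per_server,
        "cores_per_server": partitions_per_server * CORES_PER_PARTITION,
        "storage_tb": storage_tb,
    }


if __name__ == "__main__":
    # The two layouts discussed above: six servers versus four servers.
    for servers in (6, 4):
        print(plan_search_topology(50_000_000, 0.2, query_servers=servers))
```

For the 50 million item example this gives 6 active partitions of roughly 8.33 million items, 12 partitions in total and 20 TB of mirrored storage. With six Query servers each one hosts 2 partitions (4 cores before any OS buffer); with four Query servers each one hosts 3 partitions (6 cores before any OS buffer).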
Search is, of course, just one key element to consider when sizing a SharePoint 2010 infrastructure. Other key elements include the number of users, the type of activity they will perform, the kind of availability you want at the SQL level, and so on. However, I hope the above is useful.
This article was authored by:
SharePoint Architecture Consultant
Microsoft Consulting Services UK