Aligned partitioned parallel index build
In a parallel build we scan and sort partitions in parallel, and the number of sort tables that exist at the same time depends on the actual number of concurrent workers. Workers pick partitions one at a time: when a worker finishes a partition, it takes the next partition that has not yet been taken by another worker. Each worker builds 0 to N partitions (a single partition is never shared among multiple workers). Why can it be 0? If DOP is greater than the number of partitions, there are not enough partitions to give one to every worker. Which partition will a given worker build? That is non-deterministic and can differ from one execution to the next: first come, first served.
Because a partition is never shared among several workers, the biggest partition becomes a bottleneck. It can happen that all the other workers have finished their partitions while one worker is still sorting the biggest partition alone. This also means that the resources used by the query (such as memory and threads) are not available to other queries until that last worker finishes.
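The scheduling behavior described above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the engine's actual scheduler: the function name `simulate_partition_build`, the abstract "cost" per partition, and the deterministic tie-break by worker id are all hypothetical (the real assignment is non-deterministic). It shows how one large partition determines the total build time and how workers can end up with zero partitions.

```python
import heapq

def simulate_partition_build(partition_costs, dop):
    """Model first-come-first-served partition pickup (hypothetical sketch).

    partition_costs: abstract sort cost of each partition, in pickup order.
    dop: number of workers. A partition is never split across workers.
    Returns (partitions taken by each worker, finish time of each worker).
    """
    # Min-heap of (time the worker becomes free, worker id).
    workers = [(0.0, w) for w in range(dop)]
    heapq.heapify(workers)
    assigned = {w: [] for w in range(dop)}

    for pid, cost in enumerate(partition_costs):
        # The worker that becomes free first takes the next untaken partition.
        free_at, w = heapq.heappop(workers)
        assigned[w].append(pid)
        heapq.heappush(workers, (free_at + cost, w))

    finish = {w: t for t, w in workers}
    return assigned, finish

# One big partition (cost 10) and three small ones (cost 1 each), DOP = 3.
assigned, finish = simulate_partition_build([10, 1, 1, 1], dop=3)
print(assigned)              # which partitions each worker built
print(max(finish.values()))  # total build time is set by the biggest partition

# DOP larger than the number of partitions: some workers get nothing.
idle_assigned, _ = simulate_partition_build([10, 1, 1, 1], dop=6)
print([w for w, parts in idle_assigned.items() if not parts])
```

In the first run the worker holding the cost-10 partition is still busy long after the others have finished, which is the bottleneck described above.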
There is no final stitch among workers for a partitioned index. Each partition is represented as a separate B-tree in the storage engine.
How does this affect disk space requirements?
- When sorting in the user's database (the default setting), the requirements are the same as for a serial build. Because we sort in the filegroup of each corresponding partition, we need 2.2 * (size of partition) of free space in each filegroup.
- With SORT_IN_TEMPDB = ON, we do not get the same advantage as in a serial build, because several sort tables may exist at the same time. As long as we do not know the actual distribution of data between the partitions, we still require the same 2.2 * (size of the whole index) of free space in tempdb.
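The two cases above can be put into a rough estimate. The sketch below is illustrative only: the function `free_space_needed`, the filegroup names, and the sizes are made up, and it assumes each partition lives in its own filegroup. The 2.2 factor is the one quoted in the text.

```python
SORT_SPACE_FACTOR = 2.2  # multiplier quoted in the text above

def free_space_needed(partition_size_by_filegroup, sort_in_tempdb):
    """Estimate free space (same unit as the input sizes) for a parallel build.

    partition_size_by_filegroup: hypothetical mapping of filegroup name to the
    size of the single partition stored there (assumption: one per filegroup).
    """
    if sort_in_tempdb:
        # Several sort tables may coexist and the data distribution is unknown,
        # so the estimate covers the whole index in tempdb.
        whole_index = sum(partition_size_by_filegroup.values())
        return {"tempdb": SORT_SPACE_FACTOR * whole_index}
    # Sorting in the user database: each filegroup needs room for its own partition.
    return {fg: SORT_SPACE_FACTOR * size
            for fg, size in partition_size_by_filegroup.items()}

# Made-up sizes in MB for three partitions.
sizes = {"FG1": 100, "FG2": 400, "FG3": 50}
print(free_space_needed(sizes, sort_in_tempdb=False))
print(free_space_needed(sizes, sort_in_tempdb=True))
```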
Several sort tables exist at the same time (how many depends on the DOP and the number of partitions), and each sort table needs at least 40 pages before the index build operation can start. So the minimum required memory is DOP * 40 pages.
Total memory = 40 * DOP + additional memory.
Note that the additional memory is the same for serial and parallel plans, because the total number of rows to sort remains the same in both plans.
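As a quick worked example of the formula above, the sketch below computes the minimum and total memory in pages for a few DOP values. The function names and the additional-memory figure are hypothetical. The conversion to kilobytes assumes the standard 8 KB SQL Server page size.

```python
MIN_PAGES_PER_SORT_TABLE = 40  # minimum pages per sort table, from the text
PAGE_SIZE_KB = 8               # SQL Server data page size

def min_build_memory_pages(dop):
    """Minimum memory needed to start the build: 40 pages per worker."""
    return MIN_PAGES_PER_SORT_TABLE * dop

def total_build_memory_pages(dop, additional_pages):
    """Total memory = 40 * DOP + additional memory (all in pages)."""
    return min_build_memory_pages(dop) + additional_pages

# Made-up value; it does not depend on DOP because the row count is the same.
additional_pages = 1000

for dop in (1, 4, 8):
    total = total_build_memory_pages(dop, additional_pages)
    print(dop, min_build_memory_pages(dop), total, total * PAGE_SIZE_KB)
```

Only the 40 * DOP term grows with parallelism; the additional memory stays constant across the serial and parallel rows.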