Improvements to Chinese language search in SharePoint

There has been a longstanding issue with Chinese language search in SharePoint where users got strange results and the summary text highlighting was wrong. Word segmentation is hard in Chinese (for example, there is no word separator comparable to the space character in English), and neologisms introduce further ambiguity. As…
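To illustrate why segmentation without word separators is ambiguous, here is a minimal sketch of dictionary-based maximum matching. This is a textbook technique chosen for illustration, not a description of how SharePoint segments text; the function names and the tiny `VOCAB` set are assumptions made up for this example.

```python
def forward_max_match(text, vocab, max_len=3):
    """Greedy left-to-right segmentation: take the longest dictionary word at each position."""
    tokens = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # Fall back to a single character when no dictionary word matches.
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens


def backward_max_match(text, vocab, max_len=3):
    """Greedy right-to-left segmentation: take the longest dictionary word ending at each position."""
    tokens = []
    j = len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                j -= size
                break
    tokens.reverse()
    return tokens


# Hypothetical toy dictionary: "research", "graduate student", "life", "fate", "origin".
VOCAB = {"研究", "研究生", "生命", "命", "起源"}

if __name__ == "__main__":
    sentence = "研究生命起源"  # "research the origin of life"
    print(forward_max_match(sentence, VOCAB))
    print(backward_max_match(sentence, VOCAB))
```

The two scans split the same six characters differently: scanning forward picks up 研究生 ("graduate student") first, while scanning backward yields 研究 / 生命 / 起源. A search index built on one segmentation and a query parsed with the other would disagree about which words the text contains.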

On-Premises SharePoint careers

I have taken a job as the architect for the new on-premises SharePoint team in Suzhou, China. We are in the process of hiring a large number of engineers. It is a great opportunity for developers and leaders with great distributed systems or database skills. As well as the impact that comes from working on…

Bulk loading data with IDataReader and SqlBulkCopy

Introduction: Often, large amounts of data need to be loaded into a database quickly. A common approach is to fill a DataTable and use the SqlBulkCopy class to load the data. The problem with this approach is that the data set must be materialized in memory. This is inefficient because the data must be copied…
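The difference between materializing a data set and streaming it can be sketched outside .NET. The example below is an analogy only: it uses Python and an in-memory SQLite database as stand-ins for IDataReader and SqlBulkCopy, and the `people` table, `generate_rows`, and `bulk_load` are hypothetical names invented for the illustration. A generator plays the role of the reader, so only one batch of rows is held in memory at a time.

```python
import sqlite3
from itertools import islice


def make_db():
    """Create an in-memory database with a hypothetical target table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
    return conn


def generate_rows(count):
    """Yield rows one at a time instead of building the whole list up front."""
    for i in range(count):
        yield (i, f"name-{i}")


def bulk_load(conn, rows, batch_size=1000):
    """Insert rows from any iterable in fixed-size batches; return the number loaded."""
    rows = iter(rows)
    total = 0
    while True:
        batch = list(islice(rows, batch_size))  # at most batch_size rows in memory
        if not batch:
            break
        with conn:  # one transaction per batch
            conn.executemany("INSERT INTO people (id, name) VALUES (?, ?)", batch)
        total += len(batch)
    return total


if __name__ == "__main__":
    conn = make_db()
    print(bulk_load(conn, generate_rows(2500)))
```

In .NET the corresponding idea is to pass an IDataReader to SqlBulkCopy.WriteToServer so that rows are pulled from the source as they are written.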

Finding and stopping rogue SQL traces

Introduction: A common cause of mysterious performance issues is traces that have been left running. Naïve use of traces can leave many traces running, and those traces slow down the application by consuming critical resources. This happens because the SQL Server Performance Analyzer process is killed or trace sessions are forgotten. In analyzing an issue, many traces…


SQL Server performance investigation

Introduction: I frequently help teams inside and outside of Microsoft investigate SQL Server database performance issues. I also monitor some large internal SharePoint databases to understand their performance characteristics and to make sure we are improving our overall performance. Here are some of the queries I use to investigate performance issues and monitor performance…


Implementing uniqueness constraints on large columns

SQL Server uniqueness constraints create an underlying unique index. SQL Server index keys may not be more than 900 bytes. Below I discuss how to implement uniqueness constraints with hash indexes when the key size can exceed 900 bytes and give the results of some tests on the relative performance of the hash index approach…
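The general idea of enforcing uniqueness through a hash of a large value can be sketched as follows. This is an illustration under stated assumptions, not the implementation from the post: it uses Python with SQLite as a stand-in (SQLite has no 900-byte key limit), SHA-256 as the hash, and a hypothetical `docs` table and `insert_unique` helper. The index is built on the fixed-size hash, and the full value is compared only among rows with a matching hash.

```python
import hashlib
import sqlite3


def connect():
    """Create an in-memory database with a hypothetical table keyed by a hash column."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE docs ("
        " id INTEGER PRIMARY KEY,"
        " body TEXT NOT NULL,"
        " body_hash BLOB NOT NULL)"
    )
    # Index the 32-byte hash rather than the arbitrarily large body.
    conn.execute("CREATE INDEX ix_docs_body_hash ON docs (body_hash)")
    return conn


def insert_unique(conn, body):
    """Insert body unless an identical body already exists; return True if inserted."""
    digest = hashlib.sha256(body.encode("utf-8")).digest()
    with conn:
        candidates = conn.execute(
            "SELECT body FROM docs WHERE body_hash = ?", (digest,)
        ).fetchall()
        # Compare full values so a hash collision cannot reject a distinct body.
        if any(existing == body for (existing,) in candidates):
            return False
        conn.execute(
            "INSERT INTO docs (body, body_hash) VALUES (?, ?)", (body, digest)
        )
    return True


if __name__ == "__main__":
    conn = connect()
    big = "a" * 2000  # larger than a 900-byte key would allow
    print(insert_unique(conn, big), insert_unique(conn, big))
```

This check-then-insert pattern is only safe here because a single connection is used. On a multi-user server the check and the insert would need to be made atomic, for example inside a trigger or a suitably isolated transaction.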

SQL Server Modeling Services announcement

The code name “Oslo” repository now has the official name “SQL Server Modeling Services”. SQL Server Modeling Services will be a SQL Server workload like SQL Server Reporting Services. Details will be announced at the PDC where the most relevant session is You can also read about it online at In Microsoft, like…

Paging SQL Server result sets

Paging through result sets is an approach for reducing the network and client resources used to display large result sets. Essentially, the approach is to load only a page (e.g. 100 rows) of data at a time. It is likely that a user will only want to see a page of data. If they want…
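As a rough sketch of loading one page at a time, the example below uses Python with an in-memory SQLite database and `LIMIT ... OFFSET`. These are stand-ins chosen so the example is runnable, not the technique from the post; the `items` table and the helper names are hypothetical. SQL Server 2012 and later offer the comparable `OFFSET ... FETCH NEXT` clause on `ORDER BY`.

```python
import sqlite3


def make_demo_db(row_count=250):
    """Create an in-memory database with a hypothetical items table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
    conn.executemany(
        "INSERT INTO items (id, name) VALUES (?, ?)",
        [(i, f"item-{i}") for i in range(1, row_count + 1)],
    )
    conn.commit()
    return conn


def fetch_page(conn, page_number, page_size=100):
    """Return one page of rows; page_number is zero-based."""
    offset = page_number * page_size
    cur = conn.execute(
        # A stable ORDER BY is required, otherwise pages may overlap or skip rows.
        "SELECT id, name FROM items ORDER BY id LIMIT ? OFFSET ?",
        (page_size, offset),
    )
    return cur.fetchall()


if __name__ == "__main__":
    conn = make_demo_db()
    first_page = fetch_page(conn, 0)
    print(len(first_page), first_page[0])
```

Only the requested page crosses the connection, so a client that shows the first 100 rows never pays for the rest of the result set.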

"Oslo" repository lifecycle/versioning whitepaper

Repositories face issues with versioning schema and data, as well as with integrating into an organization’s lifecycle processes. Here is a whitepaper I wrote on how the “Oslo” team thinks about handling various lifecycle/versioning issues. The whitepaper deals with the following issues: Application lifecycle management (ALM): Facilitating team development of software in a continuous cycle…


How to make a copy of the repository

Here is a nice article on how to make a copy of the “Oslo” repository so you can later restore it to its previous state. This is great for experimenting with the repository and for edit-debug cycles with domains. The steps are essentially what we do internally at Microsoft.