If you rely on SQL Server Full-Text Search, then this is important information, even if you can’t recite the definition of word breaking and stemming.
Word breakers identify word boundaries and break words into their components. Stemmers identify alternate forms of a word, such as “run,” “ran,” and “running.” Word breakers and stemmers are applied by Full-Text Search both to the text that you index at indexing time, and to the terms in your full-text queries at query time.
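One way to see what the word breaker and stemmer do with a given string is the dynamic management function sys.dm_fts_parser. The queries below are a minimal sketch, assuming the English language (LCID 1033), the system stoplist (stoplist ID 0), and accent-insensitive parsing. The sample strings are made up for illustration. Running the function requires membership in the sysadmin fixed server role.

```sql
-- Word breaking: show the tokens produced for a phrase.
-- Arguments: query string, LCID (1033 = English), stoplist ID (0 = system stoplist),
-- accent sensitivity (0 = insensitive).
SELECT display_term, special_term, occurrence
FROM sys.dm_fts_parser(N'"full-text search in SQL Server"', 1033, 0, 0);

-- Stemming: show the inflectional forms generated for a single term.
SELECT display_term, expansion_type, source_term
FROM sys.dm_fts_parser(N'FORMSOF(INFLECTIONAL, "run")', 1033, 0, 0);
```

Running the same queries before and after an upgrade is a simple way to compare the output of the old and new components for terms that matter to your application.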
1. New word breakers are installed with SQL Server 2012
All the word breakers and stemmers used by Full-Text Search and Semantic Search (with the exception of Korean) are updated in SQL Server 2012 “Denali.” Please note the following changes in particular:
- The third-party word breakers for English that were included with previous releases of SQL Server have been replaced with Microsoft components.
- The third-party word breakers for Danish, Polish, and Turkish that were included with previous releases of SQL Server have been replaced with Microsoft components. The new components are enabled by default.
- There are new word breakers for Czech and Greek. Previous releases of SQL Server Full-Text Search did not include support for these two languages.
- The word breaker and stemmer for the Korean language are not upgraded in this release.
For consistency between the contents of indexes and the results of queries, we recommend that you repopulate existing full-text indexes after upgrading.
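The repopulation recommended above can be sketched in Transact-SQL as follows. The table name dbo.Documents is a hypothetical placeholder; substitute the tables returned by the first query. Treat this as an outline under those assumptions rather than a complete upgrade script.

```sql
-- List the full-text indexes in the current database and the catalogs they belong to.
SELECT OBJECT_SCHEMA_NAME(i.object_id) AS schema_name,
       OBJECT_NAME(i.object_id)        AS table_name,
       c.name                          AS catalog_name
FROM sys.fulltext_indexes AS i
JOIN sys.fulltext_catalogs AS c
    ON c.fulltext_catalog_id = i.fulltext_catalog_id;

-- Start a full population of one index so that its contents are
-- re-parsed by the new word breakers.
-- dbo.Documents is a hypothetical table name.
ALTER FULLTEXT INDEX ON dbo.Documents START FULL POPULATION;
```

A full population can take a long time on large tables, so you may want to schedule it outside peak hours.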
The information above is included in Books Online in the topic Programmability Enhancements (Database Engine).
2. Behavior changes in the new word breakers
The new components might return different results than the older components when you populate and query full-text indexes. In some cases, the word breaking returns similar results, but in other cases the new components return more results or fewer results than the older components.
For examples of some of the differences that can be expected in English results, consult the tables in the Books Online topic Behavior Changes to Full-Text Search.
3. Reverting to the previous behavior
If you have to retain the previous behavior of the word breakers and stemmers, follow the steps in the topic Revert the Word Breakers Used by Search to the Previous Version.
Reverting may require you to replace files and to change registry entries manually. These steps are described in detail in the linked topic.
We hope this information helps you understand the behavior of the new word breakers in SQL Server 2012.