Creating Custom Dictionaries for special terms to be indexed ‘as-is’ in SQL Server 2008 Full-Text Indexes


Did You Know?:  That SQL Server 2008 provides the ability to create a Custom Dictionary of special terms to be indexed ‘as-is’ (i.e: ‘AT&T’) into the FTIndex?… 


Many times you find yourself in a situation where you would like to index a given word/term ‘as-is’ in your Full-Text Index. However, due the specific tokenization behavior from a given WB, it gets splitted in several individual terms. This might cause at query time recall issues.


For instance, your data contains the following term “ISBN-7”. Internally, most (if not all) WBs will tokenize this as ‘ISBN’ and ‘7’ separately. These two terms are going to be persisted in the FTIndex instead of the single and original “ISBN-7”. Note that because ‘7’ might be a default noise word for most languages (single digit numbers typically are considered as noise words), what happens is that the FTIndex will only index the term ‘ISBN’.

Later, at query time when you search for ‘ISBN-7’ using any of our available predicates (i.e: CONTAINS) or tvfs (i.e:CONTAINSTABLE), the same process will occur and the query will be internally translated to a query for ‘ISBN’ only. You will then receive all rows/documents containing ‘ISBN’, not only the one you cared about initially: ‘ISBN-7’. 


By disabling STOPLIST for a given FTIndex, you are going to solve the missing results part of the problem, but you are going to have larger FTIndexes leading to performance implications and still not solving the problem of false positives, as you will still find documents containing ‘ISBN’ and ‘7’ somewhere in the corpus of the document, but not necessarily together (next to each other) as you wanted originally.

This is a common problem when querying for email addresses for example, where the ‘@’ character serves as separator . In this case, you are going to find document containing the exact email address as you desired, but as well documents containing both parts of the email separated for other special character different than ‘@’ , leading to false positives again. There are other different variants of this behavior that can lead to false positives as well, etc.. While this behavior is expected, as you see, it can lead to undesired results.


In order to prevent this, SQL Server 2008 new WB family is introducing the ability to list a set of words/terms that should not be tokenized by a given WB, thus forcing them to be indexed as they are. These lists (called custom dictionaries) are simple text files and associated to each language supported by SQL Server Full-Text Search.


Here are the details in how to accomplish this:




A custom dictionary is a Unicode-encoded file that can be used to specify words that you want the word breaker of the same language to consider as complete words. Custom dictionaries are not provided, by default. To modify the word breaker behavior for more than one language you must create a separate custom dictionary for each language for which you want to modify the word breaker’s behavior. You cannot create a custom dictionary for the language-neutral word breaker.

-Example created by Venkat, one of our team testers:

Consider the term “34590-97-8”. When this is passed to the German wordbreaker (for instance), the output terms are many:

select [display_term] from sys.dm_fts_parser(‘34590-97-8’, 1031,0,0)








This output can be seen using:

select * from sys.dm_fts_parser(‘34590-97-8’, 1031,0,0)

If you need to keep these numbers together for business reasons (I.e: this is a chemical code, account, phone number,  etc) you then need to create a custom dictionary for German language in this case. Here are the steps:

1.      Log on to the machine (on which sql server is installed) as a member of the Administrators group.

2.      Start Notepad and type the words you want in your custom dictionary. Be sure to avoid invalid entries as described in the Reference link previously included. Remember that each word must be on a separate line and separated by a carriage return (CR) and line feed (LF).

3.      On the File menu, click Save As.

4.      In the Save as type list, select All Files.

5.      In the Encoding list, select Unicode.

6.      In the File name box, type the file name in the following format: CustomNNNN.lex, where NNNN is the language hex code of the language for which you are creating the custom dictionary (i.e: English= 1033). See Table 1, in the document referenced before in this post for a list of valid file names for supported languages and dialects.

7.      In the Save in list, navigate to the folder that contains the word breakers. By default, this is the sql Binn directory. Example C:\Program Files\Microsoft SQL Server\MSSQL10.MSSQLSERVER\MSSQL\Binn.

8.      Click Save.

9.      Restart fdhost by executing “exec sp_fulltext_service ‘restart_all_fdhosts’” on the sql server instance.

Now the custom dictionary will treat the term differently as a single word:

select [display_term] from sys.dm_fts_parser(‘34590-97-8’, 1031,0,0)



 Currently, these text files only support a list of words and not more complex expressions, as for instance regular ones, etc.. (i.e: ISBN-*).

Important Note: This will only work with newly shipped SQL Server 2008 Word Breakers and not with older word breakers unchanged in our 2008 release. These new wordbreakers are also shipped with the OS in Vista+ operating systems. These wordbreaker dlls have names as NaturalLanguage6, NLS*.dll etc.  As a notorious exception, the english wordbreaker shipped with SQL Server 2008 is identical to our 2005 version, thus it does not belong to this new category of Word Breakers capable to support a custom dictionary. In order to use a custom dictionary for English in SQL Server 2008, you will need to change the registry registration and indicate that the English WB existing in you OS (only if it is Vista+) is the one that should be used for your SQL intance from that moment on.  Please follow this article in how to change this registration for a given WB or iFilter:

The above article shows how to do it for Danish, Polish etc. To do this for the new English word breaker, you need the corresponding values for the new English Word Breaker and stemmer. The wordbreaker files to copy to your sql Binn directory are below. These should be in your windows\system32 folder on Vista+ operating systems.

1.      NlsData0009.dll

2.      NlsLexicons0009.dll

3.      NlsGrammars0009.dlll

Other settings needed:

String value



DWORD value



String value



string value





We hope this information is useful to you. We are going to include this information in our official Books Online (BOL) release in upcoming SQL Server 2008 SPs as well as in our next major release.




Comments (11)

  1. valerka says:

    Thanks. It’s really helps. The were a bit problem, that NaturalLanguage6 have to leave from SQL/binn directory, but other NlsData0009.dll and NlsLexicons0009.dll have been taken from sys32 folder.

    NlsGrammars0009.dll I was unable to find, but it works without it.

  2. Vijay Das says:


    I am facing the issue in SQL Server German. Where I want to store the email addresses as-is.

    I have tried modifying the Custom Dictionary file. It works with the example you described above but not for the emails.

    Could you pleas let me know the way to specify an email address in the Custom Dictionary file for SQL German.

    Please reply to me at

  3. blpugh says:

    Does anyone know if these instructions/values are valid for windows 7 ultimate?

    system info:

    Microsoft SQL Server Management Studio 10.0.2531.0

    Microsoft Analysis Services Client Tools 10.0.1600.22

    Microsoft Data Access Components (MDAC) 6.1.7600.16385

    Microsoft MSXML 3.0 5.0 6.0

    Microsoft Internet Explorer 8.0.7600.16385

    Microsoft .NET Framework 2.0.50727.4952

    Operating System 6.1.7600

    o/s windows 7 ultimate

    I did the following, copied:

    C:windowssystem32NlsData0009.dll to C:Program FilesMicrosoft SQL ServerMSSQL10.MSSQLSERVERMSSQLBinn

    C:windowssystem32NlsLexicons0009.dll to C:Program FilesMicrosoft SQL ServerMSSQL10.MSSQLSERVERMSSQLBinn

    updated registry as follows, add 2 new CLSID's:

    {9DAA54E8-CD95-4107-8E7F-BA3F24732D95} data value NlsData0009.dll

    {61A48126-EF74-4d4a-9DDA-43FD542CAD1E} data value  NlsLexicons0009.dll

    Updated for english (enu under language)

    changed StemmerClass data value to {61A48126-EF74-4d4a-9DDA-43FD542CAD1E}

    changed WBreakerClass data value to {9DAA54E8-CD95-4107-8E7F-BA3F24732D95}

    executed exec sp_fulltext_service 'restart_all_fdhosts'

    this did not work for me, actually full text search didn't work at all

    any help would be appreciated,


  4. Kanja Saha says:

    I am facing the issue in SQL Server German. Where I want to store the AT&T as-is.

    I have tried modifying the Custom Dictionary file. It works with the example you described above but not for the AT&T and anything containing &.

    It works for AT-T. AT%T but not for AT&T

    Please let me know if I am missing something



  5. MyQuestion says:

    I have the exact scenario with the language in Chinese (Hong Kong SAR, PRC) running, I try to create the custom dictionary, what is the hex code of the CustomNNNN.lex? What is the corresponding dlls of the NlsData0009.dll, NlsLexicons0009.dll, NlsGrammars0009.dlll and the registry entry?

  6. Aaron says:

    I tried the instructions and cannot get them to work. I get a "Word breaking timed out for the full-text query string." error every time I try to use full text after making the change.

    OS: Windows Server 2008 R2 Standard

    SQL: SQL Server 2008 SP1 (10.0.2531.0)

    I made the following changes:

    * Copied the NlsData0009.dll and NlsLexicons0009.dll files from "C:WindowsSystem32" to "C:Program FilesMicrosoft SQL ServerMSSQL10.MSSQLSERVERMSSQLBinn".

    * Added Key "{9DAA54E8-CD95-4107-8E7F-BA3F24732D95}" to "HKEY_LOCAL_MACHINESOFTWAREMicrosoftMicrosoft SQL ServerMSSQL10.MSSQLSERVERMSSearchCLSID" with value "NlsData0009.dll".

    * Added Key "{61A48126-EF74-4d4a-9DDA-43FD542CAD1E}" to "HKEY_LOCAL_MACHINESOFTWAREMicrosoftMicrosoft SQL ServerMSSQL10.MSSQLSERVERMSSearchCLSID" with value "NlsLexicons0009.dll".

    * Changed value for "HKEY_LOCAL_MACHINESOFTWAREMicrosoftMicrosoft SQL ServerMSSQL10.MSSQLSERVERMSSearchLanguageenuStemmerClass" to "{61A48126-EF74-4d4a-9DDA-43FD542CAD1E}".

    * Changed value for "HKEY_LOCAL_MACHINESOFTWAREMicrosoftMicrosoft SQL ServerMSSQL10.MSSQLSERVERMSSearchLanguageenuWBreakerClass" to "{9DAA54E8-CD95-4107-8E7F-BA3F24732D95}".

    After making these changes attempts to use the sys.dm_fts_parser function fail with the error listed above.

    I tried swapping the DLL names, using "NlsData0009.dll" for both DLL names and using "NlsLexicons0009.dll" for both DLL names and the behavior did not change.

    I reverted the "HKEY_LOCAL_MACHINESOFTWAREMicrosoftMicrosoft SQL ServerMSSQL10.MSSQLSERVERMSSearchLanguageenuWBreakerClass" and "HKEY_LOCAL_MACHINESOFTWAREMicrosoftMicrosoft SQL ServerMSSQL10.MSSQLSERVERMSSearchLanguageenuStemmerClass" keys back to their original values and full-text search started working again.

    Did I miss any steps or execute any of them incorrectly?

  7. Mark says:

    Has anyone had any success with these instructions? I found them about 18 months ago and after much fiddling managed to get some simple phrases picked up by the search, from my custom dictionary.

    However I'm now trying to amend my already working dictionary to include some new phrases and cannot get it to work. The command exec sp_fulltext_service 'restart_all_fdhosts simply doesn't get the search to pick up the custom dictionary again.

    The old entries still work, the new ones are not picked up. I feel like there's something else I need to restart somewhere. We've stopped and started the SQL instance, I've even in desperation deleted my index completely and rebuilt it from scratch, but no effect.

    I have found several thread on the internet of people having similar problems but none have a resolution.

    Does anyone know the magic I'm missing?

  8. Phil says:

    Mark, did you rebuild your full-text index?

  9. Phil says:

    On SS 2012 the filename for an English dictionary looks like it needs to be Custom0009.lex. On SS 2008 it needs to be Custom1033.lex.

    Also the # symbol seems to work as a wildcard for numbers on 2008, but not on 2012.

    It would be REALLY nice if this was all documented somewhere.

  10. Phil says:

    Please disregard my comments above about the use of the # character and the filename of the custom dictionary on SQL Server 2008. Further investigation has determined that those findings are not correct.