Getting the LRSAMPLE custom word-breaker to work on 64-bit SQL Server 2008

In this
post I will cover how to get a custom word-breaker working on SQL Server 2008
64-bit instances. Many of you out-there might already be familiar with an old
sample that shipped with Windows Platform SDK called LRSAMPLE.
I am going to use this as a sample reference and must warn you this post is not
a how-to on going about coding a custom word-breaker.

Let’s start off with the assumption that you have a requirement for a custom
word-breaker, since the kind of word-breaking that you require is not possible
with the default Word-breaker DLL that ships with SQL Server 2008.

To give you
an idea of what I mean by “custom requirements”, have a look at this scenario.

Sample Rows
Brick 4 Sided, Other (See Remarks)
Stucco Unspecified, Other (See Remarks)
Aluminum/Vinyl, Brick/Frame, Concrete Siding
Brick 4 Sided, Brick/Frame, Concrete Siding

Let’s say you want to differentiate between “Brick” and
Brick/Frame” in such a way that  a
search for the keyword “Brick” return results for only entries which contain Brick alone and not ones that contain Brick/Frame. This of course is not possible
using LANGWRBK.dll (SQL 2005) or NaturalLanguage6.dll (SQL 2008).

Okay, now you followed MSDN and compiled LRSAMPLE, got the
DLL, registered it following the instructions in https://msdn.microsoft.com/en-us/library/ms345188.aspx
, but when you actually tried to use the new word-breaker in a CONTAINS query,
you are faced with this error

Msg 30053, Level 16, State 102, Line 1
Word breaking timed out for the full-text query
string. This can happen if the wordbreaker took a long time to process the
full-text query string, or if a large number of queries are running on the
server. Try running the query again under a lighter load.

The LRSAMPLE is a C++ COM based DLL (I did warn you it was
an old sampleJ)
and as part of its registration it puts in the CLSID under HKLM\System\currentcontrolset\control\contentindex\languageEven after following the instructions to register it to the
letter, it does not work! Even a sample query to test the word-breaker using sys.dm_fts_parser
gives the same error.

 SELECT * FROM sys.dm_fts_parser (' "Hello" ', 19465, 0, 0)
-- 19465 is the language LCID is registered it as

Digging a little further in the SQLFT logs to see the reason
for the word-breaking failure, I saw this error.

2010-10-20 14:19:32.59 spid23s Error '0x8007007f' occurred during full-text index population
for table or indexed view '[GAMLS_NEW].[dbo].[AllClassesCommonData]' (table or
indexed view ID '1182067397', database ID '10'), full-text key value '578004'.
Attempt will be made to reindex it.

0x8007007f  ---> The
specified procedure could not be found.

Okay, this clears points to the DLL having some problems. I
tried to register this DLL using regsvr32 and got these errors.
regsvr32
"C:\Program Files\Microsoft SQL Server\MSSQL10.MSSQLSERVER\MSSQL\Binn\lrsample.dll"

DLLEntryPoint could not be found.
Specified procedure could not be found

One thing I should point out is that the sample is based on
VC6 and by default builds a DEBUG version of the DLL. So if your SQL Server is
on the same machine as your Visual Studio, it might actually work. The moment
you move this to a machine that does not have VS you will get these errors.

Here are the changes required to be done for importing this
sample to a VS project and getting it to work on x64 architecture :-
(Note: This was done on VS 2010 on Windows 7)

 

1.       Create
an empty VS2010 project and pull in all the files in LRSAMPLE folder.

2.       Change
the configuration to Release and platform to x64

 

3.
The
following code change has to be made in lrsample.cxx
to specify the path for DLL registration. (The registration code is hardcoded).

 

 SLangRegistry const English_Sample_LangRes =
{
    L"English_Sample", MAKELANGID( LANG_ENGLISH, SUBLANG_ENGLISH_SAMPLE ),
    { L"{d225281a-7ca9-4a46-ae7d-c63a9d4815d4}",
      L"English_Sample Word Breaker",
      //L"lrsample.dll",
     L"C:\\Program Files\\Microsoft SQL Server\\MSSQL10.SQLSERVER2008\\MSSQL\\Binn\\lrsample.dll",      -->  Change required here to specify location where DLL will be place
      L"both" },
    { L"{0a275611-aa4d-4b39-8290-4baf77703f55}",
      L"English_Sample Stemmer",
      //L"lrsample.dll",
     L"C:\\Program Files\\Microsoft SQL Server\\MSSQL10.SQLSERVER2008\\MSSQL\\Binn\\lrsample.dll",      -->  Change required here to specify location where DLL will be place
      L"both" }
};

You need to specify the path of your
SQL Server Instance Binn directory.

4.       Make
the following changes in the VS Project Properties.

   a.    General:  Set the configuration type to Dynamic Library (.dll)

   b.    Linker\Input:
Set module definition file to .\lrsample.def

 

c.       Linker\System:
Set the subsystem to WINDOWS

5.       Build
the Solution and copy the DLL from the Release folder to your SQL Binn
directory.

6.       Register
the DLL from here using regsvr32 from command prompt.

Regsvr32 “C:\Program
Files\Microsoft SQL Server\MSSQL10.SQLSERVER2008\MSSQL\Binn\lrsample.dll”

After
we did the above steps, the regular procedure of using custom word-breaker from https://msdn.microsoft.com/en-us/library/ms345188.aspx
was enough to get this working on SQL Server
2008 X64.

 EXEC sp_fulltext_service @action='load_os_resources', @value=1;
exec sp_fulltext_service 'verify_signature' , 0;
exec sp_fulltext_service 'update_languages';
exec sp_fulltext_service 'restart_all_fdhosts'

 

To confirm if the word-breaker was loaded you can check this DMV select * from sys.fulltext_languages

To test the word-breaker we used this query and viola, no more errors!

 

SELECT * FROM sys.dm_fts_parser (' "Breaking Word" ', 19465, 0, 0)

I must point out one important thing here if you are going
to use this DLL on other machines. You will need to have the Visual Studio Redistributable
package installed on the target machine where you plan to deploy this. You will
need the corresponding redist package for the VS version you used to build the
DLL.

If you want to use this on multiple instances on the same
machine you still need to have it copied over to each instance Binn folder and
do the registration steps as per above article to create the registry keys.

 

Some References

Word Breaker and Stemmer Sample
https://msdn.microsoft.com/en-us/library/ms693176(v=VS.85).aspx

How to: Load Licensed
Third-Party Word Breakers

https://msdn.microsoft.com/en-us/library/ms345188.aspx

IStemmer'ed the tide
(or, Language-specific processing #2)

https://blogs.msdn.com/b/michkap/archive/2005/03/13/394822.aspx

You toucha my
letters, IWordBreaker you face (or, Language-specific processing, #3)

https://blogs.msdn.com/b/michkap/archive/0001/01/01/395199.aspx