In the news, The TechNet site has some new updates. In this week’s feature, we’ll continue the introduction to SQL Server Features and talk about Full Text Indexing. The web link is on Storage Performance and the tip this week deals with a place to find free tutorials on SQL Server.
In the News:
The TechNet and Microsoft Developer Network sites have had a complete overhaul, and both now have pointers to the SQL Server “Virtual Labs” where you can try out various tutorials and features for SQL Server 2005 without installing anything on your system. You can find the MSDN version at http://tinyurl.com/6ky6ov.
In a previous Podcast I started an overview of the major features you can find in SQL Server 2005. I’ll cover each feature in more depth in future Podcasts, but we need a place to start the discussion for those features, so we’ll cover the basics first. This week we’ll take a look at Full Text Indexing and SQL Server.
When you search text data in a database, you’re probably used to using exact matches, or perhaps using a wildcard (like the percentage sign) with a LIKE predicate on the SELECT statement. But these searches use only the arrangement of characters in the string or text field you’re searching. SQL Server includes a feature called Full-Text Indexing that can search for word patterns, specific phrases, thesaurus searches that find similar words, linguistic forms that can return “singing” when you search for “sing” and even accent-insensitive searches that ignore accent marks.
But SQL Server doesn’t turn on the Full-Text Search feature in a database automatically. You have to set up not only the fact that the database is enabled for feature, but also what fields need to be included, and how often the indexes that provide the search are updated. I’ll describe that process briefly in a moment. I’ll provide a link at the end of this Podcast where you can learn more specific instructions for creating and maintaining these indexes.
SQL Server doesn’t manage the Full-Text Indexes alone. The components for Full-Text Indexing include the following:
The first component is SQL Server. It contains the database tables that you want to index. It also includes the “Key map”, which contains the mapping between the data values in the full-text key columns and internal documents. The SQL Server query processor contains the Microsoft Full-Text Gatherer, which is responsible for scheduling and driving the population of full-text indexes, and also for monitoring full-text catalogs. SQL Server also has some internal tables that it uses to manage the Full-Text indexing process.
The second component is the Microsoft Full-Text Engine for SQL Server (MSFTESQL) Process, which populates and manages full-text catalogs, controls the Filter Daemon (more on that in a moment), it builds the inverted index structure, houses the thesaurus files that contain synonyms of search terms. It also houses the “Noise Word Files”, which contain a list of frequently occurring words that are not useful for the search, such as “the”, “is”, “an” and so on. This component also controls the Word breakers and stemmers, which perform linguistic analysis on all full-text indexed data. A word breaker is the component that determines where word boundaries exist in a stream of text in the row being full-text indexed. A stemmer extracts the root form of a given word. For example, "running", "ran", and "runner" are various forms of the word "run".
The third component is the Microsoft Full-Text Engine Filter Daemon (MSFTEFD) Process. It is started by the MSFTESQL service during the full-text indexing process. The MSFTEFD hosts the components that are responsible for accessing and filtering data from tables, word breaking, and stemming.
When the Full-Text Indexing system accesses data, it uses a COM feature called an iFilter. There are iFilters for various data types. The most common iFilter use is with the CHAR, NCHAR, VARCHAR, NVARCHAR, TEXT, and NTEXT data types the indexing engine applies the “text” iFilter. For the XML data type the indexing engine applies the XML iFilter.
But you can also store and index binary data, like Word Documents and so on. For the IMAGE and VARBINARY data type, the indexing engine applies the iFilter that corresponds to the document extension, such as doc for Word documents and so on.
I’ve explained the basic architecture of the Full-Text Indexing system, so let’s take a quick look at the process for enabling and using it.
First, you enable the feature on a specific database. Next, ensure that the table you want to index has a unique, not null column (in other words, a primary key). Create a full text catalog in which to store full text indexes for that table, and then create a full text index on the column with one of the data types I mentioned in the iFilters section.
To access the data, you can use the CONTAINS predicate in the WHERE clause of a SELECT statement, the FREETEXT predicate in the WHERE clause of a SELECT statement, the CONTAINSTABLE predicate in the FROM clause of a SELECT statement, and the FREETEXTTABLE predicate in the FROM clause of a SELECT statement.
That’s quite a bit to remember – to read more about the process, check out this link: http://tinyurl.com/5ujft2
The web link for this week is from Joe Chang and it deals with Storage and Performance in SQL Server. Find it at http://tinyurl.com/5tt3uo.
Tip of the Week:
Did you know that SQL Server Books Online is filled with free tutorials? You can follow step-by-step instructions for everything from the Database Engine to Analysis Services. You can find those online at http://tinyurl.com/5p63kt.