An interesting thread got going in my inbox the other day, and I thought it might help some of you. It started when a reader sent a question to Ted Neward, author of the “Working Programmer” column, about his series of articles on MongoDB.
Imran Saeed asks Ted:
“I liked your article on MongoDB; very good coverage of the topic. Could you cover some ‘free text’ search capabilities of the database in a future article on this topic? My question is probably more basic than free text. A relational database is good for relating data, but it would be nice to get an idea of how to do something similar in the NoSQL world, with a practical application. As an example, consider a blog site that organizes blogs/articles by user name: how would I go about searching for all the articles written by (let’s say) ‘Neward’ in this system?”
Ted’s reply included this suggestion:
“In fact, if this is something you would want to support on a regular basis, then you might even have *two* article body fields, one to hold the ‘readable’ text, and the other to hold the text minus all the filler words that aren’t really searchable; for example, this last paragraph could probably be reduced to just a half-dozen words by cutting out all the conjunctions and verbs, which would speed up your keyword queries by quite a lot.”
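The two-field idea above can be sketched in Python. This is a minimal illustration, not how MongoDB itself processes text: it assumes a tiny, hand-picked stop-word list and hypothetical field names ("title", "author", "body", "keywords"). The readable text is stored as-is, and a second array field holds only the lower-cased, de-duplicated words that survive the stop-word filter.

```python
import re

# A deliberately tiny, hypothetical stop-word list; a real application
# would use a much larger, language-specific list.
STOP_WORDS = frozenset({
    "a", "an", "and", "are", "be", "by", "for", "if", "in", "is",
    "it", "of", "on", "or", "that", "the", "this", "to", "with",
})


def searchable_terms(text, stop_words=STOP_WORDS):
    """Lower-case the text, split it into words, and drop stop words.

    Duplicates are removed while preserving first-seen order, so the
    result is suitable for storing as an array field in a document.
    """
    seen = set()
    terms = []
    for word in re.findall(r"\w+", text.lower()):
        if word in stop_words or word in seen:
            continue
        seen.add(word)
        terms.append(word)
    return terms


def make_article(title, author, body):
    """Build an article document with a readable field and a keyword field.

    The field names are illustrative only; nothing in MongoDB requires them.
    """
    return {
        "title": title,
        "author": author,
        "body": body,                        # the text readers see
        "keywords": searchable_terms(body),  # the terms queries match against
    }


article = make_article(
    "Going NoSQL", "Neward",
    "MongoDB is a document database for the working programmer",
)
print(article["keywords"])
# prints ['mongodb', 'document', 'database', 'working', 'programmer']
```

Computing the keyword array once, when the article is saved, keeps the work out of the query path; the trade-off is that the two fields must be kept in sync whenever the body is edited.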
Ted then brings in an expert in the technology, who weighs in:
“For full-text indexing, most people use an integration with Sphinx or Solr. Those give you much better text search capabilities than anything you’ll get with the query language. Having said that, it is possible to get a lightweight text search by adding a field to your document that contains an array of keywords/searchable terms. Then you would just query on that array. I believe that would be done with the $in operator. You don’t get some of the benefits that a proper full-text indexing engine will give you, such as stemming (e.g., knowing that ‘know’ is the stem of ‘knowing’), but it is lightweight and easy to do. The MongoDB guys have an eventual plan to add full-text indexing to the database, but it isn’t a pressing feature just yet.”
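Here is a rough Python sketch of the keyword-array query described above. The filter dictionaries use MongoDB's real $in and $all operators: against an array field, $in matches a document if the array contains at least one of the listed terms, while $all requires every term. The field name "keywords", the sample articles, and the small matches() helper (an in-memory stand-in so the example runs without a database server) are illustrative assumptions. With the PyMongo driver, the same filter dictionary would be passed to collection.find().

```python
def keyword_query(terms, match_all=False):
    """Build a MongoDB filter document over a hypothetical "keywords" array.

    $in matches documents whose array contains at least one of the terms;
    $all matches only documents whose array contains every term.
    Terms are lower-cased to match keywords stored in lower case.
    """
    normalized = [t.lower() for t in terms]
    operator = "$all" if match_all else "$in"
    return {"keywords": {operator: normalized}}


def matches(doc, query):
    """In-memory stand-in for how MongoDB evaluates the filter above.

    Handles only the {"keywords": {"$in" | "$all": [...]}} shape built by
    keyword_query; it is for illustration, not a general query engine.
    """
    field, condition = next(iter(query.items()))
    operator, wanted = next(iter(condition.items()))
    values = set(doc.get(field, []))
    if operator == "$in":
        return any(term in values for term in wanted)
    # An empty $all list matches no documents in MongoDB.
    return bool(wanted) and all(term in values for term in wanted)


# Hypothetical sample documents.
articles = [
    {"title": "Going NoSQL", "author": "Neward", "keywords": ["mongodb", "nosql"]},
    {"title": "Indexing", "author": "Neward", "keywords": ["mongodb", "indexes"]},
    {"title": "Tuning SQL", "author": "Someone", "keywords": ["sql", "indexes"]},
]

# With PyMongo, the same dict would be passed to collection.find(query).
query = keyword_query(["MongoDB"])
print([a["title"] for a in articles if matches(a, query)])
# prints ['Going NoSQL', 'Indexing']
```

Neither function does any stemming: a query for "knowing" will not match an article whose keywords contain only "know", which is exactly the limitation the expert points out compared with a dedicated engine such as Sphinx or Solr.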
— Keith Ward, editor in chief, MSDN Magazine.