Menu

Search

Common keywords & SQL Server full-text search

We spent some time recently helping a InstantKB customer. The customer was seeing unexpected search results within InstantKB.NET when using SQL Server full text search & a Boolean "ALL / AND" keyword search..

Unexpected Search Results

The client had an article titled "Bill me later" however when searching for the keywords "Bill me later" there was no results shown. Using a "ANY / OR" Boolean search seems to work fine.

After some investigation it appears the problem occurs due to the list of common keywords SQL Server maintains internally. 

SQL Server Noise / Stop Words

SQL Server 2005 calls them noise words, SQL Server 2008 calls them stop words. SQL Server has a list of around 150 common words, which are ignored by the Full Text Search engine. e.g. about, this, all, me, was, and, must etc. You can view the English stop words in SQL Server 2008 by executing the following query:
 
select * from sys.fulltext_system_stopwords where language_id = 1033

Because “me” is a stop word and we are performing a Boolean “AND” search we won't get any results as the query is terminated when we hit the keyword “me”. This does not seem to effect “ANY / OR” type Boolean searches so I can only guess these are executed in a different fashion.
 
Solution?

There is a workaround – to wipe the list of stopwords! The problem with doing this is that the list of stopwords is setup server-wide on the SQL Server itself.

If this is not an option I’ve attached below a script to add all common stop words to the your InstantKB.NET Ignored search terms table ensuring the stop words never get passed to the full-text query engine. This ensures the query won’t get halted half way through the boolean search returning no results.

For InstantForum users you can simply change the ApplicationID within the script attached below to 1.
 
To demonstrate this take the following full-text query…
 
SELECT (a.ArticleID)
FROM
InstantKB_Articles AS a
LEFT OUTER JOIN
ContainsTable(InstantKB_Articles, *, '"*later*" AND "*bill*"', 500) AS ftArticleTitle
ON a.ArticleID = ftArticleTitle.[Key]
WHERE
a.ArticleID IN (IsNull(ftArticleTitle.[Key],0))

This will return the article titled “Bill Me Later”. However if we add the common keyword “me” the exact same query will fail…
 
SELECT (a.ArticleID)
FROM
InstantKB_Articles AS a
LEFT OUTER JOIN
ContainsTable(InstantKB_Articles, *, '"*later*" AND “me*” AND "*bill*"', 500) AS ftArticleTitle
ON a.ArticleID = ftArticleTitle.[Key]
WHERE
a.ArticleID IN (IsNull(ftArticleTitle.[Key],0))
 

You’ll notice the second query returns no results as it’s being terminated once it hits the "me" stop word.
 
To resolve this please run the attached SQL script against your InstantKB database.
 
Once you’ve ran the attached script against your InstantKB database please recycle the web application within IIS to clear the local ASP.NET cache and ensure the new ignored search terms are being used by InstantKB.
 
If you do run into any other scenarios where you would expect to see search results it may be due to these common keywords. You can add your own terms to the list of ignored search terms used by InstantKB from within the Admin CP > Settings > Ignored Search Terms page.

I hope this blog entry helps someone :)


Attachments


AddIgnoredSearchTerms.txt AddIgnoredSearchTerms.txt (9.00 KB, 1.6K views)

On a scale of 1-5, please rate the helpfulness of this blog entry


Not Helpful
Very Helpful
Optionally provide your comments to help us improve this blog entry...

Thank you for your feedback!

Comments

Member Photo


0
JohnTefer posted 6 Years Ago
New Member with 9 recognition pointsNew Member with 9 recognition pointsNew Member with 9 recognition pointsNew Member with 9 recognition pointsNew Member with 9 recognition pointsNew Member with 9 recognition pointsNew Member with 9 recognition pointsNew Member with 9 recognition points
Would like to thanks for imparting this with us.It is really very helpful and useful too..
chemdry

Add Your Comments

Comments require login or registration.