Google has a well-known feature that tries to suggest a different phrase if the one you entered is ‘a bit off’ (a.k.a. ‘did you mean’). It takes into consideration spelling as well as query results for similar phrases, and who knows what else. Spotting the similarity (or difference) between what you typed and what you meant to type can be quite easy for a human, but for a computer it’s a difficult task. I have no idea how Google implemented this (who knows, maybe I’ll find out at some point) or how much time they spent on it, but it has proven to be a good investment.
Other companies picked up on that. One of the most awesome features of all time can be found in ReSharper, which lets you search for types by typing a subsequence of the name instead of a substring. Another example is RavenDB, which detects misspelled words (they call it ‘suggests’). Visual Studio 2012 has a new ‘Quick Launch’ text box that provides quick navigation through search.
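To make the subsequence-versus-substring distinction concrete, here is a minimal C# sketch. `SubsequenceMatcher`, `IsSubsequence` and `IsSubstring` are hypothetical names, and this is a simplified, case-insensitive check; it is not how ReSharper actually matches or ranks types.

```csharp
using System;

// Hypothetical helpers illustrating the idea, not ReSharper's implementation.
public static class SubsequenceMatcher
{
    // True if every character of `pattern` appears in `candidate` in the same
    // order (not necessarily adjacent), ignoring case.
    public static bool IsSubsequence(string pattern, string candidate)
    {
        int p = 0; // index of the next pattern character we still need to find
        foreach (char c in candidate)
        {
            if (p == pattern.Length) break; // whole pattern already matched
            if (char.ToUpperInvariant(c) == char.ToUpperInvariant(pattern[p])) p++;
        }
        return p == pattern.Length;
    }

    // Plain substring check for comparison: the characters must be adjacent.
    public static bool IsSubstring(string pattern, string candidate) =>
        candidate.IndexOf(pattern, StringComparison.OrdinalIgnoreCase) >= 0;
}
```

With this check, typing `sii` matches `SearchIndexItem` (S…I…I), while a plain substring search for the same input finds nothing.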
ReSharper was the one that inspired me to investigate the topic a bit more, and that investigation ended with a desire to implement a smart search in the project I’m currently working on.
From the user’s perspective, the requirements were simple. I’d like to have a compact navigation box that lets the user type in a word and see a list of the most relevant results. Each result item would have a ‘title’ and some additional info, and would let the user navigate to a page with the item’s details (in my case, the edit page). Hence the SearchIndexItem structure:
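```csharp
using System;

public struct SearchIndexItem
{
    public string Key;  // text to search on, shown as the result title
    public string Url;  // link to the item's details (edit) page
    public Type Type;   // the entity type the result points to
}
```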
As for the search algorithm itself, the guiding idea is fuzzy search: finding items that are close to, but not necessarily identical to, what the user typed.
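Fuzzy search needs some way to measure how far apart two strings are. One classic measure is the Levenshtein (edit) distance: the minimum number of single-character insertions, deletions and substitutions needed to turn one string into the other. The C# sketch below computes it with the standard dynamic-programming recurrence, keeping only two rows of the table. `EditDistance.Levenshtein` is a hypothetical name, and this is just one possible building block, not necessarily what the rest of this series ends up using.

```csharp
using System;

public static class EditDistance
{
    // Hypothetical helper: Levenshtein distance between `a` and `b`
    // (case-sensitive, one unit of cost per insert/delete/substitute).
    public static int Levenshtein(string a, string b)
    {
        // prev[j] = distance between the first i-1 chars of a and the first j chars of b
        var prev = new int[b.Length + 1];
        var curr = new int[b.Length + 1];
        for (int j = 0; j <= b.Length; j++) prev[j] = j; // empty a: j insertions

        for (int i = 1; i <= a.Length; i++)
        {
            curr[0] = i; // empty b: i deletions
            for (int j = 1; j <= b.Length; j++)
            {
                int cost = a[i - 1] == b[j - 1] ? 0 : 1;
                curr[j] = Math.Min(
                    Math.Min(curr[j - 1] + 1,   // insertion
                             prev[j] + 1),      // deletion
                    prev[j - 1] + cost);        // substitution or match
            }
            (prev, curr) = (curr, prev); // the finished row becomes "prev"
        }
        return prev[b.Length];
    }
}
```

For example, `kitten` to `sitting` has a distance of 3, and the typo `serach` is 2 away from `search`, since plain Levenshtein distance has no transposition operation and counts a swapped pair as two substitutions. Ranking candidate keys by this distance is one simple way to get ‘did you mean’-style results.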
After some research, I found out that SQL Server Integration Services (SSIS) has a Fuzzy Lookup transformation, but I don’t use SSIS, so that option is out.
Then there’s full-text search, but making it smart requires external libraries like this one, and unfortunately I couldn’t use those.
Finally, there’s Entity Framework, which the project uses as its ORM; that’s fine, but it isn’t much help for a task like this.
So the only option left was to write the thing myself, and that’s what I’m going to guide you through. It’s going to take more than one post, though, so stay tuned.