Dynamic fuzzy search with LINQ and entity framework and jQuery autocomplete part 2

By eidias on (tags: fuzzy search, categories: code)

Search has always been my most desired feature in applications and if I were to guess, I’m not the only one with this craving. The comfort and flexibility it gives you is enormous under one condition – it’s smart enough.

It’s time for part 2

In the first part I gave an introduction on why I wanted to implement a custom search. Now, I’m going to show you how I did that.

As a starting point I needed to figure out how to do fuzzy search and what type of data it can produce.
I did that when I first stumbled on the subject and it turned out to be a lot of fun. That resulted in a small lib that you can look up here.

The algorithm I decided to utilize is the Longest Common Subsequence (LCS) length which is used to rank entries. These entries are later sorted by that rank and the top N are displayed. There is one caveat here – the LCS length is O(n*m) where n is the length of the term typed in by the user and m is the length of the word that’s being ranked. I optimized the memory footprint compared to the original algorithm but the number of operations didn’t change – so this will be a bottle neck at some point because this algorithm needs to be applied to every indexed string… which makes it… ugh… O(n3) – but for now, I’ll let it slide.

The second step was to create a search index which seemed to be an easy task – for each entity you want to have searchable run a db query and return the results – but that would mean I’d have to do very similar thing multiple times and I wanted to keep the index creation as flexible and compact as possible (obviously). So the idea – attributes. Here’s how I wanted to have it:

   1: public class Foo
   2: {
   3:     [Searchable]
   4:     public string Bar {get;set;}
   5:     [Searchable]
   6:     public string Buzz {get;set;}
   7:     public Frob Frob {get;set;}
   8:     public byte[] Bytes {get;set;}
   9:     // ...
  10: }
  11:  
  12: public class Frob
  13: {
  14:     [Searchable]
  15:     public string Qux {get;set;}
  16:     public string Bleach {get;set;}
  17:     // ...
  18: }

That way I could choose which property should be searchable and which not (note that I didn’t put the attribute on a class).

The searchable classes are part of the domain model and part of the class that inherits from DbContext (for non entity framework users – that’s the class that holds reference to all types that are stored in the database). I call that class AppContext.

At this point I can say what I want to query.

This is how I get the types that need to be queried:

   1: var assembly = Assembly.GetAssembly(typeof(IAppContext));
   2: var types = assembly.GetTypes()
   3:                     .Where(t => t.GetProperties().Any(p => p.GetCustomAttributes(typeof(SearchableAttribute), true).Any())
   4:                                 && !t.IsAbstract
   5:                                 && typeof(Entity).IsAssignableFrom(t))
   6:                     .ToList();

As you can see, there is a limitation here – each searchable entity needs to inherit from Entity which is a class that holds an ID. Similarly I can get a list of properties for each type (stored under the variable typeFrom)

   1: var propertiesToQuery = typeFrom.GetProperties().Where(p => p.GetCustomAttributes(typeof(SearchableAttribute), true).Any() || p.Name == "ID").ToArray();

So now I have all the information to run the query and we’ll go through that in the next post.

Cheers