Over the last few weeks I had quite some quality programming time. First of all, I built some indices on the Typology database, which increased the retrieval speed of Typology by a factor of more than 1000, something that rarely happens in computer science. I will blog about this soon. Having those techniques at hand, I also used them to build a better autocompletion for the search function of my online social network metalcon.de.
The search functionality is not deployed to the real site yet. But on the demo page you can see how the completion helps you while typing. Right now the network requests are faster than Google search (which, I admit, is quite easy if you only have to handle one request per second and also have a much smaller concept space). Still, I was amazed by the ease and beauty of the program and by the fact that the autocompletion suggestions are actually more accurate than our current database search. So feel free to have a look at the demo:
http://134.93.129.135:8080/wiki.html
Right now it consists of about 150 thousand concepts which come from 4 different data sources (metal bands, metal records, tracks and German venues for heavy metal). I am pretty sure that increasing the size of the concept space by 2 orders of magnitude should not be a problem. And if everything works out fine, I will be able to test this hypothesis on my joint project Related Work, which will have a database with at least 1 million concepts that need to be autocompleted.
Everything I used, except the ContextListener and my small but effective caching strategy, can be found at http://developer-resource.blogspot.de/2008/07/google-web-toolkit-suggest-box-rpc.html and the data structure (suggest tree) is open source and can be found at http://sourceforge.net/projects/suggesttree/. Even so, I am planning to produce a series of screencasts and release the source code of my implementation together with some test data over the next weeks in order to spread the knowledge of how to build strong autocompletion engines. The planned structure of these articles will be:
part 1: introduction of which parts exist and where to find them
- Set up a GWT project
- Erase all files that are not required
- Create a basic design
part 2: AutoComplete via RPC
- Necessary client-side stuff
- Integration of SuggestBox and SuggestOracle
- Setting up the remote procedure call
part 3: A basic AutoComplete Server
- show how to fill it with data and where to include it in the autocomplete
- disclaimer: not a good solution yet
- always the same suggestions
part 4: AutoComplete Pulling suggestions from a data base
- including a database
- locking the database for every autocomplete HTTP request
- show how this is a poor design
- demonstrate the slow response times
part 5: Introducing the context Listener
- introducing a context listener.
- demonstrate the lack of speed with every network request
part 6: Introducing a fast Index (Suggest Tree)
- include the suggest tree
- demonstrate increased speed
part 7: Introducing client side caching and formatting
- introducing caching
- demonstrate no network traffic for cached completions
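To give a rough idea of what parts 6 and 7 are about, here is a minimal sketch in plain Java. It is not my actual implementation and it does not use the SuggestTree library: `PrefixIndex` is a hypothetical stand-in that simply keeps all terms in a sorted `TreeMap` and ranks the matching range at query time, and `CachingSuggester` is an assumed, very simple caching strategy that remembers the answer for every prefix it has already seen. All class names, method names and weights are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical in-memory index: a sorted map from term to weight. */
class PrefixIndex {
    private final TreeMap<String, Integer> terms = new TreeMap<>();

    void put(String term, int weight) {
        terms.put(term.toLowerCase(Locale.ROOT), weight);
    }

    /** Returns up to k terms that start with the prefix, highest weight first. */
    List<String> suggest(String prefix, int k) {
        String p = prefix.toLowerCase(Locale.ROOT);
        // Every key in [p, p + '\uffff') starts with p
        // (assuming no term contains the character '\uffff').
        Map<String, Integer> range =
                terms.subMap(p, true, p + Character.MAX_VALUE, false);
        List<Map.Entry<String, Integer>> hits = new ArrayList<>(range.entrySet());
        hits.sort(Map.Entry.<String, Integer>comparingByValue(Comparator.reverseOrder()));
        List<String> result = new ArrayList<>();
        for (int i = 0; i < hits.size() && i < k; i++) {
            result.add(hits.get(i).getKey());
        }
        return result;
    }
}

/** Assumed caching strategy: remember the suggestions for every prefix already asked. */
class CachingSuggester {
    private final PrefixIndex backend;
    private final int k;
    private final Map<String, List<String>> cache = new HashMap<>();
    int backendCalls = 0; // counts how often the (remote) index was really queried

    CachingSuggester(PrefixIndex backend, int k) {
        this.backend = backend;
        this.k = k;
    }

    List<String> suggest(String prefix) {
        String p = prefix.toLowerCase(Locale.ROOT);
        List<String> cached = cache.get(p);
        if (cached != null) {
            return cached; // no request needed for a prefix we already know
        }
        backendCalls++;
        List<String> fresh = backend.suggest(p, k);
        cache.put(p, fresh);
        return fresh;
    }
}
```

In a real GWT setup the call to `backend.suggest` would be the RPC to the server, so a cache hit means no network traffic at all. A suggest tree should be faster than this sketch because the sketch ranks the whole matching range on every query, which gets expensive for short prefixes.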
topics not covered: (but I am happy for hints on some points)
- on user login: create a personalized suggest tree and save it in some context data structure
- merging from the personalized AND global index (Google will only display 2 or 3 personalized results)
- index compression
- scheduling / caching / precalculation of index
- non-prefix retrieval (merging?)
- CSS of retrieval box
- parallel architectures for searching
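For the merging point in the list above, one possible approach could look like the following sketch. It is only an assumption about how such a merge might work, not something I have implemented: take at most a fixed number of personalized results first, then fill the remaining slots from the global results and skip duplicates. The class `SuggestionMerger` and its parameters are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper that merges two ranked suggestion lists. */
class SuggestionMerger {
    /**
     * Returns at most k suggestions: up to maxPersonal from the personalized
     * list first, then global suggestions that are not already included.
     */
    static List<String> merge(List<String> personal, List<String> global,
                              int maxPersonal, int k) {
        // LinkedHashSet keeps insertion order and silently drops duplicates.
        Set<String> merged = new LinkedHashSet<>();
        for (String s : personal) {
            if (merged.size() >= maxPersonal || merged.size() >= k) {
                break;
            }
            merged.add(s);
        }
        for (String s : global) {
            if (merged.size() >= k) {
                break;
            }
            merged.add(s);
        }
        return new ArrayList<>(merged);
    }
}
```

With `maxPersonal` set to 2 or 3 this would mimic the behaviour described for Google, while the global index still decides most of the list.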