Building a Visual Search Service for IE8 with WCF - Part I

Beta 2 of Internet Explorer 8 has been out for a while now, and as you probably know, one of its new features is visual search. There is already an extensive introduction to visual search and the steps required on the IE8 side in this article on the Internet Explorer Team Blog, so I do not want to go into detail about that. Instead, I want to show how to implement a service that is consumed by the Internet Explorer 8 Visual Search feature, using a fairly comprehensive example.

I chose a scenario that might be quite popular: video search. However, a user must explicitly register a search provider in Internet Explorer, unless you provide a customized instance using the Internet Explorer Administration Kit (IEAK). So, in order to get one of the rare slots on the user's search provider list, I wanted to create a kind of meta search that aggregates search results from multiple sources. Another reason was that this let me include and write about topics like threading and caching as well. To access the video entries on the video sites, I used their public APIs. Unfortunately, it is not yet very common for those sites to offer public APIs to their content. In Germany in particular I found none except for MyVideo.de, and my application for an API key there is still unanswered. I did find two suitable APIs, though: the YouTube API and the metacafe API. The architecture of the service allows further video providers to be included at a later stage through a simple interface, which could even be made configurable as the service evolves.

The requirements I defined for the service are:

  • Retrieval of video asset information from multiple sources

In order to provide the greatest possible value to the end user, the visual search should aggregate multiple sources and provide a categorized visual search XML document to Internet Explorer.
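To make "categorized" concrete, here is a minimal sketch of how such a document could be assembled, written in Python purely for illustration (the service described here is a WCF service). The element names and namespace follow the IE8 XML Search Suggestions format as I understand it and should be checked against the IE8 documentation. The function name `build_suggestions`, the item keys and the thumbnail size are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Namespace of the IE8 XML Search Suggestions format (verify against the docs).
NS = "http://schemas.microsoft.com/Search/2008/suggestions"


def build_suggestions(query, sections):
    """Build one suggestion document with a separator per video source.

    `sections` maps a category title (e.g. the source name) to a list of
    dicts with the hypothetical keys "text", "url" and "image".
    """
    root = ET.Element("SearchSuggestion", {"xmlns": NS})
    ET.SubElement(root, "Query").text = query
    section = ET.SubElement(root, "Section")
    for title, items in sections.items():
        # A separator starts a new category in the drop-down.
        ET.SubElement(section, "Separator", {"title": title})
        for item in items:
            node = ET.SubElement(section, "Item")
            ET.SubElement(node, "Text").text = item["text"]
            ET.SubElement(node, "Url").text = item["url"]
            ET.SubElement(node, "Image", {
                "source": item["image"],
                "alt": item["text"],
                "width": "120",   # assumed thumbnail size
                "height": "90",
            })
    return ET.tostring(root, encoding="unicode")


print(build_suggestions("cats", {
    "YouTube": [{"text": "Cat video",
                 "url": "http://example.com/watch",
                 "image": "http://example.com/thumb.jpg"}],
}))
```

Each source gets its own separator, so results from different video sites stay visually grouped in the browser.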

  • Sources should be accessed in a non-sequential non-blocking manner

The Visual Search functionality is very performance-sensitive because its sole purpose is to provide a preview of search results or suggestions while you type. As we usually type quite fast, the service ideally needs to respond as fast as we type. In my experience, many people slow down a little once they get used to the value of the preview and take time to skim through the results. Even so, performance remains one of the crucial non-functional requirements of such a service. So I split the search operations for the different video sites into separate threads, which signal the main thread when they are finished so that the final result message can be assembled and sent back to the browser. This alone has a large impact on performance. In addition, I defined a timeout for how long the main thread waits for the search workers to finish, which bounds the response time and ensures quality of service for the search users.
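The fan-out with a deadline can be sketched as follows. This is an illustration in Python rather than the service's actual WCF code. The function `aggregate`, the two stand-in providers and the timeout values are assumptions made for this example.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait


def aggregate(query, providers, timeout_s=0.5):
    """Run every provider in its own thread; keep whatever finished in time.

    `providers` maps a source name to a callable taking the query string.
    """
    pool = ThreadPoolExecutor(max_workers=max(1, len(providers)))
    futures = {pool.submit(fn, query): name for name, fn in providers.items()}
    # Block the calling thread until all workers are done or the timeout hits.
    done, _late = wait(futures, timeout=timeout_s)
    # Do not wait for stragglers; they finish in the background.
    pool.shutdown(wait=False)
    results = {}
    for future in done:
        if future.exception() is None:  # skip providers that raised
            results[futures[future]] = future.result()
    return results


def fast_provider(query):  # hypothetical stand-in for a quick video site
    return ["fast hit for " + query]


def slow_provider(query):  # hypothetical stand-in for a site that misses the deadline
    time.sleep(0.5)
    return ["late hit for " + query]


print(aggregate("cats", {"fast": fast_provider, "slow": slow_provider},
                timeout_s=0.1))
```

A source that is slow or fails simply drops out of this particular response, so one bad provider cannot hold up the preview for everyone else.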

  • Source-specific search controllers that translate the incoming feed information into the service-specific data model

Since the APIs all differ in how they deliver data, each one has to be handled separately. For the services used here, this means there are two search controllers, which prepare the incoming data so that it can be transformed into the XML structure needed by the visual search feature. The effort depends on the structure of the underlying API. In my example, the YouTube API is a bit easier to use because it already comes with .NET client libraries for easy consumption. In the case of metacafe, we need to take care of the network operations and the parsing of the service's response message ourselves.
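The controller idea boils down to one common interface plus one adapter per source. The sketch below shows that shape in Python for illustration only. The class names, the `VideoResult` fields and both input shapes are hypothetical and do not reflect the real YouTube or metacafe response formats.

```python
import xml.etree.ElementTree as ET
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class VideoResult:
    """Service-specific data model shared by all sources (assumed fields)."""
    title: str
    url: str
    thumbnail_url: str
    source: str


class SearchController(ABC):
    source = "unknown"

    @abstractmethod
    def to_results(self, raw) -> List[VideoResult]:
        """Translate one source's raw response into the common model."""


class LibraryFeedController(SearchController):
    """For a source whose client library already returns parsed entries."""
    source = "source-a"

    def to_results(self, raw):
        return [VideoResult(e["title"], e["link"], e["thumb"], self.source)
                for e in raw]


class RawXmlFeedController(SearchController):
    """For a source that returns XML text we have to parse ourselves."""
    source = "source-b"

    def to_results(self, raw):
        root = ET.fromstring(raw)
        return [VideoResult(item.findtext("title", ""),
                            item.findtext("link", ""),
                            item.findtext("thumbnail", ""),
                            self.source)
                for item in root.iter("item")]
```

Adding another video site then means writing one more subclass, while the code that builds the response only ever sees `VideoResult` objects.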

  • Caching functionality to reduce costly network access and time-consuming processing of the incoming responses

Again, this is closely related to performance, but it needs careful consideration. On the one hand, caching greatly boosts the service's ability to respond to the requests that are fired at it as you type. On the other hand, it introduces the risk of sending outdated results to the browser. Therefore the invalidation time needs to be chosen carefully. Probably the most important entries to cache are the one- or two-letter searches, which are rarely relevant but would otherwise require a full round trip through the search service every time someone starts typing a search.
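The trade-off between speed and freshness comes down to a single number, the time-to-live of a cache entry. Here is a minimal sketch of such a cache in Python, for illustration only. The class `TtlCache`, its method names and the 30-second value are assumptions, and a production version running under multiple threads would also need locking.

```python
import time


class TtlCache:
    """Keeps results per query and refetches them once they are too old."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock      # injectable so the expiry logic can be tested
        self._entries = {}       # normalized query -> (stored_at, value)

    def get_or_fetch(self, query, fetch):
        key = query.strip().lower()   # "A " and "a" share one entry
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]             # still fresh: no network access
        value = fetch(query)          # expired or missing: do the costly call
        self._entries[key] = (now, value)
        return value


cache = TtlCache(ttl_seconds=30)
print(cache.get_or_fetch("a", lambda q: ["result for " + q]))
```

A short time-to-live keeps results current at the cost of more round trips, while a long one does the opposite, so the value is worth tuning per source.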

  • Robust and stable hosting environment

A service like this, which could easily be used by millions of users, needs to be very robust and stable. That is a challenge for the service application itself, and it applies equally to the hosting environment. As I didn't want to worry too much about this for my example, I chose to host the WCF service in Internet Information Services 7, as this is a production-proven, highly scalable hosting environment for web services.

The high-level architecture of the service looks like this:

Search Service Architecture

Technologies used for the search service are as follows:

That's all for the introduction and overview. In the next parts we will dig a bit deeper into the architecture and the components.

To give you a little preview of what you can expect, here's a screenshot of the service in action.

IE8 Visual Search Screenshot
