Tuesday, August 26, 2008

Click trails and search engine war

In the quest for a "perfect search engine," the solution is simple: the search engine that can predict your query before you type it and provide the perfect answer will win. So the question is, how do you even begin to predict what someone will search for, let alone the result they are truly looking for? I believe that data modeling will be the solution to this problem.

People are creatures of habit, and we behave like a mob. Look no further than popular websites like Boing Boing, Slashdot, and Digg, which have reputations for bringing web servers to their knees, and you quickly see that our movements have a pattern ripe for modeling.

To generate a data model for a "perfect search engine" you need two things: a "perfect formula" and "perfect information." I would like to focus first on the latter.

Imagine a world where a search engine provider could track and record your every mouse movement. We already have complex analysis tools that take this data at the site level and allow designers to make user interface improvements. Today most of these tools are expensive analytics packages, but I wouldn't be surprised if certain free analytics tools made these features available to their customers. Of course, part of the price of getting these tools for free will be that you allow the vendor to use this data for its own purposes.

Next, even sites that choose not to use analytics software need some sort of revenue model. The world of advertising drives free content on the web. To enable sites of all levels to reach advertisers of all levels, you need a broker to handle these transactions. For only a small percentage of the money exchanged between these parties, the broker will handle all the transactions and even host the ads! Be sure to check the terms of service before participating, as I'm sure they will include a little ditty about tracking users.

Still not enough data yet? How about providing free web services like email, blogging, web hosting, and social networking? Why not partner with other companies to provide free services or discounted advertising in trade for more user tracking?

Once you have data from the vast majority of web properties in the world, you need to look toward completing your model with a formula. This is where things start getting tricky. A generalized formula isn't going to meet our need for a "perfect model." Today the web is filled with too many demographic/sociographic/geographic groups who all use it differently. So you will need to create a model that adapts to the individual, or at least to the group. Good thing you are collecting that personal information from your free web services and partner companies. Still don't have enough information to identify a person? No worries, the click trails from your analytics software and advertising will quickly allow you to file an individual into a group.
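To make the "file an individual into a group" idea a little more concrete, here is a minimal sketch in Python. Everything in it is an assumption made up for illustration: the group names, the content categories, the profile numbers, and the choice of cosine similarity as the matching rule. It is not how any real search engine does this, just one simple way a click trail could be compared against a handful of group profiles.

```python
from collections import Counter
from math import sqrt

# Hypothetical group profiles: the share of page views each group
# gives to each content category. These numbers are invented.
GROUP_PROFILES = {
    "gadget_fans": {"tech": 0.7, "news": 0.2, "sports": 0.1},
    "sports_fans": {"sports": 0.8, "news": 0.2},
    "news_readers": {"news": 0.7, "tech": 0.15, "sports": 0.15},
}


def cosine(a, b):
    """Cosine similarity between two sparse category -> weight mappings."""
    dot = sum(a[k] * b.get(k, 0.0) for k in a)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def assign_group(click_trail, profiles=GROUP_PROFILES):
    """Return the name of the group whose profile best matches the trail.

    click_trail is a list of category labels, one per page visited.
    Returns None when the trail is empty, since there is nothing to match.
    """
    if not click_trail:
        return None
    counts = Counter(click_trail)
    return max(profiles, key=lambda name: cosine(counts, profiles[name]))


if __name__ == "__main__":
    print(assign_group(["tech", "tech", "news", "tech"]))
    print(assign_group(["sports", "sports", "news"]))
```

A real model would need far richer signals than a category label per click, but even this toy version shows why a short trail can be enough to place someone in a bucket.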

Even with all the data in the world, a model this complex will likely stump the world's data experts for some time. I think it is pretty safe to say Google has the head start in both the model and the data. Luckily for Microsoft, it only needs enough data to keep its confidence intervals narrow, plus a better model, to win. Who would I hire to help make a perfect model? Some brilliant economists, statisticians, psychologists, and sociologists would certainly be a good start, but a quick search of both Google's and Microsoft's career sites doesn't show much interest in any of these fields. Are Google and Microsoft missing an opportunity here, or am I missing something?