Microsoft BrowseRank to compete with Google PageRank


CNET profiles a new paper showcasing a Microsoft effort to improve search by looking at *user behavior* in addition to the standard signals all the major search engines use, such as inbound links, page content, page titles, and several others.

Google’s initial brilliance was recognizing that the link relationships on the web gave you great insight into the best websites. Google correctly noted that sites with many links to them, especially for a particular keyword, were more likely to match a user’s interest for that keyword. Although many factors have been included in Google ranking for years, PageRank was arguably the most important breakthrough. Initially the system was designed as an online analog of academic citation. Google’s Larry Page reasoned that websites with more incoming links would tend to be better, and that those incoming links themselves should be weighted according to the importance of the site from which they came.
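
Roughly speaking, that core recursion is simple enough to sketch in a few lines. The toy graph, damping factor, and iteration count below are invented for illustration; the real system layers many more signals and refinements on top of this.

```python
# A minimal sketch of the PageRank idea: each page passes a share of its own
# rank to the pages it links to, and the scores are iterated to convergence.
# The graph and parameters here are illustrative only.
def pagerank(links, damping=0.85, iterations=50):
    """links: dict mapping each page to the list of pages it links to."""
    pages = list(links)
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1 - damping) / len(pages) for p in pages}
        for page, outlinks in links.items():
            if not outlinks:  # dangling page: spread its rank evenly
                for p in pages:
                    new_rank[p] += damping * rank[page] / len(pages)
            else:             # each outlink passes on an equal share of the source's rank
                for target in outlinks:
                    new_rank[target] += damping * rank[page] / len(outlinks)
        rank = new_rank
    return rank

# Toy graph: "C" is linked from both "A" and "B", so it ends up ranked highest.
print(pagerank({"A": ["C"], "B": ["C"], "C": ["A"]}))
```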

The system started to show severe signs of wear as search marketers as well as mom-and-pop businesses began to “game” the PageRank system, creating spurious incoming links from bogus sites and buying links from high-ranking websites.

Enter Microsoft’s “BrowseRank”, which will arguably be harder to game because it will monitor the behavior of millions of users, looking for relationships between sites and pages, length of time on a page, and more. It’s a good idea of course, but it is Google that has *by far* the best data set to manage this type of approach. So even if Microsoft’s system starts to deliver results superior to Google’s, one can expect Google to kick its own efforts into gear.
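
The BrowseRank paper models browsing as a Markov process over a “user browsing graph,” where a page’s importance depends both on how often users arrive at it and on how long they stay. The snippet below is only a rough sketch of that intuition, not the algorithm from the paper; the session data, dwell-time weighting, and parameters are all invented for illustration.

```python
from collections import defaultdict

# A rough sketch of the BrowseRank intuition: estimate click transitions and
# dwell times from browsing sessions, run a PageRank-style iteration over the
# click graph, then weight scores by how long users actually stay on each page.
def browse_rank(sessions, damping=0.85, iterations=50):
    """sessions: list of browsing trails, each a list of (page, seconds_on_page)."""
    transitions = defaultdict(lambda: defaultdict(int))
    dwell = defaultdict(list)
    for trail in sessions:
        for i, (page, seconds) in enumerate(trail):
            dwell[page].append(seconds)
            if i + 1 < len(trail):
                transitions[page][trail[i + 1][0]] += 1

    pages = list(dwell)
    score = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        new_score = {p: (1 - damping) / len(pages) for p in pages}
        for page in pages:
            outs = transitions[page]
            total = sum(outs.values())
            if total == 0:  # sessions end here: treat as a jump anywhere
                for p in pages:
                    new_score[p] += damping * score[page] / len(pages)
            else:           # otherwise follow the observed click transitions
                for target, count in outs.items():
                    new_score[target] += damping * score[page] * count / total
        score = new_score

    # Weight by mean dwell time, so pages users linger on rank higher.
    mean_dwell = {p: sum(t) / len(t) for p, t in dwell.items()}
    weighted = {p: score[p] * mean_dwell[p] for p in pages}
    norm = sum(weighted.values())
    return {p: w / norm for p, w in weighted.items()}

# Invented sessions: both users linger on "docs", so it outranks "home" and "ads".
print(browse_rank([[("home", 10), ("docs", 120), ("ads", 3)],
                   [("home", 8), ("docs", 90)]]))
```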

As with all search innovation, users should be the big winners. Search remains good but not great, and competition in this space will only serve to make everybody better…right?

Google: A Trillion URLs and counting


The Google blog notes how huge the web is now, with Google’s systems having found over a trillion unique URLs. As they note in the article, the actual number of indexable URLs is, in one sense, infinite. For example, calendar pages will automatically appear as you scroll through many applications, continuing through the years until…the singularity and beyond. Of course Google does not index many of these “empty” URLs, or a lot of junk and redundant content, so the true number of unique URLs out there is actually well above a trillion.

I think a fun question is this: What will the information landscape look like in, say, 20 years, when we should have the ability to pour *everything* from the past and the present online? Questions might take a different form if we had access to every reference on a topic that has ever been produced. Algorithms will be used to sort through the oceans of content much as Google does now, but with far more precision and better comprehension of the whole mess.