Alexa – Beware the Satanic Statistics?


Peter Norvig over at Google has published a quick little study showing how unreliable the Alexa metrics are if you want to know about website traffic. Thank you, Matt, for pointing out Peter's paper, which is very intriguing: it demonstrates that Alexa is off by a factor of 50 (i.e. roughly five thousand percent!) when comparing Matt Cutts’ and Peter’s site traffic.
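For anyone who wants to run the same kind of check on their own sites, here is a minimal Python sketch of the arithmetic. The function names and the visit counts are made up for illustration and are not the figures from Peter's study; the only assumption is that you have an estimated and an actual traffic number for each of two sites.

```python
def error_factor(est_a, est_b, act_a, act_b):
    """How far off an estimated A-to-B traffic ratio is from the real one.

    Returns a factor >= 1, where 1.0 means the estimate matched the logs.
    All four arguments are visit counts (hypothetical inputs).
    """
    # (est_a / est_b) divided by (act_a / act_b), written as one division
    factor = (est_a * act_b) / (est_b * act_a)
    # Report over- and under-estimates on the same scale
    return max(factor, 1 / factor)


def percent_error(factor):
    """Convert an error factor into a percent error relative to the true ratio."""
    return (factor - 1) * 100


# Made-up example: the estimate says site A gets twice the traffic of site B,
# while the logs say site B actually gets 25 times the traffic of site A.
factor = error_factor(est_a=200, est_b=100, act_a=1000, act_b=25000)
print(factor)                 # 50.0
print(percent_error(factor))  # 4900.0
```

Strictly speaking, a factor of 50 works out to a 4,900 percent error (the estimate is 5,000 percent of the true value), so "five thousand percent" is a fair round number.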

I’ve known about the problems with Alexa for some time, based on Alexa comparisons to sites where I knew the real traffic, but 50x is a rather spectacular level of error. It is so great, in fact, that given these sites are both read mostly by technology sector folks, it suggests Alexa is effectively worthless as a comparison tool unless there is abundant other data to support the comparison, in which case you don’t need Alexa anyway.

Of course, the very expensive statistics services don’t fare all that well either. A recent, larger, and simply superb comparison study by Rand over at SEOmoz collected data from several prominent technology sites, including Matt Cutts’ blog, and found that none of the external metrics were reasonably in line with the actual log files. Rand notes that he examined only about 25 blogs, so the sample was somewhat small and targeted, but he concludes:

Based on the evidence we’ve gathered here, it’s safe to say that no external metric, traffic prediction service or ranking system available on the web today provides any accuracy when compared with real numbers.

It’s interesting how hard it has been to accurately compare what is arguably the most important aspect of internet traffic: simple site visits and pageviews. Hopefully, as data becomes more widely circulated and more studies like these are done, we may be able to create some tools that allow quick comparisons. Google Analytics is coming into widespread use, but Rand told me at a conference that even that “internal metrics” tool seemed to have several problems when compared with log files. My own experience with Analytics has been superficial, but its numbers seem to line up well with my log stats.