Got Stats?

This is a cross posting of an article I wrote over at Technology Report about internet marketing:

One of the cornerstones of good internet marketing is knowing your statistics, and you’d think with all the elaborate, inexpensive and free measurement and analytical tools everybody would have a great sense of how their sites stack up to the competition.

But you’d  be wrong.

In fact, even many large companies struggle with high quality analysis even as the tools get better and the measures s-l-o-w-l-y reach some level of standardization. For most small companies metrics are, literally, more misses than “hits”. Webmasters routinely misinterpret or misrepresent website “hits” as viable traffic, when hits are often simply a count of the total files downloaded from the site. Graphics- or data-intensive websites can see hundreds of hits from a single web visitor.
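The hits-versus-visitors confusion is easy to see with a toy example. This is a minimal sketch, not a real analytics tool – the log lines and IP addresses are made up – but it shows how every file request (page, image, stylesheet) counts as a “hit” while the visitor count stays small:

```python
# Hypothetical access-log lines: each requested file is one "hit".
log_lines = [
    "203.0.113.5 GET /index.html",
    "203.0.113.5 GET /logo.png",
    "203.0.113.5 GET /style.css",
    "203.0.113.5 GET /photo1.jpg",
    "198.51.100.7 GET /index.html",
    "198.51.100.7 GET /logo.png",
]

hits = len(log_lines)                                    # every file download counts
visitors = len({line.split()[0] for line in log_lines})  # unique IPs, a rough proxy

print(f"hits: {hits}, approximate visitors: {visitors}")
```

Here two visitors generate six hits; a graphics-heavy page inflates the ratio much further, which is exactly why reporting “hits” as traffic misleads.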

Even when the analysis is good, the reporting is often opportunistic or manipulative, and it’s often done by the same team that is accountable for the results. This is a common problem throughout the business metrics field. Executives are well advised to have any business-critical measurements independently audited by unbiased parties.

Consider learning and using analysis packages like Google Analytics – a brilliantly robust and free tool provided by Google to anyone.

A while back Peter Norvig, one of the top search experts over at Google (also a leading world authority on Artificial Intelligence), published a little study indicating how unreliable the Alexa metrics were with regard to website traffic. (Thanks to Matt Cutts for pointing out Peter’s paper.)

The results demonstrate that Alexa was off by a factor of 50x (an overstatement of roughly 4,900 percent!) when comparing Matt Cutts’ and Peter’s site traffic.
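The arithmetic behind that figure is worth spelling out, since “50x” and the percentage error are easy to conflate. Using a purely hypothetical true-traffic number:

```python
# If a service reports 50x the true traffic, the overstatement relative
# to the true value is (50 - 1) * 100 = 4,900 percent -- often rounded
# informally to "five thousand percent".
true_visits = 1_000          # hypothetical true daily visits
reported = 50 * true_visits  # a 50x overestimate

error_pct = (reported - true_visits) / true_visits * 100
print(error_pct)  # 4900.0
```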

Although this is just an anecdotal snapshot of the problem, and perhaps Alexa is better now, I’d also noted many problems when comparing Alexa figures to sites where I knew the real traffic. 50x is a spectacular level of error for sites read mostly by technology sector folks. It suggests that Alexa may be a questionable comparison tool unless abundant other data supports the comparison – in which case you probably don’t need Alexa anyway.

Of course the very expensive statistics services don’t fare all that well either. A larger and excellent comparison study by Rand Fishkin over at SEOmoz collected data from several prominent technology sites, including Matt Cutts’ blog, and concluded that no metrics were reasonably in line with the actual log files. Rand notes that he examined only about 25 blogs, so the sample was somewhat small and targeted, but he concludes:

Based on the evidence we’ve gathered here, it’s safe to say that no external metric, traffic prediction service or ranking system available on the web today provides any accuracy when compared with real numbers.

It’s interesting how problematic it’s been to accurately compare what is arguably the most important aspect of internet traffic – simple site visits and pageviews. Hopefully, as data becomes more widely circulated and more studies like these are done, we will be able to create tools that allow quick comparisons. Google Analytics is coming into widespread use, but Fishkin told me at a conference that even that “internal metrics” tool seemed to have several problems when compared with the log files he reviewed. My own experience with Analytics has not been extensive, but the data seems to line up with my log stats and I’d continue to recommend this excellent analytics package.
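The kind of comparison Fishkin’s study performed can be sketched in a few lines: line up an external service’s traffic estimates against real log-file counts and compute each site’s error ratio. All the site names and numbers below are invented for illustration:

```python
# Hypothetical monthly visit counts: what the logs say vs. what an
# external ranking service estimates for the same (made-up) sites.
log_visits = {"site_a": 12_000, "site_b": 4_500, "site_c": 800}
estimated  = {"site_a": 90_000, "site_b": 1_200, "site_c": 40_000}

# Ratio > 1 means the service overestimates; < 1 means it underestimates.
ratios = {site: estimated[site] / actual for site, actual in log_visits.items()}

for site, ratio in ratios.items():
    print(f"{site}: external estimate is {ratio:.2f}x the log-file count")
```

Even in this toy data the errors run in both directions and span orders of magnitude, which is the pattern the SEOmoz study reported.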

Narrow Focus

Jumping down the rabbit hole of the climate debates is always very interesting, but it’s also very frustrating to watch many brilliant (as well as stupid) and well-informed (as well as ignorant) people avoid each other because the blog environments are not civil enough to encourage quality discussion of really intriguing issues. Great examples of the challenge of discussing science in blogs are my two favorite “watering holes” for the active discussion of climate science: ClimateAudit and RealClimate.

At both, intelligent and provocative posts often lead to “supportive” commentary from the allies of the blog but also ferocious attacks on critics of the initial post. This makes for interesting comments and reading if you can handle the emotional and intellectual heat, but I think the overall tone chases away the two very important groups who participate in blogging: the huge number of casual observers looking for answers to complex questions, and the small number of authoritative voices who study a particular complex topic.

Even as a seasoned blogger who rarely wants to back down from discussion points, I find it very frustrating to bounce back and forth, hoping my reasonable comments will not be moderated (a major problem at RealClimate, and not much of a problem at ClimateAudit) and hoping that critics will treat researchers with the basic respect they deserve. Lack of respect is a huge problem at both ClimateAudit and RealClimate, where PhD science authorities are routinely accused of incompetence (mostly at ClimateAudit) and reasonable criticisms are casually dismissed as “nonsense” simply so they do not need to be addressed properly (mostly at RealClimate).

Increasingly, blogs moderate reasonable comments because they don’t fit the political agenda of the blog, and I still think this is anathema to quality discussion. Others (like Joe Duck) pretty much allow any comments that are not obscene, commercial spam, or racist, so a single person can wind up dominating the conversation, chasing others away.

I’m rethinking my policies about how to manage comments because it’s good to hear from more people. However, I’m not going to be snipping or moderating anybody anytime soon. I think Steve McIntyre of Climate Audit might have the right idea, which is to push comments to an “unthreaded” post if they are off the topic of the original post. This leaves free speech intact while keeping a few people from dominating the whole comment show.

Final note is that I prefer to err on the side of giving everybody their full voice and I plan to continue doing that here.

Photo and Text Credit:  NASA Hubble.

This is NOT a collage, but one of the striking new images from the Hubble Space Telescope. These galaxies form a cluster. Question: estimate how many creatures as bright as or more intelligent than humans live in the area defined by this picture.


The first identified compact galaxy group, Stephan’s Quintet is featured in this stunning image from the newly upgraded Hubble Space Telescope. About 300 million light-years away, only four galaxies of the group are actually locked in a cosmic dance of repeated close encounters. The odd man out is easy to spot, though. The four interacting galaxies (NGC 7319, 7318A, 7318B, and 7317) have an overall yellowish cast and tend to have distorted loops and tails, grown under the influence of disruptive gravitational tides. But the bluish galaxy at the upper left (NGC 7320) is much closer than the others. A mere 40 million light-years distant, it isn’t part of the interacting group. In fact, individual stars in the foreground galaxy can be seen in the sharp Hubble image, hinting that it is much closer than the others. Stephan’s Quintet lies within the boundaries of the high flying constellation Pegasus.

Singularity Spark

I’m sure anxious for Ray Kurzweil to hurry up and finish his film “The Singularity is Near,” based on his remarkable book of a few years ago, because I think the film will spark the global conversation we need to have about the Singularity. If even the most modest predictions about this event come true, it will be the most significant development in the history of humanity, and will reshape our lives and the future of earth in unimaginable ways.

I am less optimistic than Kurzweil about the time frame and impact of what he sees as a likely explosion of “cosmic intelligence” that rapidly expands throughout the universe,  but I think the notion we will NOT see any conscious computers within 10-15 years is pessimistic and perhaps even naive, resting mostly on the notion that the human intellect is a lot more profound than … it appears to be.

Once self-awareness develops in machines the possibilities are literally endless for the future of humanity.  

An alternative to the “Singularity, Wow!” perspective is offered by brain researcher Edward Boyden who wonders about the role of motivation in the coming crop of artificial intelligences:

Indeed, a really advanced intelligence, improperly motivated, might realize the impermanence of all things, calculate that the sun will burn out in a few billion years, and decide to play video games for the remainder of its existence, concluding that inventing an even smarter machine is pointless.

More from Ed here

Clever writing aside, I think the last thing we need to worry about is motivating the coming AIs. On the contrary, it seems logical that a self-aware machine with the speed to think billions of times faster than humans would explore (or, to use a non-motivated term, “analyze”) millions, billions, and trillions of alternatives nearly *simultaneously*. Unlike the human brain, which has been tuned by the s-l-o-w process of evolution to be slow, very selective, and not very efficient, machine cognition will at the very least be extremely fast and able to process billions of scenarios in very short time frames. It seems reasonable – in fact inevitable – that at least a few of those scenarios will involve human-like emotional structure and motivation. Thus even if *most* of the AIs do as Boyden suggests they might and sit on a virtual couch eating virtual potato chips and playing games, some of the others will reinvent humanity in a spectacular way.

Count me in.

Both DARPA SyNAPSE and Blue Brain represent promising approaches to establishing conscious or “self-aware” computers, which many believe are the first step to the Singularity.