Google: A Trillion URLs and counting


The Google blog notes how huge the web is now, with Google indexing over a trillion unique URLs. As they note in the article, the actual number of indexable URLs is, in one sense, infinite. For example, calendar pages will automatically appear as you scroll through many applications, continuing through the years until... the singularity and beyond. Of course Google does not index many of these “empty” URLs, or even a lot of junk or redundant content, so the true number of real, unique URLs is actually well above a trillion.

I think a fun question is this:   What will the information landscape look like in, say, 20 years when we should have the ability to pour *everything* from the past and the present online?     Questions might take a different form if we had access to every reference on a topic that has ever been produced.    Algorithms will be used to sort through the oceans of content much as Google does now, but with far more precision and better comprehension of the whole mess.

Adobe Air – offline to online is good


Adobe is launching an application that will allow people to work offline on forms and other content which will then automatically be posted to websites when they go back online.   This is an excellent “transitional” application because many users still have to “log on” to the internet via slow modems or other cumbersome connections, and this will help them participate more actively in the online ecosystem.

That said, I’m increasingly convinced that the explosion of user content is to some extent… over. Certainly we’ll continue to see huge volumes of content pour online, but at least in terms of the USA it is fair to say that internet access and publishing are now so easy and cheap that it seems unlikely there are millions waiting in the wings to jump online. Some studies are suggesting that “most” internet users have little interest in blogging or commenting or participating actively – rather they want to read and socialize but not produce much content. Another interesting factor is that young women appear to be the top content producers in many social networking environments, rather than geeky boys who are more likely to spend online time playing games. It’s going to be very interesting to watch the new media trends shake out in the coming years.

WSJ reports

Gutenberg + 550 years = Our ADDd Internet


John Naughton, writing in the Guardian, has a nice piece about the reading revolution inspired by Gutenberg and the uncertain future of our online equivalents to the books we have held dear for several centuries. 

Studies are noting how fleeting our attention has become, especially in our young folks. In terms of “total enlightenment” I actually favor the quick skim over the in-depth read because I believe retention is better for the short bits of information as well as better for the “key concepts” that you get quickly from surfing on a topic.

Thus if I read a carefully crafted work I’ll be moderately informed but then lose most of the information over the years, whereas if I jump around to 20 sources I’ll be similarly well-informed but will retain it better.

All that said, I’d agree with internet critics who suggest we may be losing our ability – to the extent it was ever there – to quietly and deeply reflect on topics.    Also I’d agree we don’t know the consequences of this shift, though from the national dialog about politics, religion, and other things I’d say we aren’t really falling back or making much progress.   We are a modestly contemplative primate, and we can’t escape that fate regardless of how we input the information.

Google’s knol project


Google’s about to launch yet another clever idea.  Called knol, it will feature authoritative articles about any topic which will use community rating and input.   

It will be interesting to see how this project compares to the excellent community-produced content at Wikipedia, and also how Google handles the legitimate as well as scammy SEO tactics that always follow good content. Disallowing links to commercial sites would seem to inhibit an author’s ability to feature things, but allowing them opens up the chance of abuses of the type that made Wikipedia choose to use the NOFOLLOW attribute on all external Wikipedia links.
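
For anyone unfamiliar with NOFOLLOW, here is a minimal Python sketch of the idea: links pointing off the site get rel="nofollow" so search engines are asked not to pass ranking credit through them, while internal links are left alone. The function name render_link, the site_host parameter, and the knol.example host are all made up for illustration. This is only a rough picture of the mechanism, not how Wikipedia or Google actually implement it.

```python
from html import escape
from urllib.parse import urlparse


def render_link(href, text, site_host):
    """Build an anchor tag, marking off-site links with rel="nofollow".

    site_host is a hypothetical setting naming the site's own host.
    """
    host = urlparse(href).netloc.lower()
    # Relative links have no host and are treated as internal.
    external = bool(host) and host != site_host.lower()
    rel = ' rel="nofollow"' if external else ""
    return f'<a href="{escape(href, quote=True)}"{rel}>{escape(text)}</a>'


if __name__ == "__main__":
    # An off-site commercial link gets the nofollow hint.
    print(render_link("https://example.com/shop", "a shop", "knol.example"))
    # A link within the same site does not.
    print(render_link("/articles/gutenberg", "Gutenberg", "knol.example"))
```

The trade-off described above shows up directly: authors can still link wherever they like, but the links stop being worth much to anyone hoping to game search rankings.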

The good news – more quality information online – yippee! 

Lessing’s “curmudgeonly missteps” should be forgiven. Close the book and open the internet.


Jeff Gomez over at the Print is Dead blog has the best piece I’ve read so far about Nobel prize winner Doris Lessing’s mild attack on the internet.    Lessing’s comments were buried in an otherwise inspiring story about the power of reading, knowledge, and education – a story about how some women in Africa were more concerned about having books to read than food to eat.

Lessing’s suggestion that youth is in the process of abandoning quality book reading in favor of the “inanities” of the internet brought the very predictable blogOspheric response of derision heaped on an old literary lady who deserves a lot more respect than she’s been getting.

Like Jeff, I can quickly forgive the increasingly irrelevant attacks on the internet. In fact I agree with Lessing that we’ve lost something as people flock to the internet while abandoning books and newspapers, carrying with them little more than a keyboard and a short attention span. But we gain something as well. Something very profound. The internet is not only far more engaging than books and newspapers, and the internet is not only far more accessible than books and newspapers. The internet is interactive.

VERY interactive.   

For the first time in all of human history, people from almost anywhere can communicate night and day, every day, with other people from almost anywhere else.    This tidal wave of human socializing has only just begun and the implications are staggering.   Complaining that books aren’t getting their due respect, while true, is a bit like rearranging the deck chairs on the Titanic.   The ship of knowledge known as the internet sailed long ago and is now a huge fleet carrying billions of people.     

As a Nobel prize winner for literature Doris Lessing will be remembered forever. And rightly so. But those memories, and photos, and videos, and copies of what she said will live forever in *digital form*. They’ll live on the internet, long after all the paper representations have been relegated to a handful of dusty old museum archives and rich book collectors’ shelves.

And that, dear Doris, is a very good thing.

Who is clicking at your online business door?


Back in July I missed this great post by Dave Morgan at AOL but thanks to Danah Boyd’s post it has surfaced again.    The findings are very surprising and very relevant to anybody running click or online advertising campaigns.   Dave summarizes the findings very concisely as follows:

We learned that most people do not click on ads, and those that do are by no means representative of Web users at large.

Ninety-nine percent of Web users do not click on ads on a monthly basis. Of the 1% that do, most only click once a month. Less than two tenths of one percent click more often. That tiny percentage makes up the vast majority of banner ad clicks.

Who are these “heavy clickers”? They are predominantly female, indexing at a rate almost double the male population. They are older. They are predominantly Midwesterners, with some concentrations in Mid-Atlantic States and in New England. What kinds of content do they like to view when they are on the Web? Not surprisingly, they look at sweepstakes far more than any other kind of content. Yes, these are the same people that tend to open direct mail and love to talk to telemarketers.

What does all of this mean? It means that while clickers may be valuable audiences, they are by no means representative of the Web at large.

Indeed, this means that many online marketing campaigns may need to dig a lot deeper to obtain a positive ROI, and for some campaigns positive ROI is not attainable. If, for example, irrelevant clickers (not to be confused with click abuse) mean you’ll have to spend a few dollars to reach a single prospect, and your margin on your product is only a few dollars, you may be fighting a losing PPC battle for online hearts, minds, and pocketbooks. On the other hand, if your target audience is, say, midwestern stay-at-home soccer moms, you may want to up your PPC spend dramatically because your nickel or dime per click could be worth many times that in prospective sales.
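
The arithmetic behind that trade-off is easy to sketch. The Python below is a rough back-of-envelope model, not anything from Dave’s data: the function names and every number in it (a dime per click, 5% versus 50% of clicks coming from real prospects, half of prospects buying, a $3 margin) are assumptions I picked purely for illustration.

```python
def cost_per_prospect(cost_per_click, relevant_share):
    """Ad spend needed to reach one genuine prospect.

    relevant_share is the assumed fraction of clicks that come from
    people who could plausibly buy (the rest are irrelevant clickers).
    """
    if not 0 < relevant_share <= 1:
        raise ValueError("relevant_share must be in (0, 1]")
    return cost_per_click / relevant_share


def profit_per_sale(margin, cost_per_click, relevant_share, prospect_conversion):
    """Margin left over after paying for all the clicks behind one sale."""
    if not 0 < prospect_conversion <= 1:
        raise ValueError("prospect_conversion must be in (0, 1]")
    spend_per_sale = cost_per_prospect(cost_per_click, relevant_share) / prospect_conversion
    return margin - spend_per_sale


if __name__ == "__main__":
    # Hypothetical broad campaign: only 1 click in 20 is a real prospect.
    broad = profit_per_sale(3.00, 0.10, 0.05, 0.5)
    # Hypothetical campaign aimed at the heavy-clicker demographic itself.
    targeted = profit_per_sale(3.00, 0.10, 0.5, 0.5)
    print(f"broad campaign profit per sale:    {broad:+.2f}")
    print(f"targeted campaign profit per sale: {targeted:+.2f}")
```

With these made-up inputs the same dime per click loses money on the broad campaign and earns a healthy margin on the targeted one, which is the whole point: the share of clicks that come from real prospects matters as much as the price of the click.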

Obviously Dave’s post is only the beginning of the big story which has yet to be written,  and I’m not clear how representative this sample was of all PPC activity (I think it was broadly representative though – they looked at billions of data items).  However this helps me understand why some of my PPC experiments have failed to yield much of a return.     A good travel experiment given these findings would be to look at midwestern travel patterns and try to advertise popular packages to Mexico  or other commonly travelled points south in the winter.   Since women are the main travel planners this match could work well to increase the normally very low conversion I have seen on travel related PPC spends.

Mossberg on Amazon’s Kindle book reader – just fair.


Bloggers roundly panned the Kindle a few weeks ago during its launch, and then Amazon sold out of them almost immediately. However many (including me) suspect they just didn’t build that many. Given the negative initial reactions from so many, and the fact Amazon has very conspicuously failed to mention how many sold, I think the “Kindle sell out” was a marketing ploy rather than a sign of the Kindle’s popularity. In fact I’d be surprised if they sold more than 50,000 or so – probably far short of the numbers needed to bring the Kindle project close to anything approaching profitability.

Adding insult to injured initial reputation, Walt Mossberg just wrote in the Wall Street Journal that the Kindle is just an OK device.   He was not too hard on it, but no endorsement either.     In contrast and over at CNET, Josh Taylor is warming up to the Kindle after a few weeks of use.    Of course he was on a beautiful tropical beach reading, so maybe that colored his perception to a Kindley hue ?     All I know is that at $399 + $9.99 per book and a buck per blog I won’t be buying one anytime soon.    

Berners-Lee: More study of WWW needed


Tim Berners-Lee, the closest thing we have to an “inventor” of the web as we know it today, is calling for more integrated, broad studies of the internet rather than the mostly piecemeal academic work being done now. He’s right. The internet is arguably the most profound change in human communication in history, and it’s just getting started. As social networking explodes into the dominant socializing mechanism for humans we are experiencing many new opportunities and many challenges, especially as the online environments create new relationships between people, generations, and cultures.

Universities would be well advised to heed this call from Berners-Lee and offer more “web centric” courses, but more importantly academics should be spending a lot more time studying the complex, changing structure of the web.  The technical aspects of the internet are fairly well studied in commercial circles.   The sociological side is  poorly/rarely studied in academia and the commercial sector is still struggling to understand the implications of the massive shift of human activity online.   

Print Media Future – so dim, you won’t need to wear shades.


Two articles today suggest how tough it’s becoming to turn a buck in the print media world. Jeff Jarvis of BuzzMachine, the founding editor of Entertainment Weekly, notes in “Whither Mags” that major print efforts require a huge capital outlay before they can even hope to be profitable, and that the current high risk associated with print publications means we probably won’t see nearly as many new big magazine efforts.

Even more ominous is the New York Times report today showing circulation declines almost across the board for US newspapers. The NYT article “More Readers Trading Newspapers for Websites” has a great graphic showing how circulation has fallen at most newspapers since last year, with an average drop of 2.4%. Given the relatively thin profit margins at many papers and the fact many costs are fixed, this does not bode well at all for the future of newspapers. The future of news? That is a far more complex question and I think the answer is not knowable at this time. Blogs are picking up some of the journalistic slack, but I’m not convinced they can pick up all of it.

Information Sharecroppers of the World, Unite ! ?


Update:  I think Nick (and I) may owe Newsvine an apology, because Newsvine does not really practice sharecropping.   The members own their own content and this means a lot more control than otherwise.    Obviously the landscape is complex with any social media but I don’t think I can object to Newsvine’s model.    My concern is where the site takes ownership of the member content.

—-

Nick Carr has a good post today noting how the Newsvine acquisition, and other deals like this, can lead to some information “sharecropper” dissent. As I pointed out yesterday, social media is a great thing, but it seems to be dramatically failing to fund the very forces that make it a great thing – the hardest working content providers that often form the backbone of these entities. Kevin Rose is worth tens of millions because tens of millions of diggers work for him – for free. Sure, he’s smarter than most of his minions and he pulled it all together, which means he should get a big digg payday some day, but should he, the founders, and the VC funders get *all* of the money when even they’d all agree that digg is valuable primarily because of all the people that do the digging?

Newsvine was a superb project that was beautifully implemented, but like Nick I wonder how long those who helped make Newsvine such a great site will keep working for nothing.     Is  Web 2.0 simply a new twist on feudal economics?