Improving Google


Ha, it’s presumptuous to suggest improvements to huge companies like Google, but that is what the internet, and blogging in particular, is all about. Master UK SEO Dave Naylor has five suggestions over at his blog and several others have chimed in. I wasn’t sure why Dave suggested clustering all the WordPress sites, forcing people to get a new domain, but this small inconvenience might be a good form of spam filtering because it keeps spammers from using free WordPress sites.

There’s now a conflict between the desire of search engines to screen out “junk” content and spammers and the desire to rapidly include new content. Even defining junk content is not as easy as many like to think. Last year I had a good talk with Brian White of Google’s search quality team about how to “value” content. I posed a question along these lines:

What if you have two sites that are extremely similar in content and quality?
Both are about pet cats.
Both are of horrible quality with terrible grammer, bad facts, and spelling errors.

Site 1 was built by a spammer to boost rankings for a site selling pet food.
Site 2 was built by a third-grade student working hard on her school report.

In this case site 1 is spam and site 2 is not, but how does Google tell the difference since they are virtually identical?

His answer was that the structure of links pointing in to these sites is likely to be different, and that from this you could probably determine which was the “real” site and which was the “spam” site.
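To make that idea concrete, here is a toy sketch of what “looking at the link structure” might mean. It is not Google’s method, which I don’t know. The function name `inbound_link_profile`, the example URLs, and the choice of signals (how many distinct domains link in, and how much one domain dominates) are all my own assumptions for illustration.

```python
from collections import Counter
from urllib.parse import urlparse


def inbound_link_profile(linking_urls):
    """Summarize the pages that link in to a site (hypothetical helper).

    Returns the number of inbound links, the number of distinct linking
    domains, and the share of links that come from the single biggest domain.
    """
    domains = Counter(urlparse(url).netloc.lower() for url in linking_urls)
    total = sum(domains.values())
    if total == 0:
        return {"links": 0, "domains": 0, "top_share": 0.0}
    top_share = max(domains.values()) / total
    return {"links": total, "domains": len(domains), "top_share": top_share}


# Made-up inbound links: the spam site is fed by one link farm,
# the school report by a few unrelated sites.
spam_site_links = [
    "http://linkfarm.example/page1",
    "http://linkfarm.example/page2",
    "http://linkfarm.example/page3",
    "http://linkfarm.example/page4",
]
school_report_links = [
    "http://school.example/class-projects",
    "http://grandma-blog.example/proud",
    "http://catforum.example/thread/12",
]

print(inbound_link_profile(spam_site_links))
print(inbound_link_profile(school_report_links))
```

In this made-up data the two sites have identical content, but every link to the spam site comes from one domain while the school report’s few links come from unrelated places. A real system would need far more than this, and a determined spammer can fake diversity too.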

Of course this gets even more interesting when you make site 1, the “spammy” site, of much higher quality. Then 99% of all users might prefer the site that is trying to manipulate Google, yet Google has removed that site and left the lower quality, natural one.

This is a very interesting case because I think search has recently devolved into many such ranking challenges. Much of the content pouring online now is specifically designed to fool the search engines.

This would be an example of what I’ve noted before: linking relationships built the web, and now the value of linking seems to be hurting it.

Here were my 5 suggestions to Dave / Google:

* Paid site reviews to identify simple problems or penalties. The subtle confusion Google spawns by applying ambiguous rules to mom and pop sites whose owners have no clue is hurting everybody, including Google.

* Implement “site ID” where all sites showing AdSense must have a contact person who is identified publicly. Forward site complaints to this person.

* Have more Google parties but drop the cold hamburgers from Google Dance 2007.

* Transparency on publisher revenue share from AdSense.

* MORE transparency on guidelines and penalties. Fewer vague references to “sites built for users not AdSense”.

5 thoughts on “Improving Google”

  1. Well, of course the obvious comment is that perhaps neither of the two sites is worthy of mention, and a properly constructed Google algorithm would reject the childish drivel of a third-grader as well as the slightly “better” spam-crafted site (but of course would not reject comments about their grammar/grammer). Perhaps we are at a stage where “mom and pop” sites simply cannot obtain a sufficiently high ranking. The rejection of ‘mom and pop’ quality is similar to a manufacturer terminating long-standing relationships with mom and pop franchisees unable to keep up with technological change.

    If a site must have a publicly identified contact person, would spam mailers likewise stop dumping all their complaint and bounce-back traffic to /dev/null? And is anonymous authorship somehow to be banned?

    Transparency on AdSense revenue figures might indeed help, but I doubt it will ever take place.

    The Mobilgas Economy Run became useless when all competitors had the goal of gaining good results in the test calculations rather than actually having good mileage. It’s just going to take much longer for good Google search rankings to destroy good Google search results.

  2. Perhaps we are at a stage where “mom and pop” sites simply cannot obtain a sufficiently high ranking.

    Even Google would agree that there are problems in this area. Often a searcher would ideally find a mom and pop site (e.g. a local rafting company) rather than a more sophisticated but less local and less relevant service provider. Yet the small sites are subject to forces far outside of their control, and changes in the link economy have made it hard for “real” sites to get free links from anywhere.

  3. From Slashdot:
    Google Watchdog reporting “Spam and virus sites infesting the Google SERPs” …Google’s index hacked. The circumvention of a guideline normally picked up by the Googlebot quickly is worrisome. None of the sites have real content and don’t appear to even be hosted anywhere. How did millions of sites get indexed if they don’t exist?

  4. FG, I also don’t understand why it’s so hard for Google, with the world’s biggest computer infrastructure, to simply weed out sites that a human can see are bogus. They always talk about scalability, and I think they are willing to allow bad results while they figure out how to get them auto-rejected. The problem is not as trivial as it first appears because spammers are very good at tricks. For example, a program can generate fake “content” that a human can detect but a machine struggles with, because the machine has trouble looking at the full context of a sentence or paragraph.

  5. Actually, I’ve noticed an increase in search results that seem to be a “word salad” of unrelated phrases with a search term inserted into them on the fly. Also, any particular searcher will soon notice that his searches from the prior week have suddenly started to show up inside that strange “word salad” of disjointed phrases.

    I would agree with Google that seeking an automated solution is necessary for them, and I certainly agree with your statement that it’s hard to get a machine to recognize content that has been artfully concocted by a spammer.
