16 Feb 2008

So, my quest for a perfect website continues and, among the things I have discovered, there is an issue with duplicate content. It all started when, among the Google results for my website, I began seeing Web Talk links ending with weird strings such as ?wpcf7=json or ?wpcf7=json&wpcf7=json. There are even longer strings, all with the ?wpcf7=json code nested over and over. I started looking for information about this and, to tell you the truth, I didn’t find anything relevant apart from the good article you can see here. One thing seems clear: when Google’s bots crawl your website and discover duplicated content, or pages with very similar addresses, Google is likely to penalise you sooner or later.

But where does that string come from? Apparently from a WordPress plugin named Contact Form 7. This good plugin (I still use it) only used the code during the AJAX submitting (POST) process, but that was enough to cause the problem. The issue has now been resolved by the plugin’s author, Miyoshi, who states:

“Dexter and some people told me that there seems be a SEO issue in Contact Form 7. The “?wpcf7=json” code is used by Contact Form 7 only in AJAX submitting (POST) process. I wonder why Google indexed such URLs even now. Anyway, I worked around the issue. Now Contact Form 7 doesn’t use “?wpcf7=json”, so I believe that kind of problem is fixed. But Google’s existing indexes are still there, I can’t do anything for that.”
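To make the duplication concrete, here is a minimal Python sketch (it is not part of Contact Form 7 or WordPress) that strips the wpcf7 parameter from an address. It shows that all those weird URLs collapse back into one and the same page. The function name strip_wpcf7 and the example.com addresses are made up for illustration.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def strip_wpcf7(url: str) -> str:
    """Return the URL with every 'wpcf7' query parameter removed."""
    parts = urlsplit(url)
    # Keep every query parameter except the one left behind by the plugin.
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "wpcf7"
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


if __name__ == "__main__":
    # Hypothetical addresses: the same post, indexed under three URLs.
    duplicates = [
        "http://www.example.com/some-post/",
        "http://www.example.com/some-post/?wpcf7=json",
        "http://www.example.com/some-post/?wpcf7=json&wpcf7=json",
    ]
    for url in duplicates:
        print(url, "->", strip_wpcf7(url))
```

Every address in the list maps to the same page, which is exactly what a search engine sees as duplicate content.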

So the problem, as far as search engines are concerned, is still there, and Miyoshi can’t do anything about it. How can we solve the issue then? Over the last two weeks, after a boost thanks to the All-in-one-SEO-pack plugin, I have seen a dramatic drop in the number of people visiting my blog while, at the same time, more than 100 new duplicated Web Talk pages appeared in the search engine results. Am I being penalised by Google? I think so. More than 50 people have stopped visiting my website in the last few weeks. That’s enough to worry me, above all when a blog like mine gets around 150 visitors per day. I think some fluctuation is pretty normal over a website’s lifespan, but I don’t want to live with the doubt that this weird string might somehow be keeping my blog from taking off.

After reading Dexter’s comments and writing to him myself, I decided to apply a robots.txt trick to my website. As you probably know, the robots.txt file sits in the root folder of your blog (the one you reach via FTP) and tells search engines and other bots how to crawl your blog. You can tweak a lot in here and steer well-behaved bots quite precisely. Let’s see together how to write this file.

When you open robots.txt for the first time, you will see something like this:

# BEGIN XML-SITEMAP-PLUGIN
Sitemap: http://www.mywebsitename/sitemap.xml.gz
# END XML-SITEMAP-PLUGIN

If you don’t use a sitemap file or plugin, the Sitemap line in the middle won’t be there. OK, now let’s fill the file in, starting by selecting the bot we want to manage:

User-Agent: [Spider or Bot name]

If you don’t want to single out a particular bot and would rather address all of them, use this:

User-Agent: *

Now, let’s ask it what to do:

Disallow: [Directory or File Name or website address]

Disallow means that the bots you selected (Google’s, for example) must not crawl the directory, file name or web address you list.
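If you want to check how a simple rule is read before uploading the file, Python’s standard urllib.robotparser module can parse the lines for you. This is only a sketch: the /private/ directory and the example.com addresses are hypothetical, and the standard-library parser does plain prefix matching, so it suits simple rules like this one rather than wildcard patterns.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content: one group for all bots, one blocked folder.
ROBOTS_LINES = [
    "User-agent: *",
    "Disallow: /private/",
]


def can_crawl(bot_name: str, url: str) -> bool:
    """Tell whether the given bot may fetch the URL under ROBOTS_LINES."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_LINES)
    return parser.can_fetch(bot_name, url)


if __name__ == "__main__":
    for address in (
        "http://www.example.com/private/notes.html",
        "http://www.example.com/about/",
    ):
        print(address, "allowed" if can_crawl("Googlebot", address) else "blocked")
```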

Let’s see the different ways we can use the Disallow directive:

Disallow: /newsection/
This excludes a whole section of your blog from being crawled.

Disallow: /private_file.html
This excludes every URL whose path starts with /private_file.html, that is, the page itself plus any variant with a query string appended.

Disallow: /*.gif$
This excludes all files of a specific type (for example, .gif) from being crawled. The * wildcard and the $ end-of-URL anchor are extensions understood by Google and the other major search engines, not necessarily by every bot.

Disallow: /*?
This excludes any URL that includes a ? (more specifically, any URL that begins with your domain name, followed by any string, followed by a question mark, followed by any string).

Disallow: /?*
This excludes only URLs where the ? comes straight after the slash that follows your domain name (more specifically, any URL that begins with your domain name, followed by a question mark, followed by any string).
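To get a feeling for how these patterns behave, the rules above can be approximated in a few lines of Python. This is a rough sketch of Google-style matching, not Google’s actual code: it assumes a rule matches from the start of the path, that * stands for any run of characters, and that a trailing $ pins the rule to the end of the URL. The function names and the sample paths are invented for the example.

```python
import re


def rule_to_regex(rule: str) -> "re.Pattern[str]":
    """Translate a Disallow value into a regular expression."""
    anchored = rule.endswith("$")
    body = rule[:-1] if anchored else rule
    # '*' becomes '.*'; every other character is matched literally.
    pattern = "".join(".*" if char == "*" else re.escape(char) for char in body)
    return re.compile(pattern + ("$" if anchored else ""))


def is_blocked(rule: str, path: str) -> bool:
    """True if the path (including any query string) falls under the rule."""
    # re.match only matches at the start of the string, like a robots.txt rule.
    return rule_to_regex(rule).match(path) is not None


if __name__ == "__main__":
    rules = ["/newsection/", "/private_file.html", "/*.gif$", "/*?", "/?*"]
    paths = [
        "/newsection/page.html",
        "/private_file.html",
        "/images/logo.gif",
        "/some-post/?wpcf7=json",
        "/?wpcf7=json",
    ]
    for rule in rules:
        for path in paths:
            print(f"{rule:22} {path:26} blocked={is_blocked(rule, path)}")
```

Running it shows the difference between the last two rules: /*? catches a question mark anywhere in the address, while /?* only catches one placed right after the first slash.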

To get rid of the duplicate content, this is how I wrote my robots.txt:

# BEGIN XML-SITEMAP-PLUGIN
Sitemap: http://www.webtlk.com/sitemap.xml.gz
User-agent: *
Disallow: /?wpcf7=json*
Disallow: /*?wpcf7=json
# END XML-SITEMAP-PLUGIN
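Before waiting weeks for the search engines to react, the file above can be tried against a few sample addresses. The following Python sketch reads the Disallow lines out of the file text and tests some paths against them. It rests on several assumptions: the file holds a single User-agent group (so grouping is ignored), * is treated as "any run of characters" in the way Google documents it, and the sample paths are invented.

```python
import re

ROBOTS_TXT = """\
# BEGIN XML-SITEMAP-PLUGIN
Sitemap: http://www.webtlk.com/sitemap.xml.gz
User-agent: *
Disallow: /?wpcf7=json*
Disallow: /*?wpcf7=json
# END XML-SITEMAP-PLUGIN
"""


def disallow_rules(text: str) -> list:
    """Collect the values of all Disallow lines, skipping comments."""
    rules = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if line.lower().startswith("disallow:"):
            value = line.split(":", 1)[1].strip()
            if value:
                rules.append(value)
    return rules


def matches(rule: str, path: str) -> bool:
    """Prefix match in which '*' is a wildcard and a final '$' anchors the end."""
    anchored = rule.endswith("$")
    body = rule[:-1] if anchored else rule
    regex = ".*".join(re.escape(piece) for piece in body.split("*"))
    return re.match(regex + ("$" if anchored else ""), path) is not None


def blocked(path: str) -> bool:
    """True if any Disallow rule in ROBOTS_TXT covers the path."""
    return any(matches(rule, path) for rule in disallow_rules(ROBOTS_TXT))


if __name__ == "__main__":
    for sample in ("/?wpcf7=json", "/some-post/?wpcf7=json&wpcf7=json",
                   "/some-post/", "/?p=12"):
        print(sample, "blocked" if blocked(sample) else "allowed")
```

Under these assumptions the second rule on its own already covers the addresses matched by the first, so keeping both is harmless but not strictly needed.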

I expect to see results within 3-4 weeks, and I will keep you informed about them.



Tags: SEO, trick, WebTalk

4 Responses to “Avoiding duplicate content: a Google issue”

  1. Dexter | Techathand.net Says:

    Thanks for the link, my friend. I do hope that the trick will also work for you, like it did on my site. The URLs will not be removed that easily, but Google will give a low ranking to those with json once the robots.txt is in place.

  2. Web Talk Says:

    You are welcome, Dexter. You gave me some good tips about something I had no idea about!

  3. Duplicated Contents Problem, Be Careful! Says:

    [...] called: ?wpcf7=json which is extractly the same as cnreviews homepage. According to WebTalk, this is a problem created by a Wordpress plugin called “Contact Form 7” which we have [...]

  4. Wild Card Zheng Jie Eliminated from Wimbledon | Travel - CN Reviews Says:

    [...] called: cnreviews.com?wpcf7=json which is extractly the same as cnreviews homepage. According to WebTalk, this is a problem created by a Wordpress plugin called “Contact Form 7” which we have [...]



All content is licensed under a Creative Commons Licence.