My quest for a perfect website continues and, among the things I have discovered, there is an issue with duplicate content. It all started when, among the Google results for my website, I began seeing Web Talk links ending with weird strings such as ?wpcf7=json or ?wpcf7=json&wpcf7=json. There are even longer ones, all with the ?wpcf7=json code nested over and over. I looked for information about this and, to tell you the truth, I didn’t find anything relevant apart from the good article you can see here. One thing seems clear to me: when Google’s bots crawl your website and discover duplicate content, or pages with near-identical addresses, Google is likely to penalise you sooner or later.
But where does that string come from? Apparently from a WordPress plugin named Contact Form 7. This good plugin (I still use it) only used the code during the AJAX submission (POST) process, but that was enough to generate the problem. The issue has now been resolved by the plugin’s author, Miyoshi, who states:
“Dexter and some people told me that there seems be a SEO issue in Contact Form 7. The “?wpcf7=json” code is used by Contact Form 7 only in AJAX submitting (POST) process. I wonder why Google indexed such URLs even now. Anyway, I worked around the issue. Now Contact Form 7 doesn’t use “?wpcf7=json”, so I believe that kind of problem is fixed. But Google’s existing indexes are still there, I can’t do anything for that.”
So the problem, as far as search engines are concerned, is still there, and Miyoshi can’t do anything about it. How can we solve it, then? After a boost thanks to the All-in-one-SEO-pack plugin, over the last two weeks I have seen a dramatic drop in the number of people visiting my blog while, at the same time, more than 100 new duplicate Web Talk pages have shown up in the search engine results. Am I being penalised by Google? I think so. More than 50 people have stopped visiting my website in the last few weeks. That is enough to worry me, above all for a blog like mine, which gets around 150 visitors per day.
I think that some fluctuations are pretty normal during the life span of a website, but I don’t want to live with the doubt that this weird string is somehow holding my blog back from taking off. After reading Dexter’s comments and writing to him myself, I decided to apply a robots.txt trick to my website. As you well know, the robots.txt file is located in the root of your blog (you can reach it via FTP) and determines how search engines and other bots crawl your site. You can tweak a lot in here and tell well-behaved bots exactly what they may and may not crawl. Let’s see together how to compile this file.
When, for the first time, you open the robots.txt you will see something like this:
# BEGIN XML-SITEMAP-PLUGIN
Sitemap: http://www.mywebsitename/sitemap.xml.gz
# END XML-SITEMAP-PLUGIN
If you don’t use a sitemap file or plugin, the sitemap address in the middle won’t be there. OK, now let’s compile the file, starting with the bot we want to manage:
User-Agent: [Spider or Bot name]
If you don’t want to select a particular bot but want to include all of them, use this:
User-Agent: *
Now, let’s ask it what to do:
Disallow: [Directory or File Name or website address]
Disallow tells the bot (Googlebot, for example) not to crawl a particular directory, file or web address on our blog.
Let’s see the ways we can use the Disallow directive:
Disallow: /newsection/
This excludes a whole section of your blog from being crawled.
Disallow: /private_file.html
This excludes the page private_file.html in the root of your blog (strictly speaking, any address that starts with /private_file.html).
Disallow: /*.gif$
This excludes all files of a specific file type (here, .gif) from being crawled. The $ marks the end of the address.
Disallow: /*?
This excludes any URL that includes a ? (more specifically, any URL that begins with your domain name, followed by any string, followed by a question mark, followed by any string).
Disallow: /?*
This excludes only URLs where the ? comes straight after your domain name (your domain name, followed by a question mark, followed by any string). A ? that appears deeper in the address is not matched by this rule.
Keep in mind that the * and $ wildcards are extensions understood by Googlebot; other bots may ignore them.
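To see how these patterns behave, here is a minimal Python sketch of the matching logic, assuming the wildcard rules as Google documents them: * stands for any run of characters, a trailing $ anchors the end of the address, and everything else is matched as a prefix. The function names rule_to_regex and is_disallowed, and the sample paths, are made up for illustration; this is not how any real crawler is implemented.

```python
import re


def rule_to_regex(rule):
    """Translate a Disallow path containing * and $ into a compiled regex."""
    anchored_end = rule.endswith("$")
    if anchored_end:
        rule = rule[:-1]
    # Escape every literal piece, then join the pieces with "any characters".
    body = ".*".join(re.escape(part) for part in rule.split("*"))
    if anchored_end:
        body += r"\Z"
    return re.compile(body)


def is_disallowed(rule, path):
    """True if the path (including its query string) is matched by the rule."""
    # re.match anchors at the start, which gives the prefix behaviour.
    return rule_to_regex(rule).match(path) is not None


# Hypothetical paths, one or two per rule discussed above.
examples = [
    ("/newsection/", "/newsection/page.html"),
    ("/private_file.html", "/private_file.html"),
    ("/*.gif$", "/images/logo.gif"),
    ("/*.gif$", "/images/logo.gif?size=2"),
    ("/*?", "/2008/02/post/?wpcf7=json"),
    ("/?*", "/?wpcf7=json"),
    ("/?*", "/2008/02/post/?wpcf7=json"),
]

for rule, path in examples:
    print(rule, path, "blocked" if is_disallowed(rule, path) else "allowed")
```

The last two examples show the difference between /*? and /?*: the first catches a question mark anywhere in the address, the second only one that comes directly after the domain name.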
To get rid of the duplicate content, this is how I compiled my robots.txt:
# BEGIN XML-SITEMAP-PLUGIN
Sitemap: http://www.webtlk.com/sitemap.xml.gz
User-agent: *
Disallow: /?wpcf7=json*
Disallow: /*?wpcf7=json
# END XML-SITEMAP-PLUGIN
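While waiting for Google to catch up, the file above can be sanity-checked locally. The following Python sketch reads the Disallow lines and tests a few addresses against them. It is a simplification under stated assumptions: it ignores User-agent grouping (the file has a single group for all bots), it handles only the * wildcard because these rules do not use $, and the test URLs and the function names disallow_rules and blocked are hypothetical.

```python
import re
from urllib.parse import urlsplit

ROBOTS_TXT = """\
# BEGIN XML-SITEMAP-PLUGIN
Sitemap: http://www.webtlk.com/sitemap.xml.gz
User-agent: *
Disallow: /?wpcf7=json*
Disallow: /*?wpcf7=json
# END XML-SITEMAP-PLUGIN
"""


def disallow_rules(robots_txt):
    """Collect the Disallow paths, skipping comments and other fields."""
    rules = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        field, _, value = line.partition(":")
        if field.strip().lower() == "disallow" and value.strip():
            rules.append(value.strip())
    return rules


def blocked(url, rules):
    """True if the URL's path plus query string matches any rule."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    for rule in rules:
        # * means "any characters"; the rest is a literal prefix match.
        pattern = ".*".join(re.escape(piece) for piece in rule.split("*"))
        if re.match(pattern, target):
            return True
    return False


rules = disallow_rules(ROBOTS_TXT)

# Hypothetical addresses: two duplicates and two normal pages.
for url in [
    "http://www.webtlk.com/?wpcf7=json",
    "http://www.webtlk.com/some-post/?wpcf7=json&wpcf7=json",
    "http://www.webtlk.com/some-post/",
    "http://www.webtlk.com/?p=12",
]:
    print(url, "blocked" if blocked(url, rules) else "allowed")
```

Under these assumptions, the first rule catches the string on the home page and the second catches it on any other page, including the nested variants, while ordinary addresses stay crawlable.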
I expect to see results within 3-4 weeks, and I will keep you informed.
Tags: SEO, trick, WebTalk
4 Responses to “Avoiding duplicate content: a Google issue”
All contents are licenced under a Creative Commons Licence.


February 16th, 2008 at 7:10 am
Thanks for the link, my friend. I do hope the trick will work for you too, as it did on my site. The URLs will not be removed that easily, but Google will give a low ranking to those with json once the robots.txt is in place.
February 16th, 2008 at 7:16 am
You are welcome, Dexter. You gave me some good tips about something I had no idea about!
June 24th, 2008 at 8:27 am
[...] called: ?wpcf7=json which is extractly the same as cnreviews homepage. According to WebTalk, this is a problem created by a Wordpress plugin called “Contact Form 7″ which we have [...]
July 7th, 2008 at 2:21 pm
[...] called: cnreviews.com?wpcf7=json which is extractly the same as cnreviews homepage. According to WebTalk, this is a problem created by a Wordpress plugin called “Contact Form 7″ which we have [...]