My quest for a perfect website continues and, among the things I have discovered, there is an issue with duplicate content. It all started when, among the Google results for my website, I began seeing Web Talk links ending with weird strings such as ?wpcf7=json or ?wpcf7=json&wpcf7=json. There are even longer strings, all with the ?wpcf7=json code nested over and over. So I started looking for information about this and, to tell you the truth, I didn’t find anything relevant apart from the good article you can see here. One thing is for sure: when Google’s bots crawl your website and discover duplicate content, or pages with near-identical addresses, Google is likely to penalise you sooner or later.
But where does that string come from? Apparently from a WordPress plugin named Contact Form 7. This good plugin (I still use it) used the code only during the AJAX submission (POST) process, but that was enough to cause the problem. The issue has now been resolved by the plugin’s author, Miyoshi, who states:
“Dexter and some people told me that there seems be a SEO issue in Contact Form 7. The “?wpcf7=json” code is used by Contact Form 7 only in AJAX submitting (POST) process. I wonder why Google indexed such URLs even now. Anyway, I worked around the issue. Now Contact Form 7 doesn’t use “?wpcf7=json”, so I believe that kind of problem is fixed. But Google’s existing indexes are still there, I can’t do anything for that.”
So the problem, as far as search engines are concerned, is still there, and Miyoshi can’t do anything about it. How can we solve it, then? Over the last two weeks, after a boost thanks to the All-in-one-SEO-pack plugin, I have seen a dramatic drop in the number of people visiting my blog while, at the same time, more than 100 new duplicate Web Talk pages have appeared in the search engine results. Am I being penalised by Google? I think so. More than 50 people have stopped visiting my website in the last few weeks. That is enough for me to worry about, especially for a blog like mine, which gets around 150 visitors per day. I think some fluctuation is pretty normal over a website’s lifespan, but I don’t want to live with the doubt that this weird string is somehow keeping my blog from taking off.
After reading Dexter’s comments and writing to him myself, I decided to apply a robots.txt trick to my website. As you probably know, the robots.txt file sits in the root folder of your blog (you can reach it via FTP) and determines how search engines and other bots crawl your site. You can tweak a lot in here and tell well-behaved bots exactly what they may and may not crawl. Let’s see together how to write this file.
When, for the first time, you open the robots.txt you will see something like this:
# BEGIN XML-SITEMAP-PLUGIN
Sitemap:
# END XML-SITEMAP-PLUGIN
If you don’t use any sitemap file or plugin, the web address in the middle won’t be shown. OK, now let’s write the file, starting by selecting the bot we want to manage:
User-Agent: [Spider or Bot name]
If you don’t want to select a particular bot, but want to include all of them, use this:
User-Agent: *
Now, let’s ask it what to do:
Disallow: [Directory or File Name or website address]
Disallow means that the selected bot (Googlebot, for example) must not crawl the given directory, file name or web address on our blog.
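To see how a User-Agent line and a Disallow line work together, here is a minimal sketch using Python’s standard urllib.robotparser module. The example.com address, the /newsection/ rule and the is_allowed helper are assumptions made up for illustration; they are not taken from my own file. Note that this module does plain prefix matching only, so it is not suitable for testing wildcard rules.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /newsection/
"""


def is_allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if `agent` may crawl `url` under the given rules."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no download is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for url in (
        "http://example.com/newsection/page.html",
        "http://example.com/about.html",
    ):
        verdict = "allowed" if is_allowed(ROBOTS_TXT, "Googlebot", url) else "blocked"
        print(url, "->", verdict)
```

Because the rule is declared for all bots (*), it applies to Googlebot as well as to any other crawler that honours robots.txt.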
Let’s see how we can use the Disallow directive:
Disallow: /newsection/ This will exclude a whole section of your blog from being crawled.
Disallow: /private_file.html This will exclude any URL whose path starts with /private_file.html from being crawled (rules are matched from the beginning of the path, not the end).
Disallow: /*.gif$ This will exclude all files of a specific file type (for example, .gif) from being crawled. The * and $ wildcards are an extension honoured by Google and other major search engines, not by every bot.
Disallow: /*? This will exclude any URL that includes a ? (more specifically, any URL that begins with your domain name, followed by any string, followed by a question mark, followed by any string)
Disallow: /?* This will exclude only URLs where the ? comes right after the domain name (more specifically, any URL that begins with your domain name, followed by a question mark, followed by any string)
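The difference between these patterns is easy to get wrong, so here is a rough Python sketch of Google-style matching: a rule matches from the start of the path, * stands for any run of characters, and a trailing $ pins the match to the end of the URL. The rule_to_regex and is_blocked helpers are hypothetical names of my own; this is an approximation for experimenting, not Google’s actual implementation.

```python
import re


def rule_to_regex(rule: str) -> re.Pattern:
    """Translate a Disallow pattern into a regex (illustrative only)."""
    anchored = rule.endswith("$")  # a trailing $ means "URL must end here"
    if anchored:
        rule = rule[:-1]
    # Escape the literal pieces and let each * match any run of characters.
    body = ".*".join(re.escape(piece) for piece in rule.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def is_blocked(rule: str, path: str) -> bool:
    """True if `path` (path plus query string) matches the Disallow rule."""
    return rule_to_regex(rule).match(path) is not None


if __name__ == "__main__":
    paths = ["/?wpcf7=json", "/contact/?wpcf7=json", "/contact/", "/img/logo.gif"]
    for rule in ("/*?", "/?*", "/*.gif$"):
        blocked = [p for p in paths if is_blocked(rule, p)]
        print(f"Disallow: {rule} blocks {blocked}")
```

Under these assumptions, /*? catches a ?wpcf7=json string on any page, while /?* only catches it when it is attached directly to the home page address.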
To get rid of the duplicate content, this is how I wrote my robots.txt:
# BEGIN XML-SITEMAP-PLUGIN
# END XML-SITEMAP-PLUGIN
I expect to see results within 3-4 weeks, and I will keep you informed about them.