31 Jul 2008

Double slashes (//) and duplicate content in URLs

Web Talk, Webmaster

If you are a webmaster, there are many ways to optimize a website or a blog so that it ranks high in the SERP (search engine results page). One of them is, without any doubt, to eliminate duplicate content from your web pages, articles and posts. Search engine spiders (such as Googlebot and the Yahoo and MSN bots) crawl a website looking for content that is too similar to something they have already “seen” on other sites. For a spider, the more unique content a website has, the better: it suggests that what the author produces is original and has not been copied from somewhere else. That’s why, if your website has a lot of similar content, Google and other search engines may rank it lower in the results for a particular keyword or a whole set of them, with the side effect that fewer people will read what you have written.

But what exactly is duplicate content? It is content that is identical, or nearly identical, to something else. Two articles on the same topic will inevitably have something in common, but if there are too many similarities and whole sentences are exactly the same because they were copied and pasted, that is duplicate content. Web spiders go further, though. If the same post can be reached at two slightly different addresses, spiders see two pages with identical content, and this is another good example of duplicate content.

We don’t know when or why, but sometimes when Google spiders a website it seems to end up with wrong URLs that contain an extra slash (//) or even a triple one (///). For example, Google might crawl the article www.mywebsite.com/testpage.php correctly, but at the same time also crawl something like www.mywebsite.com//testpage.php. You never published that address, of course, but the server usually returns the same page for it, so to Google it is another real page on your site. At this point Google’s bots may mark the two as duplicate content, and the next time someone searches for the keyword testpage, your article could be placed in, say, 344th position, reducing its chances of being read. Luckily for us there is a little trick to avoid all this. All you have to do is add a little piece of code to your .htaccess file, which is usually located in the public_html folder, the root directory of your site.

  1. Open your FTP client and connect to your website.
  2. Open the public_html folder.
  3. Right-click on the .htaccess file and click edit. If the .htaccess file doesn’t exist, create a htaccess.txt file with Windows Notepad, upload it to the public_html folder and rename it to .htaccess.
  4. Copy and paste the following code into the .htaccess file:

    # Remove multiple slashes anywhere in URL
    RewriteEngine On

    RewriteCond %{REQUEST_URI} ^(.*)//(.*)$
    RewriteRule . %1/%2 [R=301,L]

  5. If you want something more efficient, and only need to fix extra slashes right after the domain name, copy and paste the following code instead (replace www.yourwebsitename.com with your own domain):

    # Remove multiple slashes after domain
    RewriteEngine On

    RewriteRule ^/(.*)$ http://www.yourwebsitename.com/$1 [R=301,L]

The first snippet removes double or triple slashes anywhere in your website addresses; the second only removes those right after the domain name. Both answer with a 301 (permanent) redirect to the clean address.
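To get a feel for what the first snippet does, here is a minimal Python sketch that imitates it outside the server. It reuses the same pattern as the RewriteCond line and assumes the server hands the path to the rule exactly as it was requested, which may not be true on every setup. The function names redirect_target and follow_redirects are made up for this illustration and are not part of Apache.

```python
import re

# Same pattern as the RewriteCond line in the first .htaccess snippet.
DOUBLE_SLASH = re.compile(r"^(.*)//(.*)$")


def redirect_target(path):
    """Return the path a single 301 redirect would point to.

    Returns None when the path has no double slash, i.e. when the
    rule would not fire at all. (Hypothetical helper, for illustration.)
    """
    match = DOUBLE_SLASH.match(path)
    if match is None:
        return None
    # Mirrors the substitution %1/%2 in the RewriteRule.
    return match.group(1) + "/" + match.group(2)


def follow_redirects(path):
    """Apply the rule again and again, as a browser following redirects would.

    Returns the list of paths visited after the original request.
    """
    hops = []
    target = redirect_target(path)
    while target is not None:
        hops.append(target)
        target = redirect_target(target)
    return hops


if __name__ == "__main__":
    for path in ("/testpage.php", "//testpage.php", "///testpage.php"):
        print(path, "->", follow_redirects(path))
```

Because the first (.*) is greedy, each pass collapses only the last pair of slashes, so under these assumptions an address with a triple slash needs two redirects before it is clean.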




