Duplicate Content: The Penalties of Ignorance

It is widely understood that search engines penalize websites whose content appears to have been copied from other sources on the web.
It could be argued that, in their effort to deliver the most relevant, fresh content, they are merely filtering what they index. Either way, the penalty remains something to avoid.

Sometimes a webpage is discounted as a duplicate merely because it can be reached at more than one address: http://mypage.com vs. http://www.mypage.com.  This can be corrected.

What can be done to avoid these duplicate content problems?
It’s a pretty easy fix.

If your homepage(s) are experiencing this issue, first locate the .htaccess file. Once you have found it, open it and add the following code to redirect all your non-www URLs to the www URLs:

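# Uses mod_rewrite so that only the non-www host is redirected (avoids a redirect loop)
RewriteEngine On
RewriteCond %{HTTP_HOST} ^mypage\.com$ [NC]
RewriteRule ^(.*)$ http://www.mypage.com/$1 [R=301,L]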

Should you wish to eliminate your homepage or index page problems, simply employ a 301 redirect. This, too, goes in the same .htaccess file, using the code below:

Redirect 301 /nastyurl.htm http://www.mypage.com/
Redirect 301 /index http://www.mypage.com/

Again, modify the URLs to match your situation.

This will leave you with a “clean” URL structure by permanently redirecting your /index to http://www.mypage.com.

What if someone has actually codejacked or stolen your content?
Are you already indexed?  If so, the copy will generally just be filtered out as the duplicate.  Phew!

What about article spinning?
Rewriting numerous versions of the same article may be popular, but a search engine can still discern the correlation, and this too may penalize your site, even diminishing the value of any included links.  Don’t do it.

And Blogs?
Blogs, however, are viewed a bit differently by search engines.  By mixing up the content or posting in different categories you should be able to avoid any duplicate content issues.

How can I discover duplicate content?
Try checking your website at http://www.copyscape.com

It’s an easy-to-use website that runs a simple search and compares your pages with other websites around the internet to find duplicated content.

So how do I fix duplicate content problems?
First, try the “fixes” stated above and recognize that it is not likely that someone will steal your content.  Checking SEOmoz.com in copyscape.com will give you results, but there is unlikely to be any penalty because they themselves were the originators.

If you discover your content has been stolen or copied, try contacting the webmaster via email and ask them to remove it. Chances are slim that they will respond and you’ll have to just chalk it up to a learning experience.  Don’t waste your time.

Using a Rel-Canonical Tag to Fix Other Duplicate Pages
If you manage a widget site that offers multiple ways of locating a particular product, the resulting duplicate URLs could penalize you. For example:


These are different URLs, but they lead to the same page. Try using a rel=canonical tag. This tells the search engines that both pages, although they carry duplicate content, should be treated as one and the same.
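
As a rough sketch, assuming two hypothetical addresses for the same widget page (neither URL comes from the original example), the tag might look something like this. It goes inside the <head> section of each duplicate page and points at whichever URL you want treated as the main one:

```html
<!-- Hypothetical duplicate URLs that both show the same product:
       http://www.mypage.com/widgets/blue-widget
       http://www.mypage.com/products?id=123
     Add the line below inside <head> on each of them. -->
<link rel="canonical" href="http://www.mypage.com/widgets/blue-widget" />
```

Swap in your own preferred URL for the href value. Search engines generally treat the tag as a strong hint and not as a command, so it works best alongside consistent internal links to the same preferred URL.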