When you hit “publish” on a new post, how many copies of that post exist on your site? How about on the Web?
The truth is that there is no way to know exactly how many copies of a post you create, because every theme and every site is different. However, depending on your setup, you can create more than a dozen copies of the work on your site, and that can create a serious headache both for you and for the search engines.
Duplicate content may not be the extreme danger it once was, but it is still a lurking problem for bloggers and other webmasters. It isn’t a simple one to stop, though, especially since the issue isn’t limited to what happens on your site; it can be amplified by the actions of other sites, including those you don’t control.
It’s worth taking a moment, if you haven’t already, to understand duplicate content, how it works and, most importantly, how to avoid it.
Duplicate content is when the same or very similar content appears on multiple pages or multiple sites. As Ozh has said, the search engines don’t like this because there is no point in listing the same content multiple times in their results. Doing so offers no benefit to users.
So what the search engines attempt to do is find the “best” version of the content, link to it and penalize the rest. That penalty means either lowering the other copies in the rankings or, in other cases, banishing them with the duplicate content filter.
If you’ve ever seen the notice at the foot of a Google results page that reads, “In order to show you the most relevant results, we have omitted some entries very similar to the XXX already displayed. If you like, you can repeat the search with the omitted results included,” you have witnessed this filter in action.
The problem, however, is that duplicate content is very easy to create by accident. Depending on your theme, you can actually publish the same content to a dozen URLs or more on your site. For example, your tag pages, category pages, post page and archive page can all display the same content, forcing Google to choose the best one.
Likewise, other sites that pick up your content, whether legitimately or unlawfully, can create duplicate content issues, as can posting the same item to multiple sites yourself.
Worse, while Google tries hard to pick the right version, it isn’t perfect and often gets it wrong. This is an especially big problem for new sites that aren’t yet well-established.
The penalty can range from simply having the ranking of the “real” version reduced erroneously to, in extreme cases, having your site treated like a spam blog that is simply shooting out the same content over and over again.
This makes it important to be aware of this problem and more important to try and prevent it from becoming a burden to your site.
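One way to get a feel for how many copies of a post your own site serves is to compare the text found at each URL. The Python sketch below is only an illustration: it assumes you have already collected the body text of each page into a dictionary (the `pages` mapping and both function names are hypothetical), and it only catches copies that are identical once case and whitespace are ignored. It is not how any search engine detects duplicates.

```python
import hashlib
import re
from collections import defaultdict


def fingerprint(text: str) -> str:
    """Hash the text after lowercasing it and collapsing whitespace."""
    normalized = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def find_duplicates(pages: dict) -> list:
    """Group URLs whose body text is identical after normalization.

    `pages` maps a URL to the text shown at that URL (assumed to be
    collected beforehand, for example by a crawl of your own site).
    """
    groups = defaultdict(list)
    for url, text in pages.items():
        groups[fingerprint(text)].append(url)
    # Only groups with more than one URL are duplicates.
    return [sorted(urls) for urls in groups.values() if len(urls) > 1]
```

If a post page and a tag page come back in the same group, the tag page is showing the full post rather than an excerpt.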
How to Avoid It
Duplicate content is, unfortunately, not a straightforward problem to solve, as there are three potential sources of trouble: your site, other sites you control and sites outside of your control. Each requires a different strategy and approach.
That being said, here are some steps that you should take to try and avoid these issues when possible.
- Avoid www/non-www Conflicts: Right now there are two ways to visit most sites on the Web, www.domain.com and simply domain.com. Both of these versions of your URL can be indexed in Google, creating an exact copy of your site at two URLs. To prevent this you need to either redirect one to the other or tell Google which you prefer via Google Webmaster Tools.
- Truncate Non-Post Versions: Check and make sure your tag, category, archive and other non-individual post pages are only displaying an excerpt of the post. Usually, short previews of the content do not create duplicate content issues as Google favors the full version.
- Be Careful About Linking: Make sure that when you use short URLs they have a proper 301 redirect and that you don’t link to the same content using multiple URLs. Remember, duplicate content can only be an issue if Google sees the different URLs.
- Don’t Cross-Post Content: If you post the same article to multiple sites, especially in full, you risk duplicate content issues as Google will inevitably treat one as the original and axe the other, meaning one site’s growth will come at the expense of another. If you must cross-post, try to bar the search engines from indexing the one(s) that you don’t prefer.
- Deal with Harmful Infringement: If a site republishes your content but links to the original, odds are it won’t hurt you as Google takes that as a clear indication of which version is the trusted one. However, in some cases and especially without a link, an unauthorized copy can hurt you. If it does, deal with it and get the content removed from the Web or, if needed, from the search engines.
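The www/non-www redirect from the first step can be sketched in a few lines of Python. This is a minimal illustration, not a drop-in for any particular server: it assumes the non-www form is the one you prefer (the `PREFER_WWW` setting and both function names are made up for the example), and it only works out where a 301 should point rather than sending one.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumption: the non-www host is the preferred one. Set to True to prefer www.
PREFER_WWW = False


def canonical_url(url: str, prefer_www: bool = PREFER_WWW) -> str:
    """Return the preferred www or non-www form of a URL."""
    parts = urlsplit(url)
    host = parts.hostname or ""  # hostname is already lowercased
    bare = host[4:] if host.startswith("www.") else host
    target_host = f"www.{bare}" if prefer_www else bare
    # Keep an explicit port if the original URL had one.
    netloc = f"{target_host}:{parts.port}" if parts.port else target_host
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


def redirect_for(url: str):
    """Return (301, target) if the URL should redirect, otherwise None."""
    target = canonical_url(url)
    return (301, target) if target != url else None
```

With these settings, `redirect_for("http://www.domain.com/post")` points at `http://domain.com/post`, while a request that already uses the preferred host needs no redirect.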
In short, what it all comes down to is ensuring that there is only one URL on your site that displays the full content and that your content elsewhere is either properly attributed with Google-visible links or is not indexed, possibly using meta tags or robots.txt files.
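If you want to check that a page really is hidden from the search engines, both signals can be tested with Python's standard library. The sketch below assumes you already have the page's HTML and the site's robots.txt as strings; the helper names (`RobotsMetaParser`, `has_noindex`, `blocked_by_robots`) are invented for this example.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta" and (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            self.directives.update(
                d.strip().lower() for d in content.split(",") if d.strip()
            )


def has_noindex(html: str) -> bool:
    """True if the page carries a robots meta tag containing noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


def blocked_by_robots(robots_txt: str, url: str, agent: str = "*") -> bool:
    """True if the given robots.txt text disallows the URL for the agent."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return not rules.can_fetch(agent, url)
```

For instance, a robots.txt containing `Disallow: /tag/` under `User-agent: *` would make `blocked_by_robots` return True for any URL under `/tag/`.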
It’s a simple theory, but with so many issues to control, it can be intimidating to even get started. Fortunately, all that’s really necessary is common sense and a willingness to check your site and make sure no problems have slipped in under your radar.
The principles of duplicate content are easy to understand but the actual enforcement is very difficult. Do a search for any of your most popular blog posts and you’ll likely see the problem first hand. Between the way most blogging applications behave, other sites using your content and social media, it’s easy to put a single piece of content out on the Web dozens of times without trying.
Fortunately, this is a problem Google has been getting better at addressing, but it is still far from perfect.
So, since Google doesn’t have all of the answers, it’s up to us to provide some of our own and try to keep this problem to a minimum.