The Truth About Duplicate Content

by WarriorForum.com Administrator
20 replies
A new article on Search Engine Journal says you shouldn't let the shiny new content you worked so hard to produce become invisible. Here's the truth about duplicate content and how to solve it.



Duplicate content is just what it sounds like: the same copy appearing on two or more web pages. It can occur within your own site, or between your site and another site you don't control. Duplicate content does not include elements like footers and other boilerplate that sensibly appear on multiple pages.

Google can usually tell this content is not the "meat" of what you are trying to say, based on pagination - or how your page is designed.

You Need To Check For Duplicate Content

I've found that even experienced SEO pros rarely check for duplicate content except in the beginning during Technical Discovery. This is a mistake. Duplicate content can happen when someone else scrapes your site and posts your content as their own.

It also occurs on websites because creating original content is hard, and it can be easier just to cut and paste content for similar pages. I recommend setting up a schedule to monitor for duplicate content. Some tools automatically monitor duplicate content regularly and send an alert when it is found.

Duplicate Content Monitoring

There are many different tools available to monitor for duplicate content. We use three different tools. Our first choice is Semrush.

In Semrush, the site audit report checks for duplicate content - but only on your own domain. So we use a second tool to monitor for duplicate content on other parts of the Web. We have found that Copyscape works best, but there are many other tools out there.

We also use Grammarly, which has a great Chrome plug-in for quick checks on sites you visit. Most of the tools are meant for teachers or others who need to check for plagiarism. These tools may not be explicitly designed for finding "duplicate content," but work great to find it.

How Much Duplication Is Ok?

As far as I know, the major search engines have not defined what exactly constitutes duplicate content. Many SEO experts have attempted to define when content goes from similar to duplicate. I prefer all content to be at least 30% different from all other copy.

I use an old "keyword density" application for this. Several tools compare two pieces of content and provide the percentage of duplication. Go to Google and query "duplicate content checker" or "keyword density tool," and you should be able to find one that works for you.
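To illustrate the kind of comparison these checkers perform, here is a minimal sketch - not any particular tool's algorithm, and the function names are my own - that estimates the percentage of duplication between two pieces of copy using word 3-gram shingles and Jaccard similarity:

```python
def shingles(text: str, k: int = 3) -> set:
    """Return the set of overlapping k-word shingles in a text."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}

def duplication_pct(a: str, b: str, k: int = 3) -> float:
    """Estimate % duplication as the Jaccard similarity of shingle sets."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 0.0
    return 100.0 * len(sa & sb) / len(sa | sb)

page_a = "duplicate content is copy that appears on two or more web pages"
page_b = "duplicate content is copy that appears on two or more web pages"
print(round(duplication_pct(page_a, page_b)))  # identical pages score 100
```

Under the 30%-different preference above, two pages would pass if a score like this stays below 70 - though, again, that threshold is a personal rule of thumb, not a search engine standard.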
#content #duplicate #truth
  • Saurav Gami
    Great information, thanks for sharing.
  • dave_hermansen
    So, in other words, the "truth" about duplicate content is "the major search engines have not defined what exactly constitutes duplicate content" and that "many SEO experts have attempted to define when content goes from similar to duplicate" (but there is no consensus).

    The author then tosses out their best guess of 30%.

    Some "truth"!
    Signature
    BizSellers.com - The #1 place to buy & sell websites!
    We help sellers get the MAXIMUM amount for their websites and all buyers know that these sites are 100% vetted.
    • GordonJ
      Dave,

      Did you not get the memo? In today's world, best guesses and made up FACTs serve as de facto truths?

      It's true. Trust me.

      GordonJ

      P.S. Hey, truth, like a leaner in horseshoes, is counted, or close enough is, well, close enough. HA! Yet maddening.


      Originally Posted by dave_hermansen View Post

      So, in other words, the "truth" about duplicate content is "the major search engines have not defined what exactly constitutes duplicate content" and that "many SEO experts have attempted to define when content goes from similar to duplicate" (but there is no consensus).

      The author then tosses out their best guess of 30%.

      Some "truth"!
    • DABK
      Dave, Dave, Dave. You miss the True Truth: Search Engine Journal has a content production strategy: so many posts per week, and each must be long, really long. (Because Google loves really long stuff, didn't you get the memo?)



      WarriorForum? Ditto.


      Did you notice how long they took to say nothing?


      Because, Dave, you must have so many hundreds of words a day, day in and day out, otherwise Google is going to get mad at you and not send you visitors, 'member?


      Sigh. At least they're only killing pixels, not trees.


      Originally Posted by dave_hermansen View Post

      So, in other words, the "truth" about duplicate content is "the major search engines have not defined what exactly constitutes duplicate content" and that "many SEO experts have attempted to define when content goes from similar to duplicate" (but there is no consensus).

      The author then tosses out their best guess of 30%.

      Some "truth"!
      Originally Posted by theblur View Post

      Do you feel that duplicate content only hurts that particular page or can it hurt the site's overall health as well?
  • spartan14
    I agree duplicate content doesn't help your site, but sometimes even the big SEO experts duplicate a little, I think. Sometimes the content you create needs to draw inspiration from other sources.
  • zqazfg
    That's great.
  • theblur
    Do you feel that duplicate content only hurts that particular page or can it hurt the site's overall health as well?
  • jameer
    Useful information, Thanks for sharing.
    • DABK
      Where? Please point it out to me.

      Originally Posted by jameer View Post

      Useful information, Thanks for sharing.
  • dataplusvalue
    By the way, Google's policy on duplicate content is well known. But you can repost blog content by rewriting it in another form and making it unique, so it won't be a duplicate in the eyes of Google.
  • Every time you add an O to dooplicate, you pursue the open gateway to the Googsy Nouveau.
    Signature

    Lightin' fuses is for blowin' stuff togethah.

  • savidge4
    This thread is a bit hard to read.

    Duplicate content has been a thing for a while now. Google in particular has in the past had "systems" in place to better identify "original" vs. "duplicate" content. I don't remember the exact date, but around 2010 to 2012 Google Author Tags were a thing... as we all know, that was retired. BUT not retired and forgotten - actually replaced. Schema Author tags, as well as post publish time stamping, took its place.

    I can only imagine, even today and in response to this post, there will be some amount of "Oh no... they don't... Nah..." - You're kidding yourself.

    So what happens if you create a piece of content with no Schema Author stamp or post date/time stamp, AND it is then scraped and posted? Google in particular is then left to determine, basically, chain of custody, no?

    IN THEORY... of 2 pieces of content that are exactly the same, only ONE should appear in the search results. And without proper steps taken (schema tags), Google is generally left with whichever was indexed first - and herein lies the problem.

    I am sure we have all seen it where a post is listed within search a number of times, and often it is listed as "duplicates left out, click to see" (something to that effect), and that is clearly an indication that Google is having a hard time determining "authorship."

    Written content on the internet is IP - it's copyright... and in some cases needs to be "protected." Starting with the foundation of Schema tags - primarily Author - and post time/date stamping makes this much easier.
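    The Schema author and publish-date markup described above is typically emitted as JSON-LD using schema.org's Article type. Here is a minimal sketch - the property names (`headline`, `author`, `datePublished`) come from schema.org, while the author name and date are placeholder values:

```python
import json
from datetime import datetime, timezone

# Minimal schema.org Article markup asserting authorship and publish time,
# ready to embed in a <script type="application/ld+json"> tag on the page.
article_schema = {
    "@context": "https://schema.org",
    "@type": "Article",
    "headline": "The Truth About Duplicate Content",
    "author": {"@type": "Person", "name": "Jane Example"},  # placeholder author
    "datePublished": datetime(2022, 5, 1, tzinfo=timezone.utc).isoformat(),
}

print(json.dumps(article_schema, indent=2))
```

    With this markup in place on the original page, a scraper's copy lacks the matching authorship and timestamp claims, which is the "chain of custody" signal the post is describing.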
    Signature
    Success is an ACT not an idea
  • Originally Posted by WarriorForum.com View Post

    You Need To Check For Duplicate Content

    I've found that even experienced SEO pros rarely check for duplicate content except in the beginning during Technical Discovery. This is a mistake. Duplicate content can happen when someone else scrapes your site and posts your content as their own.

    It also occurs on websites because creating original content is hard, and it can be easier just to cut and paste content for similar pages. I recommend setting up a schedule to monitor for duplicate content. Some tools automatically monitor duplicate content regularly and send an alert when it is found.
    Actually, I think it would be a good idea to do this at the same time you're updating content. It would make things more efficient.
  • hardworker2013
    The more authority your website has, the less you need to worry about duplicate content. I know a lot of high-authority sites that scrape content from other sites, and they still get Google love.
    • CyberSEO
      Originally Posted by hardworker2013 View Post

      The more authority your website has, the less you need to worry about duplicate content. I know a lot of high-authority sites that scrape content from other sites, and they still get Google love.
      Indeed. E.g.: Google News, Bing News, etc. They are all content aggregators. That means they don't produce content themselves; they scrape it from other sites and show it as a news digest.
      Signature
      CyberSEO Pro - the ultimate AI autoblogging and RSS, XML, HTML, JSON and CSV import plugin for WordPress with support for OpenAI o1, Claude, Gemini, Llama 3, Midjourney, DALL-E, Stable Diffusion and more.
  • Thomas Williams
    Thanks for providing such valuable information.
  • Randall Magwood
    If you wanted to do tracking on 2 identical squeeze pages that are advertised on 2 different sites... and you have two different autoresponder html codes on the sites... are you still penalized for duplicate content? Even though you're doing tracking?
  • farah azeem
    Banned
    [DELETED]
    • DABK
      We've all known that for ages. How does it fit into the "Truth About Duplicate Content?"


      Originally Posted by farah azeem View Post

      Duplicate content consist of two or more instances of the same content in the multiple places on the internet. It can exist because a site owner has created it on purpose, it can be the result of plagiarism or it can emerge as a side effect of website mismanagement.
  • malzeri83
    Could be said in two words: duplicate is not good. OK. Unlike people who can turn two words into a full page of text.
  • Old Molases
    Very well explained. Thanks for sharing.
