Here are the 2 files I used:
File 336.txt contains 264 words, which are part of the whole article, and file 588.txt contains 582 words, which constitute the full article.
Here are my steps:
1. I scanned the 582-word content, and here are the results:
2. I scanned the 264-word content, and Copyscape found 2 URLs:
3. The first URL in the results was found to contain 233 matching words, which is 88% of that content:
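As a quick check of the numbers above, here is a minimal sketch of the arithmetic. The word counts come from my test; the function name `match_percent` is just something I made up for illustration. It also shows how small the same 233 words look against the full 582-word article.

```python
# Word counts are taken from the test above; match_percent is a
# hypothetical helper, not part of any Copyscape tool.
def match_percent(matched_words: int, total_words: int) -> float:
    """Return the share of total_words covered by matched_words, in percent."""
    return 100.0 * matched_words / total_words

partial = match_percent(233, 264)  # against the 264-word excerpt
full = match_percent(233, 582)     # against the full 582-word article

print(f"excerpt: {partial:.0f}%, full article: {full:.0f}%")
# prints: excerpt: 88%, full article: 40%
```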
Here is the question I asked Copyscape staff, followed by their reply:
Originally Posted by Me
Are you saying that if the major part of my content is unique, then Copyscape won't detect the duplicate part, e.g. 30% duplicate content and 70% unique content? Does that mean that I must only scan content that I suspect to be duplicate?
Having run various tests, we noticed that the additional 318 words in the larger portion of text weren't actually present in the URL you pointed out that was located when searching for the smaller portion of text.
As our statistical search algorithm bases its overall search on certain unique phrases throughout the content, if half of the words don't appear on the website, the search essentially gets watered-down so the quality of the search becomes inferior and in certain rare instances will not turn up a result. This case seems to be one of these rare instances.
As we mentioned, this is a rare occurrence and when running a search on a portion of text that has 30% plagiarised content it would typically return a matching result.
Your searches will certainly be more successful if the text you search has a higher percentage of plagiarised content.
So my advice is to scan only the content you suspect to be duplicate, or to scan no more than 250 words of content in a single scan.
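If you want to follow the 250-word advice on a longer article, a rough sketch like the one below can split the text into pieces before you paste each one into a checker. This is only my assumption of how to do it: it counts words by splitting on whitespace, which may not match how Copyscape counts them, and `split_into_chunks` is a name I picked, not anything from Copyscape.

```python
from typing import List

def split_into_chunks(text: str, max_words: int = 250) -> List[str]:
    """Split text into consecutive pieces of at most max_words words each.

    Words are counted by splitting on whitespace (an assumption; a given
    plagiarism checker may count words differently).
    """
    words = text.split()
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]

# Stand-in for a 582-word article like the one in my test.
article = " ".join(f"word{i}" for i in range(582))
chunks = split_into_chunks(article)

print([len(c.split()) for c in chunks])
# prints: [250, 250, 82]
```

Each piece can then be scanned on its own, so a duplicated passage makes up a larger share of whatever is being searched.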
UPDATE: Here are online tools for detecting plagiarised content. Some are good and some are not; I sorted them from most effective to least effective, but you should test each one and find which works best:
DupeFree pro software is currently in its last beta testing stage and is soon to be released.
I also recommend scanning with multiple tools to be safe.