The Death of Spintax: Spun Text Detection Algorithm

I was at the pub with a fellow SEO who couldn't believe that Google can detect spun text. As a programmer, I find it obvious that they can. So this post is to settle the argument once and for all...

Although we can still get away with it for now, I am absolutely convinced there are very clever PhDs working on a spintax detection algorithm right now. I predict we have another year or two of getting away with spun content, before you'll have to replace it all with sentence-level super-spun content. The only reason we don't have it already is processing power, and I'm sure their brightest minds are already tuning the algorithm.


  1. Optional: Identify a shortlist of candidates by scanning for posts with grammatical and writing-style errors (as MS Word's grammar checker does). You can skip this once you have enough processing power.
  2. Use LSI (latent semantic indexing) to group posts published around the same time by keyword topic.
  3. For each group:
    1. For each post:
      1. Use a tokenizing fuzzy matching algorithm to find any other posts in this group where >75% of the words are the same, in the same order (who spins every 4th word?)
      2. Optional: Confirm by checking whether the words that don't match are synonyms of each other.
  4. Mark all offending posts as belonging to the same blog network.
  5. Deindex the network.
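Step 3 can be sketched in a few lines of Python, for illustration only (this is not how Google actually does it, which nobody here knows). The sketch assumes posts have already been grouped, so steps 1 and 2 are not shown. It uses the standard library's difflib.SequenceMatcher as a stand-in for the "tokenizing fuzzy matching algorithm", and a tiny hypothetical SYNONYMS table for the optional synonym check. A real system would need a proper thesaurus and something much faster than comparing every pair.

```python
import re
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical toy synonym table for the optional confirmation step.
SYNONYMS = {
    frozenset({"cat", "feline"}),
    frozenset({"owned", "had"}),
}


def tokenize(text: str) -> list[str]:
    """Lowercase word tokens, punctuation dropped."""
    return re.findall(r"[a-z0-9']+", text.lower())


def ordered_overlap(a: list[str], b: list[str]) -> float:
    """2*M/T, where M = tokens matched in the same order and T = total tokens.
    For two posts of equal length this is the share of words that line up."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def mismatches_are_synonyms(a: list[str], b: list[str]) -> bool:
    """True if every non-matching stretch is a one-for-one swap of known synonyms."""
    for op, i1, i2, j1, j2 in SequenceMatcher(None, a, b, autojunk=False).get_opcodes():
        if op == "equal":
            continue
        if op != "replace" or i2 - i1 != j2 - j1:
            return False  # inserted or deleted words are not simple swaps
        if any(frozenset({x, y}) not in SYNONYMS for x, y in zip(a[i1:i2], b[j1:j2])):
            return False
    return True


def suspected_spins(group: list[str], threshold: float = 0.75, confirm: bool = False):
    """Return (i, j) index pairs of posts in one topic group that look like spins."""
    tokens = [tokenize(post) for post in group]
    hits = []
    for i, j in combinations(range(len(group)), 2):
        if ordered_overlap(tokens[i], tokens[j]) > threshold:
            if not confirm or mismatches_are_synonyms(tokens[i], tokens[j]):
                hits.append((i, j))
    return hits
```

For example, "She owned a cat and walked it every morning" and "She owned a feline and walked it every morning" are flagged even with confirm=True: 8 of 9 words line up, and cat/feline is in the toy table. A pair differing only by "jumps"/"leaps" clears the 75% bar but fails confirmation, because that pair isn't in the table. The pairwise loop grows quadratically with group size, which is why grouping first (and the processing-power point above) matters.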

So auto-spinners beware! The end is nigh...

This is my first post here, so if you found it helpful, please rate/thank.
  • Your premise about Google's future ability to detect garbage content in general is accurate, but in most cases the pattern of spun content has nothing to do with identifying particular networks. Only the uber-lazy network owner (and unfortunately there are lots of those) will spin the same content across an entire network. So in reality only steps 1 and 2 need to be looked at.
    • I think you've got a point there, Mike.
  • As I understand it, Google has been able to detect spun content for a long time.

    IIRC, back in the day one of the people selling spin software published a case study (this was when Google indented related or duplicate content), and he showed the SERP: his stuff was indented.

    I'm sorry I don't remember more details, but I think they can already detect spun content.
  • I hope spammers all fail.
    • Just like so many things in life that are deemed unacceptable: if people want to do it and think it will get them somewhere, they will find a way to do it. What needs to happen is for spamming to become more work than it's worth compared with doing it the 'proper way'.

      I'm not sure spammers can ever fail completely, as it is just an arms race.

      spam > detect > new type of spam > detect > new type of spam > detect, etc.
    • I hope you fail. There's no way in hell you can teach anyone how to become a doctor. BTW, stop spamming the forums with your pathetic anchor texts.
    • I've always found it humorous that it's typically those most vociferously protesting "spammers" who have the spammiest, crap-filled MFA sites around.

      One man's trash is another man's treasure, I guess.
  • I think that if many blog sites can detect spun content, big G certainly can as well, and I'm sure even better.
    • Holly, you mention blog sites that can detect spun content. Can you tell me more?
  • Something like this will 'take out' a substantial amount of rewritten 'legitimate' content too. I see countless sites with rewritten Wikipedia content, etc.
  • Content that is spun at the word level is easy to detect, but when you spin with 5 or 6 alternative sentences and use a paragraph structure that is also interchangeable, the article can look 100% unique many times over. Granted, this type of article can take hours and hours to make.

    I will still use high quality spins for layered linking.
    • I agree. Sentence+ level superspun text is totally the way to go. But it's gotta be written by a native English speaker.

      I'm now looking for a grammar checker that can take spintax as input, to guarantee that every path through the spintax at least makes basic grammatical sense.
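Enumerating every path is the easy half of that wish. Below is a minimal Python sketch, assuming the common {option|option} spintax syntax with nesting allowed; expand is a hypothetical helper name, and it does no grammar checking itself. Each returned string could then be passed to whatever grammar checker you use.

```python
def expand(template: str) -> list[str]:
    """Return every string a (possibly nested) {a|b|c} spintax template can produce."""

    def parse_seq(i: int) -> tuple[list[str], int]:
        # Read literal text and {...} groups until '|', '}' or the end of input.
        results, buf = [""], []
        while i < len(template) and template[i] not in "|}":
            if template[i] == "{":
                options, i = parse_group(i + 1)
                text = "".join(buf)
                buf = []
                # Every path so far branches once per option in this group.
                results = [r + text + o for r in results for o in options]
            else:
                buf.append(template[i])
                i += 1
        tail = "".join(buf)
        return [r + tail for r in results], i

    def parse_group(i: int) -> tuple[list[str], int]:
        # Collect the alternatives after a '{' up to its matching '}'.
        options = []
        while True:
            variants, i = parse_seq(i)
            options.extend(variants)
            if i >= len(template):
                raise ValueError("unclosed '{' in spintax")
            if template[i] == "}":
                return options, i + 1
            i += 1  # skip '|' and read the next alternative

    paths, end = parse_seq(0)
    if end != len(template):
        raise ValueError(f"unexpected {template[end]!r} at position {end}")
    return paths


for path in expand("She {owned|had} a {cat|{small|big} dog}."):
    print(path)
```

The number of paths multiplies with every group, so a long super-spun article may have far too many paths to check exhaustively; in that case you would have to sample paths instead.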
  • The problem is also knowing what is authoritative: who says this is the original article and that is the spun one?
    Detection may be close, but mainly for cases like putting up 10 spun articles that link to your blog; that's easy to spot and understand.
    But what if a news site reports the same thing, written in a different manner? Who is the original??
    • No, they won't ever be able to detect properly spun text.

      Here are two sentences:

      1. She owned a cat.
      2. Mary has always had a cat in her life.

      Do you really think Google will EVER be able to know that those are basically the same sentiment if I use them on different pages of my site?

      No.


      On the other hand, if you're stupid about spinning and you use these two sentences:

      1. She owned a cat.
      2. She owned a feline.

      Then sure, maybe someday Google will be able to look at that and say "SPUN!".

      So it will depend on how stupid you are in your spinning. If it's done correctly, though, there is virtually no way for Google to detect spun content.
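To put numbers on the two examples, here is a small Python sketch that scores each pair by ordered word overlap using the standard library's difflib. That measure is an assumption for illustration; nobody in the thread knows what Google actually uses.

```python
import re
from difflib import SequenceMatcher


def words(sentence: str) -> list[str]:
    """Lowercase words with punctuation removed."""
    return re.findall(r"[a-z']+", sentence.lower())


def ordered_word_overlap(s1: str, s2: str) -> float:
    """2*M/T: words matched in the same order, relative to the total word count."""
    return SequenceMatcher(None, words(s1), words(s2), autojunk=False).ratio()


rewrite = ordered_word_overlap("She owned a cat.", "Mary has always had a cat in her life.")
word_spin = ordered_word_overlap("She owned a cat.", "She owned a feline.")
print(f"rewrite:   {rewrite:.2f}")    # 0.31 (only "a cat" lines up: 2*2/13)
print(f"word spin: {word_spin:.2f}")  # 0.75 ("she owned a" lines up: 2*3/8)
```

The word-level swap lands right at the opening post's 75% threshold, while the genuine rewrite scores about 0.31. That is the poster's point: shallow matching catches lazy synonym swaps, not real rewrites.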
  • I believe they would only look for spun-content "matches" against content that is already indexed, not everything that's out there; anything more would take far more processing than their goal really needs. Of course, they want their index to be as close to unique content as they can get. But say you posted 100 articles, all spun content: of course not all 100 will be indexed; that's the way it is today. Our goal is really the 'juice', not the indexing, anyway. Sure, you could try to narrow things down to a network, but really they don't want to deindex 'your' network, they want to deindex the whole network, kind of like how cops don't want to bust the users, they want to bust the pushers. Why doesn't anybody think they could detect spun content before? I mean, everybody's been saying to spin at x% uniqueness or above, at both sentence level and word level, etc. Well, that's always been the reason why, anyway.
  • There is an alternative, and it's darn cheap. It's seriously cheap, and you can reuse all of your PLR at the same time.
  • If you're using WordAI you'll never have to worry about poor quality spintax again. It's the most advanced natural language processing product on the market.

    It's still in private beta right now so it's not available to the general public. Hit up Cardine by sending him a PM and he'll set you up. He developed the tool and is an expert in this field.

    As a Word AI user, I can tell you it freaking rocks! I've been using it for the past few months and it's been a game changer for me. Love it! : )
  • Unless you're writing an original post from a study or research, everything else is really just spun content. Even if you read an article for research and rewrite it by hand with your own character, thoughts, and ideas, you could still end up with a very similar article.

    I guess Google will just use social and other signals to judge whether the content is quality and then apply the spun-content penalty, or whatever they are going to do.

    But the longer I do SEO, the more I try to do white hat, as it really is the only long-term strategy. But black/gray hat is fun while it lasts, and of course easy! =)
  • Thanks Nicky. I've heard about a similar product (very much in alpha) that generates articles from scratch, doing the research and all, and creates totally readable articles. This is definitely the way forward. Being new here, I can't PM Cardine yet, but I'm definitely interested.

    Die, you auto-spinning tools, die!
    • Here is some example spintax that WordAi is automatically capable of:
      That type of spinning is not really detectable at all by Google, and it is readable enough to pass a manual review. It obviously isn't able to do nested spintax in every sentence, but I am working on improving that every day (and WordAi currently very easily passes Copyscape on 99%+ of the spins it does).

      My eventual goal (4-5 months away) is to be able to generate completely unique content from scratch. I haven't quite achieved that yet, but I'm getting closer, and creating a spinner that can generate 100% readable text is the first step for me. I'll make very sure you know all about it when the time comes.

      The signup link is here (it's still in beta, but it's getting closer to launch so I don't mind sharing the link publicly).

      Also, if you still can't PM me but have any other questions, feel free to Skype me (cardine18) or email me (alex@wordai.com).



      I would not blast out the same article spun hundreds of times unless you have very, very high-quality spintax. There is always a tradeoff between readability and uniqueness, and even with the spintax example I gave above, you can probably use it no more than ~10 times before similarities start to leak out. It is far better to take 100 articles, spin them, and use each spintax variation 2-3 times than to take one article, spin it, and use that spintax 200 times.
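A toy model makes the readability/uniqueness tradeoff a little more concrete. This Python sketch assumes every spun slot is independent and every option is equally likely, which real spintax is not. The ~10-uses figure above is the poster's rule of thumb, and this model does not derive it.

```python
from math import prod


def distinct_variants(slot_options: list[int]) -> int:
    """Upper bound on distinct outputs: the product of option counts per slot."""
    return prod(slot_options)


def expected_shared_fraction(slot_options: list[int]) -> float:
    """Expected share of slots on which two independent uniform spins pick the same option."""
    return sum(1 / k for k in slot_options) / len(slot_options)


def expected_exposed_fraction(slot_options: list[int], uses: int) -> float:
    """Expected share of all options that show up at least once across `uses` spins."""
    seen = sum(k * (1 - (1 - 1 / k) ** uses) for k in slot_options)
    return seen / sum(slot_options)


template = [3] * 20  # hypothetical: 20 spun slots with 3 options each
print(distinct_variants(template))                        # 3486784401
print(round(expected_shared_fraction(template), 3))       # 0.333
print(round(expected_exposed_fraction(template, 10), 3))  # 0.983
```

Billions of possible outputs sounds unique. Even so, any two spins from this template still agree on about a third of their choices, plus all the unspun text. After 10 spins nearly every option has appeared, so anyone comparing the outputs can reconstruct most of the template.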
  • Article + Spintax + Lazy Author = detectable
    Article + Spintax + Fast Author = some detectable
    Article + Spintax + Real Author = few detectable

    Lazy Author uses spintax as the author.
    Fast Author is spintax's assistant.
    Real Author uses spintax as just a low-end assistant.
    • What is that supposed to mean?
  • Wow, this thread really took off. Thanks guys!
  • Smartly spun content (i.e., spun at all levels: word, phrase, sentence, and paragraph) will never be detected, simply because it's indistinguishable from a manual rewrite, and who in their right mind would even dare to think of punishing rewriters???

    The algorithm you describe is only applicable to the most primitive 1D spinning at the word level, and even in that case it's highly error-prone.
    • Don't depend on the "800% uniqueness" blah-blah-blah that those spin tools report.

      I know someone who spins articles at the paragraph, sentence, phrase, and word level. It only took me 20 articles to compare to find out that it's spun, despite his using paragraph rotation.

      He spun each paragraph 3 times, each sentence 3 times, and phrases/words throughout it all. Let's say you do 6 variants per paragraph, 6 per sentence, etc.; it would probably take me 50 or 100 articles to figure it out. A computer algorithm would be way more effective than me, so quit living in your fantasy world.
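Simple probability can test the point about comparing articles. The Python sketch below assumes a hypothetical article of 5 spun paragraphs, with each paragraph variant chosen uniformly and independently. It also assumes a reused paragraph variant can still be recognized despite the word-level spinning inside it and wherever paragraph rotation puts it. Both are simplifications, so the numbers are only indicative.

```python
from math import comb


def p_pair_shares_paragraph(paragraphs: int, variants: int) -> float:
    """Chance two independent spins reuse at least one paragraph variant."""
    return 1 - (1 - 1 / variants) ** paragraphs


def expected_matching_pairs(articles: int, paragraphs: int, variants: int) -> float:
    """Expected number of article pairs, out of C(articles, 2), that share a paragraph."""
    return comb(articles, 2) * p_pair_shares_paragraph(paragraphs, variants)


# Hypothetical 5-paragraph article, comparing 3 vs 6 variants per paragraph.
for variants in (3, 6):
    p = p_pair_shares_paragraph(5, variants)
    pairs = expected_matching_pairs(20, 5, variants)
    print(f"{variants} variants: P(pair overlaps) = {p:.2f}, "
          f"expected overlapping pairs among 20 articles = {pairs:.0f} of {comb(20, 2)}")
```

Raising the variant count lowers the overlap rate, but nowhere near zero. In this model, roughly 87% of random article pairs share a paragraph with 3 variants, and about 60% still do with 6. That fits the view that more variants only delay detection. By the pigeonhole principle, once you publish more articles than a paragraph has variants, some variant of that paragraph must repeat.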
  • GOOGLE?

    Of course, it doesn't matter when you are writing beautifully for your audience.

    Any updates from inside the big goog on what spun crap can be identified? This is a year after the OP put down his tracks. Having never used any of it, I'm feeling hunky-dory.
    • I spin content all the time. You have to. Why reinvent the wheel?
      It's how you spin it, I presume. I cannot fathom anyone actually
      using any auto-spin garbage and expecting to get anywhere.

      Yup, year old thread, dug up. Notice nothing really changed in
      the past year?

      People come up with ideas, those ideas go away. Then they
      come back with new ideas, or try and prop old ones up.

      If you think Google cares about good spun content, and about penalizing
      it, take a look at Wikipedia... the king of spun, copied, and sometimes
      garbage content.

      It's not spun, copied, or garbage content. It's how you present it,
      among other things.

      We'll see if anyone below me actually reads the whole thread, and realizes
      how old it is without blindly posting some garbage.

      ROTFLMAO! Garbage content? Has that hurt the WF? Not one bit!

      Paul
  • Your money sites should always have unique content anyway.

    I only use spun content for linking with SENuke and have never been penalized.
    • I think that if you create spun content for your Tier 1 network, you expose your money site to a penalty.

      Perhaps the penalty will not occur now because Google has not implemented the algorithm yet, but Google is always updating its algorithm, and one day some websites will be hit.

      If you create your spun articles now, what do we know about the spun-article detection algorithm that will arrive in 5 years' time?

      Do you remember where the internet was 5 years ago? Look at the evolution...

      So, the strategy you follow now will have consequences for the future.
    • That is exactly what I do.
  • Original content is the key to blogging.
  • @OP: I totally agree, and the way you outlined it makes it look dead easy to detect spun content, so why wouldn't Google take action? You provided the answer already: processing power.

    @Others who say that Google has no reason to go after spun content: OF COURSE it has a good reason to go after spun content, which is that it provides zero value to the internet, so why would Google keep those pages indexed? It won't.

    Besides, spun content is used for artificial link building; if Google weren't against that, they would never have bothered with updates like Penguin.

    As far as I know, I'm the only one on this whole forum (who advertises in the for-sale sections here, I have to add) who quit using spun content over a year ago already, and everyone thought I was stupid to waste money on expensive content for link-building purposes. After all, my service would be less effective because I would be able to build fewer links. Well, so be it; many take that for granted. I can't wait till the day that spun content gets targeted; so much new business would come my way!

    It's so bad that when clients ask me to recommend someone else on this forum (because they are in a tough niche, for example, and need extra backlink power at a low price), I can't recommend anyone, as every single provider is doing something I don't like. That includes even the most popular ones (no wonder they are so popular when they can deliver 50 or 100 links for the same price for which they only get a dozen links from me).
  • Microsoft has a grammar detector in Word.
    ...so why is it so difficult to believe that Google uses something similar to detect bad grammar --> easily detect crap spun content?
  • You know what's funny though???

    No matter how many changes or different things Google has thrown at us, we always seem to have an answer for it.

    Those people working at Google are very sharp. There are some very sharp individuals in IM as well.

    Let's see how many products are released to get around this lol.

    Salute to the IM genii.
  • The good news: Google still doesn't detect spintax in videos (Google doesn't
    understand the content of videos).

    Video Marketing = Still Great To Get Google Ranks.
  • We could debate on and on whether they can detect spun content or not.. the fact remains, there are plenty of obviously spun pages still ranking in Google. If they programmed part of the algo to deindex spun pages, as of yet it's barely even touched the majority. As far as risk goes, my opinion is, if you're gonna do it, do it well or not at all.
  • It's all about money. Google is a business.

    There are better approaches for identifying networks, and content analysis on that level isn't one of those ways.

    Google only needs to get the algorithm just right; beyond that, there's no need for massive changes. As long as AdWords is the prominent result, the rest only matters as long as people continue to use Google.
  • They can write algorithms that produce symphonies that people think are masterpieces until they find out a computer wrote them. It's only a matter of time before there's something that can produce passable articles.
  • I think some are also missing this point.

    Just as detection of spun content has moved forward, so has automatic creation of content. I have had the privilege of seeing what's coming down the line in terms of assisted content creation, and it's mind-blowing shit that makes spinning almost laughable.

    So don't sweat just yet.
  • Hey Kevin, as a true lover of English, I wouldn't mind seeing this "mind blowing" content creation stuff for myself.

    I'm yet to find any auto-generated content that is truly readable!
  • Spintax is not dead, it's just a lot more difficult. There are tools out there that can spin well. Have you heard of WordAi? It works pretty well for me. Not an affiliate at all, just someone who has used it.
  • Content spinners are thieves... plain and simple. You flood the internet with stolen material that provides no value to your visitors... and at best the content is barely even readable... all just to steal scraps from real content producers.
    • I'm not sure you understand what spinning is. You're thinking content scrapers. Though sometimes the two techniques are combined.
  • Here are my thoughts.

    1) I have been using Wordai Turing mode to produce largely readable, highly spun articles.
    2) I have been using them on mini sites; I get the content indexed and ranking fine with a few manual edits.
    3) This allows me to churn out 30 or 50 articles a day if I wish, which is a lot of content and obviously has value.


    Now a few more observations:

    1) Spun (unedited) content on web 2.0 sites is getting harder to index. For whatever reason, the requirements for getting content indexed are now way higher.

    2) Several sites where I predominantly received links from spun content (relevant web 2.0 sites, readable spun content) were penalized in the last Penguin update.

    Going forward, I cannot, with a clear conscience, recommend that newbies create spun content for the purpose of link building. In fact, tools like SEnuke may now actually be very counterproductive.

    In the long term, unless you edit the content to make sure it's unique and readable, and don't put 1000 different variations of the spintax up, you are ****ed. Mass-spinning stuff and posting it on thousands of properties is not a productive tactic anymore and is most definitely on the way out.

    Now, I am not saying people can't or won't rank over the medium term using such tactics. Some will be able to. But long term, this tactic is doomed, so be warned and start changing your best practices now.
    • As Google gets better at detecting spun content, spinners will get better at spinning. I know that from what I have seen in the past, as well as what I know is being developed for the future. It won't be long before there will be no differences at all in quality between spun versus not spun content.
  • Lmao, spinners (fail).
  • Ok, this conversation is amusing.

    Do you all want to know a little secret?

    Go out and pick a long tail keyword to rank. Now go copy and paste a paragraph of content from at least 4 different sources on the web. Put those 4 paragraphs of content on your page. That content will pass through every panda update no problem. And will rank just fine.

    For the past 2 years I've been a black hat autoblogger. No, I don't monetize the autoblog. I use a cloaking redirection plugin to keep Googlebot on the autoblog but automatically redirect visitors to a different, quality website. (By the way, putting affiliate links on a huge autoblog just gets it penalized right away. Mine are not monetized.)

    I can build thousands of pages with this type of content. And they survive and stay ranked through every panda update. I don't build backlinks to them so they survive Penguin updates too. They can send 100-300 visitors per day.

    Do I use spun content? Nope.

    Each page on my autoblog has anywhere from 8-20 snippets of content from 8-20 different sources on the web.

    Now if you just copy and paste an ezinearticle onto your site will it rank? Nope. But if your page content is from multiple sources on the web it will rank just fine (no penalties).

    Don't believe it? Go ahead and try it. You'll be surprised.
    • Although I don't practice these types of things, I still have a certain love for them, but I'm too busy to figure such things out.

      Any recommended tools? And is that plugin for sale somewhere?
  • When I was searching last year, I found plenty of spun content ranking well. So as of last year, Google certainly couldn't detect spun content. Either that, or they chose not to penalise it too much.

    I guess that it's actually quite hard to detect, as poor English could just mean the writer is not a native English speaker.
  • Sometimes I think they do it step by step to give people the chance to improve.

    Otherwise half the internet would tank in a day.
  • They can't detect spun text. Plain and simple. The real question is: can they detect, or do they score, grammar and readability? Yes, they can and do. They want the most readable and grammatically correct results in their search engine. That's a no-brainer, right? That said, since a lot, if not most, 'spun' text is full of grammar errors and practically unreadable, you could say they can detect 'spun' content. That's circumstantial, though. You can write a perfectly readable spun article with perfect grammar, and it would pass the grammar and readability score with flying colors.

    It's like asking, "Are police good at catching fast cars?" What's a fast car? My car goes pretty fast. A Lamborghini can go faster than mine. 55 mph is pretty fast, but not illegal on the highway. It's not the speed ability of the car that's in question. The question is, "Are police good at catching speeding cars?"
  • Read various news stories on the same subject and you will understand the difference between spinning and spamming.

    The problem is, a spintax file that creates completely unique content that could pass as stories posted to various sites takes a long time to create and generates a 10MB+ text file. I've literally seen a 3-million-line text file. It took the writer a few months, and they're still making money from it. Not my style, but, geeze.
    • Actually it does not take that long to create using an editor designed for that purpose. It is much easier than writing new, unique content. For each paragraph, create one or more variants with the same point as the original. Vary the number of sentences in each so spun documents will appear structurally different. Break the paragraphs into sentences. For each sentence, create several variations that express the same meaning. This is fairly mindless gruntwork if you have a decent facility with writing. You just put yourself in a chair and grind through the sentences. It does not take that long once you get the hang of it.

      In many cases documents discuss several points but the order of the points is not important. This may occur at the paragraph or sentence level. For more variation find those cases and mark them to be randomly reordered instead of choosing one at random.

      As a final step convert many words to spintax that maintains proper grammar.
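The layered structure described above (paragraph variants with different sentence counts, sentence variants, reorderable points, then word-level spintax last) can be rendered with a short Python sketch. The nested-list template format is a hypothetical stand-in, not the format of any particular spintax editor. In this sketch only sentences within a paragraph variant can be marked for reordering, and the word-level spintax is flat (not nested).

```python
import random
import re

# Flat, non-nested word-level spintax such as "{big|large}".
WORD_SPIN = re.compile(r"\{([^{}]*)\}")


def spin_words(sentence: str, rng: random.Random) -> str:
    """Final step: replace each {a|b|c} group with one randomly chosen option."""
    return WORD_SPIN.sub(lambda m: rng.choice(m.group(1).split("|")), sentence)


def spin_document(paragraph_slots: list, rng: random.Random) -> str:
    """Render one spin from a hypothetical layered template.

    paragraph_slots: one entry per paragraph position; each entry is a list of
    paragraph variants. A variant is a (reorderable, sentence_slots) tuple, where
    sentence_slots is a list of slots and each slot lists alternative sentences.
    Variants may have different numbers of sentence slots.
    """
    paragraphs = []
    for slot in paragraph_slots:
        reorderable, sentence_slots = rng.choice(slot)
        sentences = [spin_words(rng.choice(alts), rng) for alts in sentence_slots]
        if reorderable:
            rng.shuffle(sentences)  # points whose order doesn't matter
        paragraphs.append(" ".join(sentences))
    return "\n\n".join(paragraphs)


template = [
    [  # paragraph 1: two variants with different sentence counts
        (False, [["Cats make {great|wonderful} pets."], ["They are {quiet|calm} and clean."]]),
        (False, [["Few pets are as {easy|simple} to keep as a cat."]]),
    ],
    [  # paragraph 2: independent points, so the sentences may be reordered
        (True, [["Feed them twice a day."], ["Keep the litter box clean."], ["Book a yearly vet visit."]]),
    ],
]
print(spin_document(template, random.Random(42)))
```

Passing a seeded random.Random makes a given spin reproducible; a different seed gives another combination.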
  • Google has no problem detecting poorly spun content. Google will also hammer any rankings for your site that may have resulted from content they deem to have been spun.

    I'm on board with DanParks and MarketingFool. It's entirely possible to create spun content that is undetectable. It's more work, of course, but worth it.

    Anyone thinking of populating a money-site with spun content, no matter how "good", needs to give their head a shake!

    Good luck!
    • I agree with your first point but challenge your second. There's absolutely no way Google would ever detect the high-quality spintax we create by hand. Just not going to happen, ever. There would be too many false positives and too much collateral damage if they even attempted to do that. You just need to not be lazy and hire smart people to do this.
  • Everything is a spin of the main source anyway; look at the news.
  • I don't like spun content and I don't use it in any of my work. For me, it's just easier to come up with original content. The human brain is the best content creator of all, and no software is going to come up with as much diversity of content as a human can. Even if others are spinning content, I'll just stick to my own version of things.
