Split Testing - Shouldn't it really be...

by Kurt
16 replies
I've never seen it posted, although I'm sure it's been mentioned before... You really need a control when doing split testing.

For example, most "experts" tell you to test A vs. B.

But for more accuracy, it should probably be:

A1 vs A2 vs B1 vs B2

In this case, A1 and A2 are exactly the same, and B1 and B2 are exactly the same.

So for example, if A1 comes in first place in testing, but B1 beats A2, then the testing is really incomplete. And if A1 and A2 have widely different results, this should be cause for further investigation, as they are both the same and "should" test virtually the same. If they do have very different results, IMO, it makes the test invalid.

Do any of you testers use controls? And should we be using controls?
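
(To see the idea in numbers, here's a minimal simulation sketch in plain Python. The 2% and 2.5% conversion rates and the 500-visitor sample are invented; run it a few times and watch the two identical "A" variants land noticeably apart.)

import random

# Simulate an A/A/B/B test: A1 and A2 share one true conversion
# rate, B1 and B2 share another. With samples this small, even
# identical variants often show quite different observed rates.
TRUE_RATES = {"A1": 0.02, "A2": 0.02, "B1": 0.025, "B2": 0.025}
VISITORS = 500  # per variant; typical of a low-traffic test

for variant, rate in TRUE_RATES.items():
    conversions = sum(random.random() < rate for _ in range(VISITORS))
    print(f"{variant}: {conversions}/{VISITORS} = {conversions / VISITORS:.2%}")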
#split
  • KristiDaniels
    That is why you calculate confidence intervals.
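
    (For readers wondering what that calculation looks like: below is a minimal sketch of a standard normal-approximation confidence interval for the difference between two conversion rates. The visitor and conversion numbers are invented; if the interval straddles zero, the test hasn't separated the variants yet.)

    import math

    def diff_ci(conv_a, n_a, conv_b, n_b, z=1.96):
        """95% normal-approximation CI for the difference in
        conversion rate between two variants."""
        p_a, p_b = conv_a / n_a, conv_b / n_b
        se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
        diff = p_b - p_a
        return diff - z * se, diff + z * se

    # Hypothetical numbers: 11/500 vs. 16/500 conversions.
    low, high = diff_ci(conv_a=11, n_a=500, conv_b=16, n_b=500)
    print(f"B - A difference: [{low:.3%}, {high:.3%}]")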
  • AverageGuy
    Many so-called split tests are statistically meaningless. For too many IM guys who sell e-products, the test sample is too small, and there are so many other things that can impact the results; in particular, online traffic is highly variable. The test results are meaningful only when your traffic is huge.


    david
  • Michael Kohler
    Split testing is not one of my strong points. It is easy to do in AdWords, but for a landing page you need a script to do it. Taguchi testing is supposed to rapidly test multiple variables and save you money, but if you are driving free traffic (articles, search engines) to a page, then it might be best to just use a simple rotator, like the sketch below.
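
    (A "simple rotator" can be as little as a script that redirects each visitor to a randomly chosen page. Here is a bare-bones sketch in Python; the page URLs and port are placeholders, the variant pages themselves would live elsewhere, and historically these rotators were usually PHP scripts.)

    import random
    from http.server import BaseHTTPRequestHandler, HTTPServer

    PAGES = ["/sales-a.html", "/sales-b.html"]  # placeholder variant URLs

    class Rotator(BaseHTTPRequestHandler):
        def do_GET(self):
            # 302-redirect each incoming visitor to a random variant.
            self.send_response(302)
            self.send_header("Location", random.choice(PAGES))
            self.end_headers()

    if __name__ == "__main__":
        HTTPServer(("", 8000), Rotator).serve_forever()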
    • Steven Wagenheim
      Kurt, Paul Hancox has a fascinating book on split testing. I have studied it
      extensively and while I do not claim to be an expert on it, I can honestly say
      that it has helped me a lot.

      I am still torn on the validity or lack of validity of split testing when you're
      only getting 200 visitors a day, but I do know this much. When I test vastly
      different sales letters (one I am sure will do well and the other I am sure
      sucks) I usually get vastly different results.

      My point is, as you get closer and closer to what I call "nit-picky testing"
      you reach a point of diminishing returns. This is just a personal preference
      of mine, but as far as I am concerned, I couldn't care less if my sales page
      is converting at 2.2% or 2.7%. So what?

      To me, the difference is between a page I confidently feel is decent and
      converts around 2% versus a page I just throw together, almost purposely
      making it crap, and it doesn't make 1 sale.

      Eventually, you reach a point where you can put together a decent
      enough sales page that you have a pretty good idea of the "minimum"
      that it should convert provided (and this is the big if) you have done
      the proper research on your target market and know you have a product
      that they will want.

      I have a product line that is in such demand that I don't even have a sales
      page for it. It's literally a title and a buy now button. The presell and
      emails I send to my list and other prospects do the selling.

      I guess what I am trying to say is, I am not a big fan of split testing
      unless you have mega amounts of traffic (which I don't have) and are
      selling an item at a price high enough where the difference between 2.2%
      and 2.7% is significant.

      I am in no way saying my philosophy on this is the correct one. In fact, I
      will even concede that this is probably not going to give you the best
      return on your investment, but since I value my time more, and split
      testing properly takes a lot of time (I know, because I used to do it) I
      choose not to do it any longer. If I make $10,000 on a launch or $15,000,
      so what? The money just isn't that important to me anymore.

      I'll go with my gut (based on past experience knowing what works for me)
      and leave it at that.

      Point is, what split testing I did never really gave me any conclusive
      results that I could confidently say "This sales letter is better than this
      one."

      And of course a good part of the reason could very well be that since I
      wrote both letters, and my skills as a copywriter are only so good (I'm no
      Makepeace or Fortin), my results are only going to be so good.

      Where you get vast differences is when you compare a sales letter that
      Joe Blow wrote to a sales letter that Clayton Makepeace wrote, and even
      then, it will still depend on each copywriter's experience in that particular
      niche.

      I don't know if I answered your question, or whether my answer ended up
      helping or confusing you, but that's my personal opinion on split testing.

      If you think it's worth doing and that you can get valid results, do it.

      If not...don't do it.

      I am leaning towards the latter for my particular business model and skills
      as a copywriter.
  • Jay Jennings
    Originally Posted by Kurt:

    But for more accuracy, it should probably be:

    A1 vs A2 vs B1 vs B2

    In this case, A1 and A2 are exactly the same, and B1 and B2 are exactly the same.
    I think for a smaller amount of traffic that might make sense (although splitting less traffic among more pages isn't good), but over time and with significant traffic any weirdness should work itself out even with just two pages.

    Of course, some math geek will now come in and tell me why I'm wrong.

    Jay Jennings
  • Ben Roy
    The goal I think you're trying to get at is a good one (to eliminate 'chance' as the winning factor), but the reality is that doing this won't actually solve it. In some cases, over small sample sizes, A1 would beat A2 by a huge margin. This would tell you that you don't have enough data to draw conclusions yet. But in many cases A1 and A2 would match up and B1 and B2 would match up, but you would have an inconclusive test because there wasn't enough data.

    I guess my point is that doing this could help you rule out a few cases where you definitely do NOT have statistically meaningful data, but it wouldn't give you any confidence when the As and Bs were in sync.

    You're much better off just figuring out how many samples you need to have confidence in the results.
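
    (A rough sketch of that calculation, using the standard two-proportion sample-size approximation. The 2.2% vs. 2.7% figures echo Steven's example above, and the z-values are the usual 95%-confidence / 80%-power choices; all of it is illustrative, not a prescription.)

    import math

    def samples_per_variant(p_base, p_target, z_alpha=1.96, z_beta=0.84):
        """Approximate visitors needed per variant to detect a lift
        from p_base to p_target at ~95% confidence and ~80% power."""
        variance = p_base * (1 - p_base) + p_target * (1 - p_target)
        return math.ceil((z_alpha + z_beta) ** 2 * variance
                         / (p_target - p_base) ** 2)

    # Telling a 2.2% page from a 2.7% page takes roughly 15,000
    # visitors per variant, far beyond most low-traffic sites.
    print(samples_per_variant(0.022, 0.027))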
  • KristiDaniels
    Kurt,

    Your idea is actually very useful for teaching basic statistics.

    When you plug the impressions and actions for A and B into the formula and calculate the sigma and the confidence interval, the average person just doesn't get it.

    You can say that the CI sorta represents the chances that the winner really is the winner. For instance, a CI of 80% sorta means that there is an 80% chance that the current winner really is the winner.

    But it's hard to "see" that.

    If you do a test like you propose and show the confidence interval along with those results, you can have your student follow along and start to "see" what a confidence interval really means.

    As the test progresses, the student can see that the confidence interval increases at about the same rate that the 1 & 2 versions (which are identical) converge.

    They also get to see the winner and the loser switching places, and get to see that it is insufficient data that is doing it. And they get to see that the switching happens less and less often as the confidence interval increases.
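
    (A small simulation of that effect, for anyone who wants to watch it happen: two identical variants trade the lead frequently early on, and the lead changes thin out as data accumulates. The 2% rate and visitor counts are made up.)

    import random

    TRUE_RATE = 0.02  # both variants are identical by construction
    conv = {"v1": 0, "v2": 0}
    flips, leader = 0, None

    for visitor in range(1, 20001):
        variant = "v1" if visitor % 2 else "v2"
        conv[variant] += random.random() < TRUE_RATE
        if conv["v1"] != conv["v2"]:
            current = max(conv, key=conv.get)
            if current != leader:  # counts the first lead taken, too
                leader = current
                flips += 1
        if visitor in (200, 2000, 20000):
            print(f"after {visitor:>5} visitors: v1={conv['v1']}, "
                  f"v2={conv['v2']}, lead changes so far: {flips}")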
    • Steven Wagenheim
      Originally Posted by KristiDaniels:

      You lost me right after "Kurt"
      • cringwall
        My take on testing with low-traffic sites is this: with a small sample size, I prefer simple A/B split testing against a control 'winner' over trying to tweak small page elements such as headlines, closes, etc.

        Unlike multivariate testing, where you are tweaking small portions of basically the same sales page, simple split testing doesn't assume you are already 'dialed in' to the correct sales approach from the standpoints of:

        Overall Ad Appeal:
        -Professional vs. "regular guy"
        -Benefits presented matter-of-factly vs. emotionally
        -Bonus-driven sale vs. product-driven

        Ad Tone:
        -Hard sell vs. conversational

        Ad Target Audience:
        -Beginners vs. experts

        Layout:
        -Minimalist vs. busy
        -Placement of bonuses

        Look and Feel:
        -Bare bones vs. lots of graphics
        -Video clips vs. pictures

        If you know your customers well, some of this testing will not be required. But if you are entering a new market with unknown customers, you'd be wise to use a split-test script to test these concepts, along the lines of the sketch below.
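
        (When the variants differ this radically, you generally want each visitor to keep seeing the same version on repeat visits. One common approach, sketched below, is to bucket visitors deterministically off a stable ID such as a cookie value; the page names and visitor ID are placeholders.)

        import hashlib

        PAGES = ["hard-sell.html", "conversational.html"]  # placeholders

        def assign(visitor_id: str) -> str:
            # Hash a stable visitor ID so the same person always
            # gets the same variant, visit after visit.
            digest = hashlib.md5(visitor_id.encode()).hexdigest()
            return PAGES[int(digest, 16) % len(PAGES)]

        print(assign("visitor-123"))  # same ID, same page, every time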
  • KristiDaniels
    NY1,

    On your "stray" point.

    I just noticed that. A few days ago, I asked fellow copywriters for a site review in the copywriting forum. I got a nice list of things to test, and I put them all under test.

    How do you think they did?

    They should have been right at least 50% of the time, right? A monkey would be right 50% of the time in an A/B test just by chance.

    It turned out that only 1 out of 12 suggested changes survived the split test. They were worse than 50/50!
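
    (To put a number on "worse than 50/50": if each suggestion really were a coin flip, going 1-for-12 or worse would be very unlikely. A quick check, assuming independent 50/50 outcomes:)

    import math

    # P(at most 1 win out of 12) under p = 0.5 per suggestion.
    p = sum(math.comb(12, k) for k in (0, 1)) / 2 ** 12
    print(f"{p:.4f}")  # about 0.0032, i.e. roughly 0.3%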

    Use proven swipe files. Use proven techniques from proven copywriters. And then let the market decide.

    Opinions are like...
  • KristiDaniels
    Steve,

    Sorry about that. The point was actually that when the statistics are over your head, you can be aided by doing a test like Kurt proposes, so you can "see" when the numbers are just jumping around randomly and when there is enough data to start paying attention to them.

    That is what the confidence interval number in statistics is meant to be used for (or the sigma if you prefer). Seeing the randomness of the numbers at the beginning of a test with a CI of 50% and watching the numbers get more and more stable and match between version 1 and 2 as the CI increases toward 95% or so would be useful for people who don't get the standard statistics explanations.
    • Steven Wagenheim
      Originally Posted by KristiDaniels:
      You lost me after "Steve"
  • KristiDaniels
    Steve,

    Dude... it's like this ya know. If ya just do good things for good peeps, you'll be raking in the dough.
    • Steven Wagenheim
      Originally Posted by KristiDaniels:
      Now that I understood. Thanks.
