Is A/B Testing REALLY Valid?

10 replies
Before I go too far here, I will answer the thread title question: the answer is YES, it is valid. That being said, I recently went "Pro," as it were, in the field of CRO. I have done much of this over the years on my own sites, and helped a few people a little more than here and there in the last few years.

As I started doing this for clients, I was many times left scratching my head in disbelief. Here I am running a test, A then B, same traffic flow, etc., and I find that, say, B is better. I then implement the change. Sounds good, right? Uh, no...

I was finding that, over a period of time after implementing some tested changes, there was no indication of better conversion, or at least not the measured predicted change, and in some cases there was a LOSS, down to the level that the A variant had indicated.

As many of you know, when you run into an issue with CRO, it's not like you can type "CRO" into a search engine and hope to find an answer... unless the answer includes listening to some European music group! hahaha

So I am retesting and retesting, and the results are jumping all over the place. At this point I'm at my wits' end, so I decide to do the unthinkable and run an A/A test. And I kid you not, I start the test (as in the first A), I am reading some article, click on this and that, and end up on a Neil Patel article describing A/A testing. (I was going to share the link, but I can't find the article, hahaha.) All literally within 15 minutes of starting the A/A test in desperation.

Basically, the principle here is to validate the page before you test the page, ensuring the results from your test will in effect BE valid.

As many here are, I am sure, testing their own pages, like myself you look at any page you have and kind of know what the traffic across that page is. More specifically, you know that Sunday is its lowest day and, say, Tuesday is its best day. And sure, you can see this stuff in analytics, but it's really not something you would look at or consider... or at least I didn't. 1000 visitors across a page is 1000 visitors across a page, right?

Since I have started A/A testing, I am finding that 1000 is not ALWAYS 1000, and that there are other variables at play. I am finding the day specifically can affect the outcome, and in some cases I am finding the time of day to be a variable. Well, let me rephrase that... I'm not finding... I am guessing at this point that those are possible indicators. (More testing required! hahaha)
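To make the A/A idea concrete, here is a minimal simulation sketch (the visitor counts and the 3% rate are made up): two identical "variants" sharing the same true conversion rate can still show visibly different observed rates on 1000 visitors each, purely by chance.

```python
import random

random.seed(42)  # fixed seed so the sketch is repeatable

def simulate_aa_test(visitors_per_arm, true_rate):
    """Simulate an A/A test: both arms have the SAME true conversion rate."""
    conversions_a = sum(random.random() < true_rate for _ in range(visitors_per_arm))
    conversions_b = sum(random.random() < true_rate for _ in range(visitors_per_arm))
    return conversions_a / visitors_per_arm, conversions_b / visitors_per_arm

# 1000 visitors per arm, true rate of 3% for BOTH arms.
rate_a, rate_b = simulate_aa_test(1000, 0.03)
```

Any gap between the two observed rates here is pure noise, which is exactly the baseline an A/A test is meant to expose before you trust an A/B result.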

I just found the article: What Spending $252,000 On Conversion Rate Optimization Taught Me. Point #2 specifically is where Neil gets into that. I forgot that he attributes it to bad software, but I think the principle of setting a benchmark still applies regardless.

Hope that helps someone!

PS thanks Neil!!!!
#a or b #valid
  • Profile picture of the author quadagon
    I think there are two mistakes people make in CRO.

    Firstly, not understanding statistics is a big factor. I've seen people jump head first into a change based on an increase from 10% to 11%, thinking of it as a big jump without seeing that it's actually just one more sale per hundred visitors.

    On the flip side, I've seen people refuse to change from a 3% rate to a 4% rate because it's "only 1% higher." Try explaining that it's a 33% difference and their eyes glaze over.
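    The absolute-versus-relative confusion described above can be pinned down in a couple of lines (a quick sketch; the rates are hypothetical):

```python
def lift(control_rate, variant_rate):
    """Return the absolute and relative change between two conversion rates."""
    absolute = variant_rate - control_rate
    relative = absolute / control_rate
    return absolute, relative

# 10% -> 11%: one extra sale per hundred visitors, a 10% relative lift.
abs_1, rel_1 = lift(0.10, 0.11)

# 3% -> 4%: "only 1% higher" in absolute terms, yet roughly a 33% relative lift.
abs_2, rel_2 = lift(0.03, 0.04)
```

    Quoting both numbers (absolute points and relative percent) avoids both mistakes at once.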

    Secondly, people end up doing multivariable tests without realising that this is what they are doing.

    This ties in with the last mistake of trying to change too much. Small changes are great, and you can keep monitoring small changes as you go along.
    Signature
    I've got 99 problems but a niche ain't one
    • Profile picture of the author peekay
      @quadagon I think anybody who is even testing at all should be encouraged.
      In the majority of cases I know, what goes on a page/website comes from the top - the HiPPO (Highest Paid Person's Opinion).

      One day, CRO will take its rightful place.
      Signature
      Find out free how CRO can double or triple your conversions, sales & profits in the next 60 to 90 days...
  • Profile picture of the author Oziboomer
    Originally Posted by savidge4 View Post

    Since I have started A/A testing, I am finding that 1000 is not ALWAYS 1000, and that there are other variables at play. I am finding the day specifically can affect the outcome, and in some cases I am finding the time of day to be a variable. Well, let me rephrase that... I'm not finding... I am guessing at this point that those are possible indicators. (More testing required! hahaha)
    I run a few split tests, and one test I ran, VWO wanted to showcase as an example of a successful campaign.

    The variation beat the control by 1.84%, and that might seem like a small improvement, but it was an improvement that resulted in increased profits....

    ...now here's the thing...

    My bad...I forgot to stop running this test.

    I was busy doing other client's work and other things.

    It ran for about 5 months, and over that time the difference between the winning variation and the control got narrower and narrower. When I noticed, I thought, "That's strange." I was going to change the control to the so-called winning variation, but by then both were so close that "No Change was sexy."

    Repeated split tests on several different areas have resulted in significant immediate improvements for clients, but this test always haunts me now, as maybe "NEW" was the conversion factor, and as visitors saw the "SAME" over time, the responsiveness dropped.

    Even though I'm committed to split testing, I sort of think "NEW" beats "OLD", even if you end up back with "OLD" later.

    This paradox intrigues me and pushes me to keep split testing, because maybe just changing things up is what keeps re-activating past visitors, and they add to the results you are getting from NEW traffic.
    • Profile picture of the author Insano
      Originally Posted by Oziboomer View Post


      Even though I'm committed to split testing, I sort of think "NEW" beats "OLD", even if you end up back with "OLD" later.

      This paradox intrigues me and pushes me to keep split testing, because maybe just changing things up is what keeps re-activating past visitors, and they add to the results you are getting from NEW traffic.
      I love this part. It happens to me all the time when split testing ad performance. I recently went back to square one with minor changes from the last 6 months, not on ad design or position but on the website colour scheme.... it worked wonders.
      Signature
      Project Ara News and Community, every day fresh! Join now!
    • Profile picture of the author savidge4
      Originally Posted by Oziboomer View Post

      Even though I'm committed to split testing, I sort of think "NEW" beats "OLD", even if you end up back with "OLD" later.

      Very much agree here. Structural changes, I think, make a lasting impact, while color variations and wording lose effect over time.

      I am currently focusing on e-commerce sites as my main (only) client target. Simple structural changes like removing coupon inputs (if the client does not use coupons, obviously) and adding a progress bar bring lasting improvements. But when you get into things like headlines and CTA applications, I'll take the current short win, understanding that it won't last long.

      I look at the big boys like Walmart, The Gap, Target, and Tiger Direct, and they are CONSTANTLY changing elements and placements. With the consistency that these guys change elements, I don't think it is because they have found "Better Results" per se; rather, they are switching to "NEW", then to "NEWER" and "NEWEST", as the "OLD" wanes in effective results.

      What I catch myself doing is changing and changing as the results present themselves, but sometimes you have to look back and see if the projected element gains are actually changing the bottom-line conversions. With an ad campaign, these changes are readily apparent. But when you are looking at a SITE and its elements, many times positive testing results can end up with reverse-effect conversions on the back end.

      A simple equation I have been using is looking at the sum total of traffic across all the landing pages, and then comparing that number with the MOST accurate conversion result page... the "Thank You" page. THIS, for me, is the real indicator of positive "change."
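      That landing-traffic-to-thank-you-page equation could be sketched like this (the page names and counts below are hypothetical):

```python
def funnel_conversion(landing_page_visits, thank_you_visits):
    """Overall conversion rate: thank-you page views over total landing traffic."""
    total_landing = sum(landing_page_visits.values())
    return thank_you_visits / total_landing

# Hypothetical numbers for illustration.
landing = {"/home": 600, "/category": 250, "/product": 150}
overall_rate = funnel_conversion(landing, 30)  # 30 conversions out of 1000 visits
```

      The point of the ratio is that it is element-agnostic: per-element wins that don't move this number aren't moving the bottom line.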
      Signature
      Success is an ACT not an idea
      • Profile picture of the author Raydal
        I don't mind doing basic split tests for my sales letters and websites,
        but when it comes to real testing I look to companies who specialize
        in that area and model their approaches. When you think about it,
        testing takes TIME and MONEY when done properly. And unless
        this is your particular field, it's a lot quicker for the small guy to
        buy test results than to find them on his own.

        As a simple test, I'm presently finding that my longer video sales letters
        have a greater engagement level than shorter videos. Now, I "knew" this
        from tests done by big companies, and now I'm seeing it for myself.

        My shortest VSL is about 8 minutes and the longest over 30 minutes,
        yet the longest has more complete views than the shortest.

        -Ray Edwards
        Signature
        The most powerful and concentrated copywriting training online today bar none! Autoresponder Writing Email SECRETS
  • Profile picture of the author jlinowski
    I think many tests regress to the mean over time. It's important to run the test long enough to see whether the effect is actually real.

    One thing you can do is introduce baselines of at least 100 conversions per variation plus at least one month or so of runtime.

    The other thing to look at is the range of the effect.

    Example 1: This would be a very weak +100% effect with B overlapping quite a lot with the control.
    A/B Test (Split Test) Calculator | Thumbtack

    Example 2: This might be a lot stronger +100% effect:
    A/B Test (Split Test) Calculator | Thumbtack

    Finally, you should always gauge the sample size before you start the test. Knowing your traffic and current conversion rate, you can get a sense beforehand of how long to run the test in order to detect a +x% effect.

    Here is one such tool:
    https://vwo.com/ab-split-test-duration/
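    For anyone who prefers to see the arithmetic behind such calculators, here is a rough per-arm sample-size sketch using the standard two-proportion formula (95% confidence and 80% power by default; the baseline rate and lift below are made up):

```python
import math

def sample_size_per_arm(base_rate, relative_lift, z_alpha=1.96, z_beta=0.84):
    """Approximate visitors needed PER variation to detect the given
    relative lift with ~95% confidence and ~80% power."""
    p1 = base_rate
    p2 = base_rate * (1 + relative_lift)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    effect = p2 - p1
    return math.ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)

# e.g. a 3% baseline conversion rate, hoping to detect a +20% relative lift
n = sample_size_per_arm(0.03, 0.20)
```

    Small baselines and small lifts push the required sample into the five-figure range per arm, which is one reason under-powered tests "jump all over the place."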

    Cheers,
    Jakub
  • Profile picture of the author dburk
    Originally Posted by savidge4 View Post

    As many here are, I am sure, testing their own pages, like myself you look at any page you have and kind of know what the traffic across that page is. More specifically, you know that Sunday is its lowest day and, say, Tuesday is its best day. And sure, you can see this stuff in analytics, but it's really not something you would look at or consider... or at least I didn't. 1000 visitors across a page is 1000 visitors across a page, right?

    Since I have started A/A testing, I am finding that 1000 is not ALWAYS 1000, and that there are other variables at play. I am finding the day specifically can affect the outcome, and in some cases I am finding the time of day to be a variable. Well, let me rephrase that... I'm not finding... I am guessing at this point that those are possible indicators. (More testing required! hahaha)
    Hi savidge4,

    Based on your post, it sounds like you were not conducting A/B split tests; you were instead running serial tests. An A/B split test runs the control and the treatment over the same time period, randomly splitting the traffic between them. Serial testing does not "split" the traffic; instead, it runs the control for a period of time, then runs the treatment for an approximately equal amount of time.

    Serial testing is never going to be as accurate as split tests because the two separate test periods will always have uncontrolled variables. Split tests will typically yield statistically valid test results in about 1/8th the time it takes to get close to the same level of validity with a serial test.

    The bottom line is to always use split testing, where possible, for the most valid test results.
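    The "split" in split testing is just a per-visitor assignment so that both arms share the same time period and traffic mix. A minimal deterministic sketch (the visitor IDs are hypothetical):

```python
import hashlib

def assign_arm(visitor_id, arms=("A", "B")):
    """Deterministically bucket a visitor: the same visitor always sees the
    same variation, and both arms run over the same time period."""
    digest = hashlib.md5(str(visitor_id).encode()).hexdigest()
    return arms[int(digest, 16) % len(arms)]

assignments = [assign_arm(i) for i in range(1000)]
share_a = assignments.count("A") / len(assignments)  # roughly half the traffic
```

    Hashing rather than pure randomness is a common design choice: a returning visitor keeps seeing the same variation, which avoids contaminating the arms.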
    • Profile picture of the author savidge4
      dburk,

      Interesting that you caught that. Yes, in that moment I was sent over the edge, LOL. I was testing a WooCommerce template-type modification, and I simply do not know another way of testing that.

      Aside from that instance, I do run into the same kind of thing with A/B split testing with some regularity. Since I have been running A/A pre-tests on a pretty consistent basis, it is just crazy how often the testing comes back far from even. I ran a test 2 days ago that clearly indicated a 20% difference (on an A/A test), and I ran the test far longer than I normally do to see if it would "correct" itself... and it didn't. But then I ran the test again, and it was "normal."

      I want to think that in part it could be software, but I think that is really an easy answer out. I am starting to think that human nature is just whacked sometimes, and that really is just what it is! LOL


      Originally Posted by dburk View Post

      Based on your post, it sounds like you were not conducting A/B split tests; you were instead running serial tests. An A/B split test runs the control and the treatment over the same time period, randomly splitting the traffic between them. Serial testing does not "split" the traffic; instead, it runs the control for a period of time, then runs the treatment for an approximately equal amount of time.

      Serial testing is never going to be as accurate as split tests because the two separate test periods will always have uncontrolled variables. Split tests will typically yield statistically valid test results in about 1/8th the time it takes to get close to the same level of validity with a serial test.

      The bottom line is to always use split testing, where possible, for the most valid test results.
      Signature
      Success is an ACT not an idea
  • Profile picture of the author universalyoga
    I read your post; it is really informative for me.