
Unread 20th May 2016, 06:59 AM   #1
Warrior Member
 
Join Date: 2016
Location: Paris, France
Posts: 4
Thanks: 0
Thanked 1 Time in 1 Post
What are your criteria to stop an A/B Test?


Well met CRO Warriors!

Stopping A/B tests too early is the most common and most damaging mistake people make. And there is no cookie-cutter answer that works for every statistical method of A/B testing.


So I’ve got one question and an article for you:
  1. What are YOUR criteria to stop an A/B Test?

  2. A short version of my article on the concepts you need to understand so you can make an informed decision as to when it’s safe to stop an A/B Test (link to the original, longer version at the bottom)

I mulled over this topic for quite a while. It wasn’t a good idea to tackle it from a statistical point of view, because, well, it’s complicated. Few people actually care whether their A/B testing tool is frequentist or Bayesian, and if you dig around a bit in the different tools, you’ll find they never use exactly the same statistical engine.

Plus I really didn’t like the idea of just giving numbers for you to blindly apply (or discard because they don’t feel right for you).

Instead I decided to try and explain the different concepts that would help you stop tests safely (and not get useless, most likely imaginary results).

We’ll cover the following:
  1. Significance level
  2. Sample size
  3. Duration
  4. Variability of data

Note: None of these elements are stopping rules on their own. But having a better grasp of them will allow you to make better decisions.


I. Significance level
When your A/B testing tool tells you something along the lines of « your variation has a 95% chance of outperforming your control », it’s giving you the significance level (some tools call it confidence).

Turned the other way around, it roughly means: « if there were really no difference between the variation and the control, a result like this one would still show up about 5% of the time (1 in 20), as a pure fluke ».

You want 95% at minimum. Think about what it actually means. If you stop at 80%, a change that does nothing at all will still look like a winner 20% of the time (1 in 5): a false positive.

You’re testing to make data-driven decisions, not slightly-better-than-flipping-a-coin ones.

BUT having a 95% significance level is NOT a sufficient condition to stop an A/B test.
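
To make the 95% figure concrete, here is a minimal sketch of how such a number can be computed. It assumes a frequentist two-proportion z-test, which is only one of the engines a tool might use (as said above, they all differ), and the visitor and conversion counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value for the difference between two conversion rates."""
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    # Pooled rate: best estimate of the shared rate if A and B were identical.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Hypothetical numbers: 200/4000 conversions on control, 250/4000 on the variation.
p = two_proportion_p_value(200, 4000, 250, 4000)
print(f"p-value: {p:.4f} -> confidence shown by the tool: {1 - p:.1%}")
```

A p-value below 0.05 corresponds to the "95%" threshold discussed in this section. It says nothing yet about sample size, duration or stability.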


II. Sample size
Unless you’re testing a particular segment of visitors, make sure your sample is representative of your overall audience in composition and proportions. Be wary of unusual traffic sources that could be skewing your data.

Example: sending your newsletter during your test. You get a spike of traffic from visitors who are more likely to react positively to any change you make, since they already trusted/appreciated you enough to subscribe.

Your sample must also be large enough that it’s not vulnerable to the natural variability of the data, i.e. if you don’t have enough measurements, outlier results will have a strong impact on your overall results.
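
How large is "large enough"? One common way to estimate it before launching is the standard normal-approximation formula for comparing two proportions. The sketch below is only that, a rough estimate: the baseline rate, the lift you want to detect and the 80% power are assumptions you have to supply yourself, and your tool's own calculator may give a different number.

```python
from math import ceil
from statistics import NormalDist

def sample_size_per_variation(baseline_rate, relative_lift, alpha=0.05, power=0.80):
    """Approximate visitors needed per variation (two-sided test)."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)  # rate you hope the variation reaches
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # 95% significance -> about 1.96
    z_beta = NormalDist().inv_cdf(power)           # 80% power -> about 0.84
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)

# Hypothetical case: 5% baseline conversion rate, looking for a 20% relative lift.
n = sample_size_per_variation(0.05, 0.20)
print(f"Roughly {n} visitors per variation")
```

Note how quickly the number grows when the lift you are looking for gets smaller: halving the lift roughly quadruples the sample you need.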


III. Duration
You should test for full weeks at a time. We recommend you test for 2-3 weeks, or 1 or 2 business cycles.

Why? You already know that, for example, social networks and emails have optimal days (even hours) for posting and sending.

That means the time and the day influence people’s behaviour. Same thing with your conversion rates: if you look at conversions by day in Google Analytics, you’ll see that Mondays convert differently than Thursdays, for example.

So test for full weeks. 2 or 3 is good, or 1-2 business cycles, so your sample includes people who just discovered you, people who already know you, etc.
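
The "full weeks" rule is easy to turn into arithmetic. Here is a small sketch that converts a required sample size into a number of whole weeks; the traffic figures are hypothetical and the 2-week floor simply reflects the recommendation above.

```python
from math import ceil

def weeks_to_run(required_per_variation, n_variations, daily_visitors, min_weeks=2):
    """Whole weeks needed to reach the sample size, never fewer than min_weeks."""
    total_needed = required_per_variation * n_variations
    days_needed = ceil(total_needed / daily_visitors)
    weeks_needed = ceil(days_needed / 7)  # round up so every weekday is covered equally
    return max(weeks_needed, min_weeks)

# Hypothetical case: 8155 visitors per variation, control + 1 variation,
# 1000 visitors per day entering the test.
print(weeks_to_run(8155, 2, 1000), "weeks")
```

If the answer comes out at many months, that is usually a sign the test needs more traffic or a bolder change, not a reason to stop early.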


IV. Variability of data
If your significance level and/or the conversion rates of your variations are still fluctuating notably, let your test keep running.

Two phenomena to consider here:
  • Regression to the mean: This is what we talked about earlier. The more data you record, the closer you get to the “true value”. This is why your tests fluctuate so much at first: you have few measurements, so outliers have a considerable impact.

  • The novelty effect: When people react to your change just because it’s new. It will fade with time.

This is also why the significance level isn’t enough on its own. During a test, you’ll most likely reach 95% several times before you can actually stop your test.

As we already mentioned, you’ll see these large fluctuations at the beginning of your tests because you don’t yet have enough data to approach the « true » value, so outliers weigh heavily on the overall conversion rate.
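
You can see this for yourself with a quick simulation. The sketch below runs A/A tests (both versions convert at exactly the same rate, so any "winner" is a fluke), checks significance every day, and compares how often 95% is crossed on at least one day with how often it holds on the last day only. The traffic, rate and z-test are assumptions made for illustration, not what any particular tool does.

```python
import random
from math import sqrt
from statistics import NormalDist

def p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value of a two-proportion z-test."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    if pooled in (0, 1):
        return 1.0  # no variation in the data yet, nothing to test
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (conv_b / n_b - conv_a / n_a) / std_err
    return 2 * (1 - NormalDist().cdf(abs(z)))

def simulate_aa_tests(n_tests=200, days=21, daily_visitors=100, rate=0.05, seed=1):
    """Return (share crossing 95% on any day, share at 95% on the last day)."""
    rng = random.Random(seed)
    hit_any_day = 0
    hit_last_day = 0
    for _ in range(n_tests):
        conv_a = conv_b = n_a = n_b = 0
        crossed = False
        for _day in range(days):
            # Both versions share the same true rate: there is nothing to find.
            conv_a += sum(rng.random() < rate for _ in range(daily_visitors))
            conv_b += sum(rng.random() < rate for _ in range(daily_visitors))
            n_a += daily_visitors
            n_b += daily_visitors
            significant = p_value(conv_a, n_a, conv_b, n_b) < 0.05
            crossed = crossed or significant
        hit_any_day += crossed
        hit_last_day += significant  # value from the final day of this test
    return hit_any_day / n_tests, hit_last_day / n_tests

any_day, last_day = simulate_aa_tests()
print(f"Reached 95% on at least one day: {any_day:.0%}")
print(f"At 95% on the last day only:     {last_day:.0%}")
```

Every test that is significant on the last day is also counted in the first figure, so the first figure can never be lower than the second. Stopping the first time you see 95% can only add false winners.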


To sum up, before stopping an A/B Test, consider the following:
  • Is your significance level at least 95%?
  • Is your sample large enough and representative of your overall audience in composition and proportions?
  • Have you run your test for the appropriate length of time?
  • Have your significance and conversion rate curves flattened out?


Only after taking all of those into account can you stop a test. Don’t skip them, don’t lose money …
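
If you like checklists as code, the questions above could be sketched like this. Every threshold here (14 days minimum, 10% maximum relative spread over the recent cumulative conversion rates) is a hypothetical default to adapt, and whether your sample is representative is left out because it takes human judgement, not a number.

```python
def safe_to_stop(confidence, visitors_per_variation, required_per_variation,
                 days_run, recent_rates, min_days=14, max_spread=0.10):
    """True only if every numeric criterion from the checklist is met."""
    significant = confidence >= 0.95
    enough_sample = visitors_per_variation >= required_per_variation
    full_weeks = days_run >= min_days and days_run % 7 == 0
    # "Flattened" curve: recent cumulative conversion rates stay within
    # max_spread (relative) of each other.
    low, high = min(recent_rates), max(recent_rates)
    stable = low > 0 and (high - low) / low <= max_spread
    return significant and enough_sample and full_weeks and stable

# Hypothetical test: 97% confidence, 9000 visitors per variation (8155 required),
# 21 days, cumulative conversion rate of the variation over the last 4 days.
print(safe_to_stop(0.97, 9000, 8155, 21, [0.061, 0.060, 0.062, 0.061]))
```

A single failed check is enough to keep the test running.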





Alright, back to you now!

PS: Tell me if the content was useful for you. If not, why, and what would you have needed?





Last edited on 20th May 2016 at 08:02 AM. Reason: typo
Jeanbaptiste Alarcon is offline  
Unread 23rd May 2016, 03:47 AM   #2
viptraffictraining.com
 
Join Date: 2012
Location: Cyberspace
Posts: 359
Thanks: 8
Thanked 55 Times in 52 Posts
Re: What are your criteria to stop an A/B Test?

Great approach. Will bookmark this for further research. Thanks.

Hearn is offline  
Unread 23rd May 2016, 04:29 AM   #3
Warrior Member
 
Join Date: 2016
Location: Paris, France
Posts: 4
Thanks: 0
Thanked 1 Time in 1 Post
Re: What are your criteria to stop an A/B Test?

Thank you for the kind words, Hearn

(If you're interested in this topic, I wrote a 10,000-word ebook on A/B testing mistakes that I can PM you for further research^^)
Jeanbaptiste Alarcon is offline  
Tags
a or b, ab testing, conversion optimization, criteria, cro, growth hacking, stop