| | #1 |
| HyperActive Warrior War Room Member Join Date: Oct 2007 Location: , , .
Posts: 264
Thanks: 0
Thanked 8 Times in 7 Posts
|
Howdy All. I do A/B split testing and have for a long time. I know all about needing a big enough sample size of results to call a verdict on a test, but I have a dilemma. I have a test going on and the results are darn near identical, and the sample size is relatively large. What I can't seem to squeeze out of my head is: when do I call it a draw? Meaning, when can I say with mathematical certainty (or close to it) that the test made no difference in sales? My issue is that if you were to flip a coin 10 times and get an even split, that isn't a large enough sample to call it 50/50, but what is? If I just keep letting it run, it will get progressively CLOSER to 50%, just as the coin toss should approach an even 50% as the sample size gets larger. So when can I pull the plug? 50 results on each side? 100? 1000? Obviously the longer I run it the more sure I am, but at what point is it mathematically 95% or greater, etc.? Much Thanks. |
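To put rough numbers on the coin-toss intuition above, here is a minimal sketch using the standard normal-approximation margin of error for a proportion. The function name `margin_of_error` and the sample sizes in the loop are illustrative choices, not anything from the thread, and the approximation is crude for very small samples such as 10 flips.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for an observed proportion after n trials.

    Uses the normal approximation z * sqrt(p * (1 - p) / n); z=1.96 is the
    usual multiplier for 95%. Illustrative helper, not from the thread.
    """
    return z * math.sqrt(p * (1 - p) / n)

# How tightly an even split pins down a "fair" coin at different sample sizes.
for n in (10, 100, 1000, 10000):
    print(f"{n:>6} flips: 50% +/- {margin_of_error(n) * 100:.1f} points")
```

Under these assumptions the uncertainty is about 31 points at 10 flips, about 10 points at 100, and about 3 points at 1,000. The margin shrinks with the square root of the sample size, so each halving of it takes roughly four times as many results.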
| | |
| | #2 |
| Copywriter and Marketer War Room Member Join Date: Apr 2005 Location: Philly Suburbs, USA
Posts: 2,787
Thanks: 788
Thanked 697 Times in 373 Posts
|
Hi Rich, Got your PM so let me answer here. If you are running an A/B split test and the testing data is pretty much a draw after 50 or 100 actions, then you don't have a winner. I'd stick with the original control variation and find a new challenger to put up against it. Personally, I rarely do a split test of A vs. B. I prefer a multi-variate test with the control and two challenging variations. That way, it's much harder to get a 3-way tie between them. You *should* see one variation that outperforms the other 2 most of the time. As a general rule of thumb, aim for at least 100 actions before you deem a test statistically significant. I like to see at least an 85% confidence level, and the higher the confidence level the better. A 95% confidence level is ideal: roughly speaking, it means that if the variations truly performed the same, a gap this large would show up by chance only about 5% of the time. Hope that helps, Mike |
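For anyone who wants to check the confidence level by hand, here is a hedged sketch of one common approach: a two-proportion z-test plus a 95% interval on the difference in conversion rates. The function name `compare_variations`, the `draw_margin` parameter, and the visitor and sale counts in the usage lines are all hypothetical. "Confidence" here is the loose marketing usage (1 minus the two-sided p-value), and the normal approximation assumes a reasonable number of conversions on each side. Strictly speaking, "no difference at all" cannot be proven, so the sketch calls a draw only when the whole interval sits inside a margin you have decided is too small to matter.

```python
import math
from statistics import NormalDist

def compare_variations(conv_a, n_a, conv_b, n_b, draw_margin=0.01):
    """Compare two variations; counts are conversions and visitors per side.

    draw_margin is an assumed threshold: the smallest absolute difference
    in conversion rate that you would actually care about.
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    diff = p_b - p_a

    # Pooled standard error: valid under the "no real difference" hypothesis.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se_pooled = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = diff / se_pooled if se_pooled > 0 else 0.0
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))

    # Unpooled standard error for a 95% interval on the difference (B minus A).
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    low, high = diff - 1.96 * se, diff + 1.96 * se

    if p_value < 0.05:
        verdict = "winner"
    elif -draw_margin < low and high < draw_margin:
        verdict = "draw"          # any real difference is smaller than the margin
    else:
        verdict = "keep testing"  # interval still too wide to rule out a real gap
    return {"confidence": 1 - p_value, "interval": (low, high), "verdict": verdict}

# Hypothetical counts: sales and visitors for A, then for B.
print(compare_variations(50, 500, 52, 500)["verdict"])      # small sample
print(compare_variations(100, 5000, 104, 5000)["verdict"])  # larger sample
```

With these made-up numbers, the 500-visitor test comes back as "keep testing" because the interval is still several points wide, while the 5,000-visitor test comes back as "draw" because the entire interval fits inside plus or minus one percentage point. A tighter `draw_margin` needs a correspondingly larger sample before a draw can be called.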