Is A/B Testing REALLY Valid?

10 replies
Before I go too far here, I will answer the thread title question: the answer is YES, it is valid. That being said, I recently went "Pro," as it were, in the field of CRO. I have done much of this over the years on my own sites, and helped a few people a little more than here and there in the last few years.

As I started doing this for clients, I was many times left scratching my head in disbelief. Here I am running a test, A then B, same traffic flow, etc., and I find that, say, B is better. I then implement the change. Sounds good, right? Uh, no...

I was finding that, over a period of time after implementing some tested changes, there was no indication of better conversion, or at least not the measured predicted change, and in some cases there was a LOSS, down to the level that the A variant had indicated.

As many of you know, when you run into an issue with CRO, it's not like you can type "CRO" into a search engine and hope to find an answer... unless the answer includes listening to some European music group! hahaha

So I am retesting and retesting, and the results are jumping all over the place. At this point I'm at my wits' end, so I decide to do the unthinkable and run an A/A test. And I kid you not, I start the test (as in the first A), I am reading some article, click on this and that, and end up on a Neil Patel article describing A/A testing. (I was going to share the link, but I can't find the article, hahaha.) All literally within 15 minutes of starting the A/A test in desperation.

Basically, the principle here is to validate the page before you test the page, ensuring the results from your test will in effect BE valid.

As many here are, I am sure, testing their own pages, like myself you look at any page you have and kind of know what the traffic across that page is. More specifically, you know that Sunday is its lowest day and, say, Tuesday is its best day. And sure, you can see this stuff in analytics, but it's really not something you would look at or consider... or at least I didn't. 1000 visitors across a page is 1000 visitors across a page, right?

Since I have started A/A testing, I am finding that 1000 is not ALWAYS 1000, and that there are other variables at play. I am finding the day specifically can affect the outcome, and in some cases I am finding the time of day to be a variable. Well, let me rephrase that... I'm not finding... I am guessing at this point that those are possible indicators. (More testing required! hahaha)
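To make the A/A idea concrete, here is a minimal simulation sketch (the visitor counts and the 3% rate are made up): two identical "variants" sharing the same true conversion rate can still show visibly different observed rates on 1000 visitors each, purely by chance.

```python
import random

random.seed(42)  # fixed seed so the sketch is repeatable

def simulate_aa_test(visitors_per_arm, true_rate):
    """Simulate an A/A test: both arms have the SAME true conversion rate."""
    conversions_a = sum(random.random() < true_rate for _ in range(visitors_per_arm))
    conversions_b = sum(random.random() < true_rate for _ in range(visitors_per_arm))
    return conversions_a / visitors_per_arm, conversions_b / visitors_per_arm

# 1000 visitors per arm, true rate of 3% for BOTH arms.
rate_a, rate_b = simulate_aa_test(1000, 0.03)
```

Any gap between the two observed rates here is pure noise, which is exactly the baseline an A/A test is meant to expose before you trust an A/B result.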

I just found the article: What Spending $252,000 On Conversion Rate Optimization Taught Me. Point #2 specifically is where Neil gets into that. I forgot that he attributes it to bad software, but I think the principle of setting a benchmark still applies regardless.

Hope that helps someone!

PS thanks Neil!!!!
#a or b #valid
  • Profile picture of the author quadagon
    I think there are two mistakes people make in CRO.

    Firstly, not understanding statistics is a big factor. I've seen people jump head first into a change based on an increase from 10% to 11%, thinking of it as a big jump without seeing that it's actually just one more sale per hundred visitors.

    On the flip side, I've seen people refuse to change from a 3% rate to a 4% rate because it's "only 1% higher." Try explaining that it's a 33% difference and their eyes glaze over.
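    The absolute-versus-relative confusion described above can be pinned down in a couple of lines (a quick sketch; the rates are hypothetical):

```python
def lift(control_rate, variant_rate):
    """Return the absolute and relative change between two conversion rates."""
    absolute = variant_rate - control_rate
    relative = absolute / control_rate
    return absolute, relative

# 10% -> 11%: one extra sale per hundred visitors, a 10% relative lift.
abs_1, rel_1 = lift(0.10, 0.11)

# 3% -> 4%: "only 1% higher" in absolute terms, yet roughly a 33% relative lift.
abs_2, rel_2 = lift(0.03, 0.04)
```

    Quoting both numbers (absolute points and relative percent) avoids both mistakes at once.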

    Secondly, people end up doing multivariable tests without realising that this is what they are doing.

    This ties in with the last mistake of trying to change too much. Small changes are great, and you can keep monitoring small changes as you go along.
    Signature
    I've got 99 problems but a niche ain't one
    • Profile picture of the author peekay
      @quadagon I think anybody who is even testing at all should be encouraged.
      In the majority of cases I know, what goes on a page/website comes from the top - the HiPPO (Highest Paid Person's Opinion).

      One day, CRO will take its rightful place.
      Signature
      Find out free how CRO can double or triple your conversions, sales & profits in the next 60 to 90 days...
  • Profile picture of the author Oziboomer
    Originally Posted by savidge4 View Post

    Since I have started A/A testing, I am finding that 1000 is not ALWAYS 1000, and that there are other variables at play. I am finding the day specifically can affect the outcome, and in some cases I am finding the time of day to be a variable. Well, let me rephrase that... I'm not finding... I am guessing at this point that those are possible indicators. (More testing required! hahaha)
    I run a few split tests, and one test I ran, VWO wanted to showcase as an example of a successful campaign.

    The variation beat the control by 1.84%, and that might seem like a small improvement, but it was an improvement that resulted in increased profits....

    ...now here's the thing...

    My bad...I forgot to stop running this test.

    I was busy doing other client's work and other things.

    It ran for about 5 months, and over that time the difference between the winning variation and the control got narrower and narrower. When I noticed, I thought, "That's strange." I was going to change the control to the so-called winning variation, but by then both were so close that "No Change was sexy."

    Repeated split tests on several different areas have resulted in significant immediate improvements for clients, but this test always haunts me now, as maybe "NEW" was the conversion factor, and as visitors saw the "SAME" over time, the responsiveness dropped.

    Even though I'm committed to split testing, I sort of think "NEW" beats "OLD", even if you end up back with "OLD" later.

    This paradox intrigues me and pushes me to keep split testing, because maybe just changing things up is what keeps re-activating past visitors, and they add to the results you are getting from NEW traffic.
    • Profile picture of the author Insano
      Originally Posted by Oziboomer View Post


      Even though I'm committed to split testing, I sort of think "NEW" beats "OLD", even if you end up back with "OLD" later.

      This paradox intrigues me and pushes me to keep split testing, because maybe just changing things up is what keeps re-activating past visitors, and they add to the results you are getting from NEW traffic.
      I love this part. It happens to me all the time when split testing ad performance. I recently went back to square one with minor changes from the last 6 months, not on ad design or position but on the website colour scheme.... it worked wonders.
      Signature
      Project Ara News and Community, every day fresh! Join now!
    • Profile picture of the author savidge4
      Originally Posted by Oziboomer View Post

      Even though I'm committed to split testing, I sort of think "NEW" beats "OLD", even if you end up back with "OLD" later.

      Very much agree here. Structural changes, I think, make a lasting impact, while color variations and wording lose effect over time.

      I am currently focusing on e-commerce sites as my main (only) client target. Simple structural changes like removing coupon inputs (if the client does not use coupons, obviously) and adding a progress bar bring lasting improvements. But when you get into things like headlines and CTA applications, I'll take the current short win, understanding that it won't last long.

      I look at the big boys like Walmart, The Gap, Target, and Tiger Direct, and they are CONSTANTLY changing elements and placements. With the consistency that these guys change elements, I don't think it is because they have found "Better Results" per se; rather, they are switching to "NEW", then to "NEWER" and "NEWEST", as the "OLD" wanes in effective results.

      What I catch myself doing is changing and changing as the results present themselves, but sometimes you have to look back and see if the projected element gains are actually changing the bottom-line conversions. With an ad campaign, these changes are readily apparent. But when you are looking at a SITE and its elements, many times positive testing results can end up with reverse-effect conversions on the back end.

      A simple equation I have been using is looking at the sum total of traffic across all the landing pages, and then comparing that number with the MOST accurate conversion result page... the "Thank You" page. THIS, for me, is the real indicator of positive "change."
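      That landing-traffic-to-thank-you-page equation could be sketched like this (the page names and counts below are hypothetical):

```python
def funnel_conversion(landing_page_visits, thank_you_visits):
    """Overall conversion rate: thank-you page views over total landing traffic."""
    total_landing = sum(landing_page_visits.values())
    return thank_you_visits / total_landing

# Hypothetical numbers for illustration.
landing = {"/home": 600, "/category": 250, "/product": 150}
overall_rate = funnel_conversion(landing, 30)  # 30 conversions out of 1000 visits
```

      The point of the ratio is that it is element-agnostic: per-element wins that don't move this number aren't moving the bottom line.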
      Signature
      Success is an ACT not an idea
      • Profile picture of the author Raydal
        I don't mind doing basic split tests for my sales letters and websites,
        but when it comes to real testing I look to companies who specialize
        in that area and model their approaches. When you think about it,
        testing takes TIME and MONEY when done properly. And unless
        this is your particular field, it's a lot quicker for the small guy to
        buy test results than to find them on his own.

        As a simple test, I'm presently finding that my longer video sales letters
        have a greater engagement level than shorter videos. Now, I "knew" this
        from tests done by big companies, and now I'm seeing it for myself.

        My shortest VSL is about 8 minutes and the longest over 30 minutes,
        yet the longest has more complete views than the shortest.

        -Ray Edwards
        Signature
        The most powerful and concentrated copywriting training online today bar none! Autoresponder Writing Email SECRETS
  • Profile picture of the author jlinowski
    I think many tests regress to the mean over time. It's important to run the test long enough to see whether the effect is actually real.

    One thing you can do is introduce baselines of at least 100 conversions per variation plus at least one month or so of runtime.

    The other thing to look at is the range of the effect.

    Example 1: This would be a very weak +100% effect with B overlapping quite a lot with the control.
    A/B Test (Split Test) Calculator | Thumbtack

    Example 2: This might be a lot stronger +100% effect:
    A/B Test (Split Test) Calculator | Thumbtack

    Finally, you should always gauge the sample size before you start the test. Knowing your traffic and current conversion rate, you can get a sense beforehand of how long to run the test in order to detect a +x% effect.

    Here is one such tool:
    https://vwo.com/ab-split-test-duration/
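    For anyone who prefers to see the arithmetic behind such calculators, here is a rough per-arm sample-size sketch using the standard two-proportion formula (95% confidence and 80% power by default; the baseline rate and lift below are made up):

```python
import math

def sample_size_per_arm(base_rate, relative_lift, z_alpha=1.96, z_beta=0.84):
    """Approximate visitors needed PER variation to detect the given
    relative lift with ~95% confidence and ~80% power."""
    p1 = base_rate
    p2 = base_rate * (1 + relative_lift)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    effect = p2 - p1
    return math.ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)

# e.g. a 3% baseline conversion rate, hoping to detect a +20% relative lift
n = sample_size_per_arm(0.03, 0.20)
```

    Small baselines and small lifts push the required sample into the five-figure range per arm, which is one reason under-powered tests "jump all over the place."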

    Cheers,
    Jakub
  • Profile picture of the author dburk
    Originally Posted by savidge4 View Post

    As many here are, I am sure, testing their own pages, like myself you look at any page you have and kind of know what the traffic across that page is. More specifically, you know that Sunday is its lowest day and, say, Tuesday is its best day. And sure, you can see this stuff in analytics, but it's really not something you would look at or consider... or at least I didn't. 1000 visitors across a page is 1000 visitors across a page, right?

    Since I have started A/A testing, I am finding that 1000 is not ALWAYS 1000, and that there are other variables at play. I am finding the day specifically can affect the outcome, and in some cases I am finding the time of day to be a variable. Well, let me rephrase that... I'm not finding... I am guessing at this point that those are possible indicators. (More testing required! hahaha)
    Hi savidge4,

    Based on your post, it sounds like you were not conducting A/B split tests; you were instead running serial tests. An A/B split test runs the control and the treatment over the same time period, randomly splitting the traffic between them. Serial testing does not "split" the traffic; instead, it runs the control for a period of time, then runs the treatment for an approximately equal amount of time.

    Serial testing is never going to be as accurate as split tests because the two separate test periods will always have uncontrolled variables. Split tests will typically yield statistically valid test results in about 1/8th the time it takes to get close to the same level of validity with a serial test.

    The bottom line is to always use split testing, where possible, for the most valid test results.
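    The "split" in split testing is just a per-visitor assignment so that both arms share the same time period and traffic mix. A minimal deterministic sketch (the visitor IDs are hypothetical):

```python
import hashlib

def assign_arm(visitor_id, arms=("A", "B")):
    """Deterministically bucket a visitor: the same visitor always sees the
    same variation, and both arms run over the same time period."""
    digest = hashlib.md5(str(visitor_id).encode()).hexdigest()
    return arms[int(digest, 16) % len(arms)]

assignments = [assign_arm(i) for i in range(1000)]
share_a = assignments.count("A") / len(assignments)  # roughly half the traffic
```

    Hashing rather than pure randomness is a common design choice: a returning visitor keeps seeing the same variation, which avoids contaminating the arms.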
    • Profile picture of the author savidge4
      dburk,

      Interesting that you caught that. Yes, in that moment I was sent over the edge, LOL. I was testing a WooCommerce template-type modification, and I simply do not know another way of testing that.

      Aside from that instance, I do run into the same kind of thing with A/B split testing with some regularity. Since I have been running A/A pre-tests on a pretty consistent basis, it is just crazy how often the testing comes back far from even. I ran a test 2 days ago that clearly indicated a 20% difference (on an A/A test), and I ran the test far longer than I normally do to see if it would "correct" itself... and it didn't. But then I ran the test again, and it was "normal."

      I want to think that in part it could be software, but I think that is really an easy answer out. I am starting to think that human nature is just whacked sometimes, and that really is just what it is! LOL


      Originally Posted by dburk View Post

      Based on your post, it sounds like you were not conducting A/B split tests; you were instead running serial tests. An A/B split test runs the control and the treatment over the same time period, randomly splitting the traffic between them. Serial testing does not "split" the traffic; instead, it runs the control for a period of time, then runs the treatment for an approximately equal amount of time.

      Serial testing is never going to be as accurate as split tests because the two separate test periods will always have uncontrolled variables. Split tests will typically yield statistically valid test results in about 1/8th the time it takes to get close to the same level of validity with a serial test.

      The bottom line is to always use split testing, where possible, for the most valid test results.
      Signature
      Success is an ACT not an idea
  • Profile picture of the author universalyoga
    I read your post; it is really informative for me.