Lesson 11: Optimizing Through Testing
Written by Peep Laja at ConversionXL.com
How often does this happen:
A business sets up email capture forms with pretty good lead magnets, and essentially considers the “gather emails” project done. The only thing to do now is to watch the subscribers come in.
Unfortunately, too often.
Your website is never done, and your list building never ends. Declaring it done is arrogant, stupid, and it will cost you a lot of money.
Whatever you set up to gather emails is a mere starting point, a hypothesis for what might work. Now the real world test starts – and with it the process of continuous optimization.
It’s impossible to know in advance what will work best for your customers. Yes, there are heuristics and best practices, but they are a mere starting point for optimization. Once you’ve set up your list building machine, you need to start optimizing it to figure out what works best.
Optimizing is the best thing you can do for growing your list.
True optimization is figuring out what works better for your audience.
It’s not about copying successful websites. You can rip off Noah’s website design and offer, but that doesn’t mean that your conversion rate will be the same. Your traffic sources are different, your reputation is different, your target audience is different, your relationship with your audience is different, and so on.
Email1K is an effective course because it gives you a wide variety of options and strategies to test, try, and find what is most effective for you. While the overall strategy might fit you, the exact execution of the strategy needs multiple iterations.
So don’t copy your role models, and stop copying your competitors: they don’t know what they’re doing either!
I hear this all the time. “Our competitor, X, is doing Y, we should do that too” or “X is market leader and they have Y, we need to have Y”.
There are 2 key things wrong with this reasoning:
- The reason the site you want to copy uses X or Y on their website (menu, navigation, checkout, home page layout, etc.) is probably random. In 90% of cases the layout is whatever their web designer came up with (most likely without any long, thorough analysis or testing), or they simply copied yet another competitor.
- What works for them won’t necessarily work for you.
You’d be surprised by how few people actually know their shit. It’s (maybe!) 5% knowledge and 95% opinion. It’s the blind leading the blind!
– Jeff Bezos
Consider this:
- The right wording for your opt-in offer can make a huge difference. A better offer can increase your daily subscriber rate many times. It’s impossible to predict which offer will work better.
- The right form design can make all the difference to your opt-in conversion rate. You can’t know in advance which design works best for your audience. No, you can’t copy other people.
- Some form locations work better than others; the difference can be 20% to 200% per day. But you don’t know in advance which location is best.
- More email capture mechanisms can do much better than a single form (e.g. a static form in the sidebar + a popup + a scroll-triggered box vs. just a popup). But they might not. Is 4 better than 2? Is 5 better than 4? There’s no way to know in advance.
So what are the ideal form design, the ideal form location, and the ideal offer?
There is no such thing. Stop chasing the silver bullet. Just keep testing – your goal should be to do better than what you did last month. That’s your benchmark.
The (almost) only way to know what works better is testing
A/B testing means that 50% of your visitors see version A of your website and the other 50% see version B. The traffic split is done automatically by a testing tool, and it’s cookie-based (once visitors see version B, they will always see version B). If you have more traffic, you could do A/B/C/D/… testing as well. The more variations you test at once, the more traffic you need to reach valid results.
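If you’re curious what that split looks like under the hood, here’s a minimal sketch in Python of the general idea (the visitor id and variation names are placeholders, not any specific tool’s API): hash a stable visitor identifier, stored in a cookie, so the same person always lands in the same variation.

```python
# Minimal sketch of cookie-based traffic splitting: hash a stable visitor id
# so each visitor is assigned to one variation, permanently and roughly evenly.
import hashlib

VARIATIONS = ["A", "B"]  # add "C", "D", ... only if you have the traffic for it

def assign_variation(visitor_id: str) -> str:
    """Deterministically map a visitor id (e.g. from a cookie) to a variation."""
    digest = hashlib.sha256(visitor_id.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % len(VARIATIONS)  # even split across variations
    return VARIATIONS[bucket]

print(assign_variation("visitor-123"))  # the same visitor always gets the same answer
```

In practice your testing tool handles all of this for you; the point is only that the assignment is deterministic and roughly even, which is what makes the comparison fair.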
Avoid sequential testing (where you show offer A for one week and offer B for the second week) as much as possible. The problem with sequential testing is that your traffic sources fluctuate and the external world is not the same every week, and those things will affect the outcome. So it won’t be an apples-to-apples comparison.
The only time to use sequential testing is when you don’t have enough traffic for A/B testing. But then, if your results get 5% better, how would you know? With sequential testing you need to see a change big enough to believe in before you can be confident things actually got better. If you used to get 20 signups per day and now you get 40, that you can trust. But if it’s 21 or 22 per day now, that might just be traffic fluctuation.
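A rough way to see why 21 or 22 signups a day proves nothing: if daily signup counts bounce around roughly like a Poisson count (an assumption, but a reasonable one for counts like this), ordinary day-to-day noise is about the square root of the average. A quick sketch:

```python
# Back-of-the-envelope noise check. Assumption: daily signup counts behave
# roughly like a Poisson count, so normal day-to-day wobble is about sqrt(mean).
from math import sqrt

baseline = 20.0                 # your usual signups per day
daily_noise = sqrt(baseline)    # ~4.5 signups of ordinary fluctuation

for observed in (22, 40):
    shift = (observed - baseline) / daily_noise
    print(f"{observed} signups/day is {shift:.1f} standard deviations above usual")
# 22/day: ~0.4 sd above usual -> indistinguishable from noise
# 40/day: ~4.5 sd above usual -> a change you can actually believe
```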
How to run A/B tests properly:
A very common scenario: a business runs tens and tens of A/B tests over the course of a year, and many of them “win”. Some tests get you a 25% uplift in signups, or even higher. Yet when you roll out the change, signups don’t increase by 25%. And 12 months after running all those tests, the conversion rate is still pretty much the same. How come?
The answer is this: your uplifts were imaginary. There was no uplift to begin with. Yes, your testing tool said you reached a 95% statistical significance level, or higher. Well, that doesn’t mean much on its own. Statistical significance and validity are not the same thing.
When your testing tool says that you’ve reached a 95% or even 99% confidence level, that doesn’t mean you have a winning variation.
Here’s an example. Two days after starting a test, these were the results:
The variation I built was losing badly, by more than 89% (with no overlap in the margin of error). The tool said Variation 1 had a 0% chance to beat the Control.
So: a 100% significant test, and an 800+ percent uplift (or rather, the Control is over 800% better than the treatment). Let’s end the test, shall we? Control wins!?
Or how about we give it some more time instead? This is what it looked like 10 days later:
That’s right, the variation that had 0% chance of beating control was now winning with 95% confidence. What’s up with that? How come “100% significance” and “0% chance of winning” became meaningless? Because they are.
It’s the same with the second test screenshot (10 days in): even though it says 95% significance, the test is still not “cooked”. The sample is too small, and the absolute difference in conversions is just 19 transactions. That can change in a day.
Statistical significance is not a stopping rule. On its own, it should not determine whether you end a test.
Statistical significance does not tell us the probability that B is better than A. Nor does it tell us the probability that we will make a mistake in selecting B over A. These are both extraordinarily common misconceptions, but they are false. To learn what p-values are really about, read this post.
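If you want to convince yourself that significance alone isn’t a stopping rule, here’s a small simulation sketch in Python (the 5% conversion rate and the peek-every-200-visitors schedule are made-up numbers). Both variations are identical, so any declared “winner” is a false alarm; yet if you stop at the first peek that shows 95% significance, you’ll crown a winner far more often than the 5% the confidence level seems to promise.

```python
# Simulate A/A tests (both variations truly identical) and stop at the first
# peek where a two-proportion z-test shows p < 0.05. Counts how often that
# "95% significant" stopping rule declares a bogus winner.
import random
from math import sqrt
from statistics import NormalDist

def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value of a pooled two-proportion z-test."""
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0
    z = (conv_a / n_a - conv_b / n_b) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

random.seed(42)
TRUE_RATE = 0.05     # A and B convert identically, so every "winner" is a false alarm
PEEK_EVERY = 200     # visitors per variation between peeks at the results
PEEKS = 20
RUNS = 1000

false_winners = 0
for _ in range(RUNS):
    conv_a = conv_b = visitors = 0
    for _ in range(PEEKS):
        conv_a += sum(random.random() < TRUE_RATE for _ in range(PEEK_EVERY))
        conv_b += sum(random.random() < TRUE_RATE for _ in range(PEEK_EVERY))
        visitors += PEEK_EVERY
        if two_proportion_p_value(conv_a, visitors, conv_b, visitors) < 0.05:
            false_winners += 1   # stopped at the first "95% significant" peek
            break

print(f"Declared a bogus winner in {100 * false_winners / RUNS:.0f}% of A/A tests")
```

It’s the same trap as the screenshots above: stop early and you’ll “confirm” whatever the noise happened to show at that moment.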
The Stopping Rule
So when is a test cooked?
Alas, there is no universal, heaven-sent answer out there, and there are a lot of “it depends” factors. That being said, you can have some pretty good stopping rules that will put you on the right path in most cases.
Here’s my stopping rule:
- Test duration: at least 3 weeks (better if 4)
- Minimum pre-calculated sample size reached (calculate it with a sample size tool before you start; see the sketch below). Don’t trust any test result with fewer than 250-400 conversions PER variation.
- Statistical significance at least 95%
This might differ for some tests because of their peculiarities, but in most cases I adhere to these rules.
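For the pre-calculated sample size in the second rule, here’s a minimal calculation sketch in Python using the standard two-proportion formula (the 5% baseline opt-in rate and the target lifts are example numbers, not recommendations):

```python
# Pre-test sample size for comparing two conversion rates with a two-sided
# two-proportion z-test, at a given significance level (alpha) and power.
from math import ceil, sqrt
from statistics import NormalDist

def sample_size_per_variation(p1, p2, alpha=0.05, power=0.80):
    """Visitors needed per variation to detect a change from rate p1 to p2."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # ~1.96 for 95% significance
    z_beta = NormalDist().inv_cdf(power)            # ~0.84 for 80% power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)

print(sample_size_per_variation(0.05, 0.06))  # ~8,000 visitors per variation for a 1-point lift
print(sample_size_per_variation(0.05, 0.10))  # a 5-point lift needs far fewer
```

The takeaway: the smaller the lift you care about detecting, the more traffic the test needs, which is why low-traffic sites often can’t validate small changes at all.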
Conclusion
Assume nothing, forget cherished notions, ignore what others are doing. Test what works for you.
Whatever you have in place right now for list building can most likely be improved, every single month. If you have a crappy site, getting big wins is easy. If it’s pretty decent, you can expect to improve 5% to 15% per month. 5% not good enough for you? Well, if you improve your conversion rate by just 5% a month, that’s almost 80% improvement over 12 months. Compound growth. That’s how the math works.
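In case that 80% looks like magic, it’s just the monthly improvements multiplying together:

```python
# Compound growth: a 5% relative lift every month, multiplied over 12 months.
monthly_lift = 1.05
yearly_lift = monthly_lift ** 12 - 1
print(f"{yearly_lift:.0%} improvement over a year")  # prints "80% improvement over a year"
```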
Want to improve conversions by at least 80% a year? Start testing.
Lesson 11 Task
1. Sign up with an A/B testing tool / plugin. Many testing plugins are available for WordPress, and there’s also a free split URL testing tool inside Google Analytics. If you’re familiar with jQuery/HTML/CSS, go for more advanced tools like Optimizely or VWO.
2. Set up an experiment to test your lead magnet. Create 1-2 alternative versions for your opt-in offer.
3. Set up an experiment testing different locations and/or designs for your opt-in box. It’s like real estate: location, location, location + design matters.