How do CRO professionals run experiments in 2019? Convert.com, an optimization platform, analyzed 28,304 experiments picked at random from its customers' accounts.

This article shares some of their top observations and a few takeaways about:

  • When CROs choose to stop tests
  • Which types of experiments are most popular
  • How often personalization is part of the experimentation process
  • How many goals CROs set for an experiment
  • How costly “learning” from failed experiments can get

1. One in five CRO experiments is significant, and agencies still get better results.


Only 20% of CRO experiments reach the 95% statistical significance mark. While there may be nothing magical about the 95% threshold itself, it's still an important convention.

Compare this with Econsultancy's 2018 optimization report, in which more than two-thirds of respondents said they saw a “clear and statistically significant winner” in 30% of their experiments. (Agency respondents did better, finding clear winners in about 39% of their tests.)

Failing to reach statistical significance may result from two things—hypotheses that don’t pan out or, more troubling, stopping tests early.
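For readers who want to sanity-check their own results against that 95% convention, here's a minimal sketch of the classic calculation: a two-proportion z-test on control and variant conversion counts. The numbers are made up, and Convert's own statistics engine may well use a different (for example, sequential or Bayesian) approach.

```python
from math import sqrt, erf

def norm_cdf(x: float) -> float:
    """Standard normal CDF via the error function (no external dependencies)."""
    return 0.5 * (1 + erf(x / sqrt(2)))

def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int):
    """Two-sided z-test for the difference between two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)                # pooled rate under H0
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))  # standard error
    z = (p_b - p_a) / se
    p_value = 2 * (1 - norm_cdf(abs(z)))                    # two-sided p-value
    return z, p_value

# Hypothetical test: 10,000 visitors per arm, 300 vs. 345 conversions.
z, p = two_proportion_z_test(300, 10_000, 345, 10_000)
print(f"z = {z:.2f}, p = {p:.3f}, significant at 95%: {p < 0.05}")
```

Even a healthy-looking difference like this one falls just short of 95% confidence, which is exactly the situation where the temptation to call a test early creeps in.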

Of the experiments that did achieve statistical significance, only 1 in 7.5 showed a conversion rate lift of more than 10%.

Agencies did slightly better: 15.84% of their experiments were significant with a lift of at least 10%, which means they outperformed in-house CRO teams by about 21% this year.
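For reference, “lift” in these figures is the relative change in conversion rate against the control, so the 10% bar is a simple calculation (illustrative rates only):

```python
def relative_lift(cr_control: float, cr_variant: float) -> float:
    """Relative lift: percentage change in conversion rate versus the control."""
    return (cr_variant - cr_control) / cr_control

# Hypothetical rates: control converts at 3.0%, winning variant at 3.4%.
print(f"{relative_lift(0.030, 0.034):.1%}")  # ~13.3%, which clears the 10% bar
```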

(There was no significant difference between agencies and in-house customers when comparing their monthly testing volumes.)

2. A/B tests continue to be the most popular experiment.

A/B testing is still the go-to method for most optimizers: A/B tests accounted for 97.5% of all experiments on Convert's platform, and the average A/B test had 2.45 variations.

This trend isn’t new. A/B tests have always dominated.

A/B tests are certainly simpler to run; they also deliver results more quickly and work with smaller traffic volumes. Here's the complete breakdown by test type, followed by a quick sketch of how variant assignment typically works:

  • A/B DOM: 80.9%
  • A/B Split URL: 16.6%
  • A/A: 1.15%
  • Multivariate (MVT): 0.78%
  • Personalization: 0.57%
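For context on what these tests share under the hood, here's a minimal sketch of deterministic traffic splitting: hash a visitor ID into a bucket so each visitor consistently sees the same variation. This is a generic pattern, not Convert's actual implementation, and all names are hypothetical.

```python
import hashlib

def assign_variation(visitor_id: str, experiment_id: str, variations: list[str]) -> str:
    """Deterministically map a visitor to one variation, split evenly."""
    # Hash visitor + experiment so a visitor always sees the same variation
    # within an experiment, while assignments stay independent across experiments.
    digest = hashlib.sha256(f"{experiment_id}:{visitor_id}".encode()).hexdigest()
    bucket = int(digest, 16) % len(variations)
    return variations[bucket]

# Hypothetical experiment with a control and two challengers
# (close to the 2.45 variations per test reported above).
print(assign_variation("visitor-42", "homepage-hero-test", ["control", "v1", "v2"]))
```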


North American optimizers ran 13.6 A/B experiments a month, while those from Western Europe averaged only 7.7.

There were other cross-Atlantic differences: Western European optimizers ran more A/B tests with DOM manipulation, while the United States and Canada ran twice as many split-URL experiments.

3. Optimizers are setting multiple goals.

On average, optimizers set at least four goals for each experiment (e.g., clicking a certain link, visiting a certain page, submitting a form). That means three secondary goals in addition to the primary conversion rate goal.
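As an illustration, an experiment configured that way might be described with a structure like the one below: one primary success metric plus three secondary, diagnostic goals. The field and goal names are hypothetical, not Convert's (or any vendor's) actual API.

```python
from dataclasses import dataclass, field

@dataclass
class Goal:
    name: str
    kind: str            # e.g. "click", "pageview", "form_submit", "revenue"
    primary: bool = False

@dataclass
class Experiment:
    name: str
    goals: list[Goal] = field(default_factory=list)

# Hypothetical experiment: one primary success metric, three diagnostic goals.
checkout_test = Experiment(
    name="checkout-cta-copy",
    goals=[
        Goal("completed purchase", "revenue", primary=True),
        Goal("clicked CTA", "click"),
        Goal("reached checkout page", "pageview"),
        Goal("submitted contact form", "form_submit"),
    ],
)
```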


Additional “diagnostic” or secondary goals can increase learning from experiments, whether they're winning or losing efforts. While the primary goal unmistakably declares the “wins,” the secondary metrics shine a light on how an experiment affected the target audience's behavior. (Optimizely contends that successful experiments often track as many as eight goals to tell the full experiment story.)

This is a positive—customers are trying to gain deeper insights into how their changes impact user behavior across their websites.

While sales and revenue were primary success metrics, common secondary metrics included things like bounce rate or “Contact Us” form completion rates.

High performers (companies that secured an improvement of 6% or more in their primary success metric) were more likely to measure secondary metrics.

4. Personalization is used in less than 1% of experiments.

Personalization isn't popular yet, despite its potential. Less than 1% of the research sample used personalization as an optimization method, even though personalization is available at no added cost on all of Convert's plans.


Within the broader CRO stack, personalization tools remain a tiny minority. A quick look at data from BuiltWith, covering 362,367 websites that use A/B testing and personalization tools, reinforces the finding:

  1. Google Optimize 37%
  2. Optimizely 33%
  3. VWO 14%
  4. Adobe 6%
  5. AB Tasty 4%
  6. Maxymiser 3%
  7. Dynamic Yield <1%
  8. Zarget <0.5%
  9. Convert <0.5%
  10. Monetate <0.5%
  11. Kameleoon <0.5%
  12. Intellimize <0.1%

U.S.-based users are using personalization six times more often than those from Western Europe.

5. Learnings from experiments without lifts aren’t free.

In the analyzed sample, “winning” experiments—defined as all statistically significant experiments that increased the conversion rate—produced an average conversion rate lift of 61%.

Experiments with no wins—just learnings—can hurt the conversion rate: on average, they caused a 26% decrease.

We all love to say that there's no losing, only “learning,” but it's important to acknowledge that even learnings from non-winning experiments come at a cost.


With roughly 2.45 variations per experiment, every experiment has around an 85% chance of decreasing the conversion rate during the testing period (by around 10% of the existing conversion rate).
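To put a rough number on that cost, here's a back-of-envelope sketch combining the article's 85%/10% figures with otherwise hypothetical inputs (baseline conversion volume and test duration are made up):

```python
# Article's figures: while a test runs, there's roughly an 85% chance it
# depresses the conversion rate, by about 10% of the existing rate.
p_decrease = 0.85            # chance a test lowers conversions during its run
drop_during_test = 0.10      # typical size of that dip, relative to baseline
weekly_conversions = 500     # hypothetical baseline volume on the tested pages
test_weeks = 3               # hypothetical test duration

expected_lost = weekly_conversions * test_weeks * p_decrease * drop_during_test
print(f"Expected conversions forgone while the test runs: ~{expected_lost:.0f}")
```

That's about 128 conversions given up over a three-week test in this made-up scenario, which is the real price of the “learnings” from experiments that don't win.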

Businesses need to archive and learn from all their experiments. According to the CXL report, about 80% of companies archive their results, and 36.6% use specific archiving tools. These are strong indicators that CROs are refueling their experimentation programs with learnings from past efforts.

But while tracking results and documenting learnings can improve a testing program in the long run, there’s urgency to learn from failed experiments and implement successes quickly.

There’s also a need to research and plan test ideas well so that experiments have a higher likelihood of success.

Conclusion

While the research helped establish some industry benchmarks, a few of the findings were hardly surprising (for example, the popularity of A/B tests).

What was surprising is how few customers use personalization. In recent years, many businesses have claimed to be making progress on that front but haven't delivered on the promise. Better data management may make personalization easier for companies to execute.

Other than that, setting up multiple goals is a positive sign—testers want to dig deeper into how their experiments perform to maximize learnings and, of course, wins.



P.S. Personalization may get even harder going forward, as browsers become increasingly privacy-focused and make it harder to track users.