
Statistical power

Anticipating the ability of an A/B test to detect a real effect

📌 Definition of statistical power in A/B testing

Statistical power is the probability that an A/B test will detect a real effect if it actually exists.
In other words, it measures the test's ability to avoid a false negative, that is, to avoid wrongly concluding that a variation is ineffective when it actually has a real impact.
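
The definition can be checked by brute force: simulate many A/B tests in which the variation really does convert better, and count the share that come out significant. Below is a minimal Python sketch. It assumes a two-sided pooled two-proportion z-test at α = 5% and hypothetical conversion rates of 10% (control) and 13% (variation), with 1,000 visitors per arm. The function names are illustrative only.

```python
import random
from statistics import NormalDist


def z_test_p_value(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Two-sided p-value of a pooled two-proportion z-test."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = (pooled * (1 - pooled) * (1 / n_a + 1 / n_b)) ** 0.5
    if se == 0:
        return 1.0  # no conversions (or only conversions) in both arms: no evidence
    z = (conv_b / n_b - conv_a / n_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def simulated_power(p_a: float, p_b: float, n_per_arm: int,
                    alpha: float = 0.05, n_sims: int = 1000, seed: int = 42) -> float:
    """Share of simulated A/B tests, with true rates p_a and p_b, that end with p < alpha."""
    rng = random.Random(seed)
    detected = 0
    for _ in range(n_sims):
        # Each visitor converts independently with the arm's true probability
        conv_a = sum(rng.random() < p_a for _ in range(n_per_arm))
        conv_b = sum(rng.random() < p_b for _ in range(n_per_arm))
        if z_test_p_value(conv_a, n_per_arm, conv_b, n_per_arm) < alpha:
            detected += 1
    return detected / n_sims


# Hypothetical test: 10% control vs 13% variation, 1,000 visitors per arm
print(simulated_power(0.10, 0.13, 1_000))
```

Under these assumptions, the share should come out at roughly 0.55, well below the 80% level usually recommended. If p_b is set equal to p_a, the share drops to close to 5%, which is the false positive rate fixed by α.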

In the context of conversion rate optimization (CRO), sufficient power ensures that decisions based on a test are well-founded and actionable.

🎯 Recommended level

In most cases, a statistical power of 80% is considered the minimum acceptable level.
This means that the test has an 80% chance of detecting a real effect of the size it was designed for, and a 20% risk of missing it (type II error, noted β).
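
In formula terms, power is the complement of the type II error rate β. A common case is comparing two conversion rates, p_A (control) and p_B (variation), with n visitors per variation. For that case, a standard normal approximation gives the second line below. The choice of test and approximation are assumptions of this sketch, since the article does not name a specific test. Φ is the standard normal cumulative distribution function, and z_{1−α/2} ≈ 1.96 for a 95% confidence level.

```latex
\text{Power} = 1 - \beta = P(\text{reject } H_0 \mid \text{the effect is real})

\text{Power} \approx \Phi\left( \frac{\lvert p_B - p_A \rvert}{\sqrt{\dfrac{p_A(1 - p_A) + p_B(1 - p_B)}{n}}} - z_{1-\alpha/2} \right)
```

Setting β = 0.20 gives the 80% target. The approximation ignores the negligible chance of a significant result in the wrong direction.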

⚖️ Factors that influence power

Power depends on three key parameters, along with the baseline conversion rate of the page being tested:

  1. Sample size
    → The more traffic or conversions you have, the smaller the effects you can detect.
  2. Minimum Detectable Effect (MDE)
    → This is the minimum effect you want to be able to detect (e.g., +5% conversion).
    → The smaller the expected effect, the more traffic you need to detect it with confidence.
  3. Confidence level (e.g., 95%)
    → This determines the threshold at which a result is considered statistically significant.
    → The more demanding the level, the lower the power (for a given sample size).
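
The direction of each factor can be checked numerically. Below is a minimal Python sketch that evaluates the usual normal approximation for a two-sided two-proportion z-test. The test choice is an assumption, and the function name is hypothetical. The calculation also needs a baseline conversion rate as input, so a hypothetical 10% is used.

```python
from statistics import NormalDist


def power_two_proportions(p_a: float, p_b: float, n_per_arm: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided two-proportion z-test (normal approximation)."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    se = ((p_a * (1 - p_a) + p_b * (1 - p_b)) / n_per_arm) ** 0.5
    # Ignores the negligible chance of a significant result in the wrong direction
    return NormalDist().cdf(abs(p_b - p_a) / se - z_crit)


baseline = 0.10  # hypothetical 10% baseline conversion rate

# 1. Sample size: more visitors per arm -> more power (effect fixed at +10% relative)
for n in (1_000, 5_000, 20_000):
    print(f"n={n:>6}: power={power_two_proportions(baseline, 0.11, n):.2f}")

# 2. MDE: smaller relative lift -> less power at the same traffic
for lift in (0.05, 0.10, 0.20):
    p_b = baseline * (1 + lift)
    print(f"lift=+{lift:.0%}: power={power_two_proportions(baseline, p_b, 5_000):.2f}")

# 3. Confidence level: stricter alpha (99% vs 90% confidence) -> less power
for alpha in (0.10, 0.05, 0.01):
    print(f"alpha={alpha}: power={power_two_proportions(baseline, 0.11, 5_000, alpha):.2f}")
```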

❗ Why it's crucial in A/B testing

Without sufficient power:

  • You risk rejecting a winning variation due to insufficient evidence.
  • Your test may be reported as "neutral" when the effect exists but is masked by a lack of data.
  • You waste time, traffic, and optimization opportunities.

💡 Conversely, good power makes what you learn reliable, even when the results are not significant: a neutral result from a well-powered test indicates that any real effect is probably smaller than the MDE.

🧪 Concrete examples

Situation | Power | Risk
You are testing a variation with an MDE of 3% but little traffic | 40% | You have a 60% chance of missing a real effect
You are targeting a variation of +10% over 20,000 sessions | 85% | Good power → reliable conclusion in case of success or failure

✅ Best practices in CRO

  • Calculate the power before the test, using a tool (Optimizely, AB Tasty, Evan Miller's calculator) or a custom model (BigQuery, R).
  • Define a realistic MDE based on business challenges (e.g., 2% on a homepage ≠ 10% on a product page).
  • Extend the test duration or aggregate traffic to achieve sufficient power.
  • Never interpret a "neutral" test without checking its actual power.

Talk to a Welyft expert

The Data-Marketing agency that boosts the ROI of your customer journeys
