Statistical power
Anticipating the ability of an A/B test to detect a real effect
📌 Definition of statistical power in A/B testing
Statistical power is the probability that an A/B test will detect a real effect if it actually exists.
In other words, it measures the test's ability to avoid a false negative, that is, to avoid incorrectly concluding that a variation is ineffective when it actually has a real impact.
In the context of conversion rate optimization (CRO), sufficient power ensures that decisions made based on a test are well-founded and actionable.
🎯 Recommended level
In most cases, a statistical power of 80% is considered the minimum acceptable level.
This means that the test has an 80% chance of detecting a real effect, and only a 20% risk of missing it (type II error).
⚖️ Factors that influence power
Power depends on three key parameters:
- Sample size
→ The more traffic or conversions you have, the smaller the effects you can detect.
- Minimum Detectable Effect (MDE)
→ This is the minimum effect you want to be able to detect (e.g., +5% conversion).
→ The smaller the expected effect, the more traffic you need to detect it with confidence.
- Confidence level (e.g., 95%)
→ This determines the threshold at which a result is considered statistically significant.
→ The more demanding the level, the lower the power (for a given sample size).
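To make the interplay between these three parameters concrete, here is a minimal sketch that approximates the power of a two-sided z-test on two conversion rates. It is an illustration under stated assumptions, not the method used by any particular testing tool: the function name `power_two_proportions`, the 5% baseline conversion rate, and the normal approximation with an unpooled standard error are all choices made for this example.

```python
from math import sqrt
from statistics import NormalDist


def power_two_proportions(p_control, relative_lift, n_per_arm, alpha=0.05):
    """Approximate power of a two-sided z-test comparing two conversion rates.

    Hypothetical helper for illustration. Assumes equal traffic per arm and
    a normal approximation; the negligible opposite tail is ignored.
    """
    p_variant = p_control * (1 + relative_lift)
    diff = abs(p_variant - p_control)
    # Standard error of the difference between the two observed rates.
    se = sqrt(p_control * (1 - p_control) / n_per_arm
              + p_variant * (1 - p_variant) / n_per_arm)
    # Significance threshold implied by the confidence level (1 - alpha).
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # Probability that the observed difference clears that threshold.
    return NormalDist().cdf(diff / se - z_crit)


# Assumed baseline: 5% conversion rate (not taken from the article).
baseline = 0.05
for n in (5_000, 20_000, 80_000):
    p = power_two_proportions(baseline, relative_lift=0.10, n_per_arm=n)
    print(f"n per arm = {n:>6}: power ≈ {p:.0%}")
```

Running the loop with larger values of `n_per_arm` shows power rising with sample size. Raising `relative_lift` increases it as well, while lowering `alpha` (a stricter confidence level) decreases it for the same traffic.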
❗ Why it's crucial in A/B testing
Without sufficient power:
- You risk rejecting a winning variation due to insufficient evidence.
- Your test can be described as "neutral" when the effect exists but is masked by a lack of data.
- You're wasting time, traffic, and opportunities for optimization.
💡 Conversely, good power strengthens the reliability of learning, even in cases where the results are not significant.
🧪 Concrete examples
| Situation | Power | Risk |
| --- | --- | --- |
| You are testing a variation with an MDE of 3% but have little traffic | 40% | You have a 60% chance of missing a real effect |
| You are targeting an uplift of +10% over 20,000 sessions | 85% | Good power → reliable conclusion in case of success or failure |
✅ Best practices in CRO
- Calculate the power before the test, using a tool (Optimizely, AB Tasty, Evan Miller) or a custom model (BigQuery, R).
- Define a realistic MDE based on business challenges (e.g., 2% on a homepage ≠ 10% on a product page).
- Extend the test duration or aggregate traffic to achieve sufficient power.
- Never interpret a "neutral" test without checking its actual power.
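As a rough illustration of the first practice, the sketch below estimates the traffic needed per variation before launching a test. It uses the standard normal-approximation formula for comparing two proportions. The function name `sample_size_per_arm`, the 5% baseline rate, and the 50/50 traffic split are assumptions made for this example, and dedicated calculators may use slightly different formulas and return different figures.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_per_arm(p_control, relative_mde, alpha=0.05, power=0.80):
    """Approximate visitors needed per arm for a two-sided test on two rates.

    Hypothetical helper for illustration. Assumes equal traffic per arm and
    a normal approximation to the binomial.
    """
    p_variant = p_control * (1 + relative_mde)
    p_bar = (p_control + p_variant) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # confidence level term
    z_beta = NormalDist().inv_cdf(power)           # target power term
    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * sqrt(p_control * (1 - p_control)
                        + p_variant * (1 - p_variant))
    ) ** 2
    return ceil(numerator / (p_variant - p_control) ** 2)


# Assumed baseline: 5% conversion rate (not taken from the article).
for mde in (0.02, 0.05, 0.10):
    n = sample_size_per_arm(0.05, relative_mde=mde)
    print(f"relative MDE = {mde:.0%}: about {n:,} visitors per arm")
```

Comparing the required sample size with the traffic a page actually receives indicates whether the chosen MDE is realistic, or whether the test needs to run longer or pool more traffic.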
