What is Product Experimentation?
Product experimentation is the systematic practice of testing product changes with real users in controlled conditions before full deployment. Instead of building a feature and hoping it works, you release it to a subset of users, measure the impact, and use the data to decide whether to ship, iterate, or kill.
Experimentation encompasses methods ranging from A/B testing (comparing two variants) to multivariate testing (testing several elements in combination) to feature-flag-controlled rollouts, where metrics are monitored during gradual exposure.
Why Product Experimentation Matters
Most product teams are wrong more often than they are right about what will work. At Booking.com, roughly 90% of experiments show no significant improvement. At Microsoft, about one-third of experiments show negative results. Without experimentation, those negative-impact changes would ship to all users.
Experimentation also accelerates learning. Each experiment produces data that informs the next experiment. Over time, the team develops increasingly accurate intuition about what works for their users.
How to Build an Experimentation Practice
Start with infrastructure. You need feature flags (to control who sees what), event tracking (to measure behavior), and a statistical analysis tool (to evaluate results). Without these, experiments are manual and unreliable.
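To make this concrete, here is a minimal sketch of the first two pieces, flag-based variant assignment and event tracking, in Python. The `assign_variant` and `track` functions are hypothetical stand-ins; in practice a feature-flag service (LaunchDarkly, Unleash, GrowthBook, or an in-house system) and your analytics SDK handle this.

```python
import hashlib

def assign_variant(user_id: str, experiment: str,
                   variants=("control", "treatment")) -> str:
    """Deterministically bucket a user into a variant.

    Hashing user_id + experiment name gives a stable, roughly uniform
    assignment with no stored state. Hypothetical sketch, not a
    replacement for a real feature-flag service.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return variants[int(digest, 16) % len(variants)]

def track(user_id: str, event: str, **properties) -> None:
    """Stand-in for your event-tracking call (analytics SDK, warehouse, etc.)."""
    print(user_id, event, properties)

# Gate the new flow and log the exposure so results can be analyzed later.
variant = assign_variant("user-42", "new-checkout-flow")
track("user-42", "experiment_exposure",
      experiment="new-checkout-flow", variant=variant)
```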
Write a hypothesis for every experiment. "We believe [change] will [outcome] for [audience]. We will measure [metric] and consider the experiment successful if [threshold]." This structure prevents fishing for positive signals.
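One lightweight way to enforce that structure is to make the hypothesis a record rather than free text, so an experiment cannot launch with a field missing. The `Hypothesis` dataclass below is a hypothetical sketch of that idea.

```python
from dataclasses import dataclass

@dataclass
class Hypothesis:
    change: str     # what we are changing
    outcome: str    # the effect we believe it will have
    audience: str   # who the change targets
    metric: str     # the single metric we will judge it on
    threshold: str  # pre-committed success criterion

    def statement(self) -> str:
        return (f"We believe {self.change} will {self.outcome} for "
                f"{self.audience}. We will measure {self.metric} and consider "
                f"the experiment successful if {self.threshold}.")

# Filled in before launch, not after peeking at results.
h = Hypothesis(
    change="a one-step signup form",
    outcome="increase signup completion",
    audience="new mobile visitors",
    metric="signup conversion rate",
    threshold="it improves by at least 2 percentage points (p < 0.05)",
)
print(h.statement())
```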
Run experiments to their planned sample size. Ending an experiment early because the results look promising leads to false positives. Use a sample size calculator, such as the sketch below, and commit to the duration before launching.
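If you want to see what a calculator does under the hood, the standard two-proportion approximation fits in a few lines. The sketch below uses only the Python standard library and assumes a two-sided test at alpha = 0.05 with 80% power; dedicated tools refine this, but the numbers land close.

```python
from statistics import NormalDist

def sample_size_per_variant(baseline: float, mde: float,
                            alpha: float = 0.05, power: float = 0.80) -> int:
    """Users needed per variant to detect an absolute lift `mde`
    over a baseline conversion rate (two-sided two-proportion test)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # 0.84 for 80% power
    p1, p2 = baseline, baseline + mde
    p_bar = (p1 + p2) / 2
    n = ((z_alpha * (2 * p_bar * (1 - p_bar)) ** 0.5
          + z_beta * (p1 * (1 - p1) + p2 * (1 - p2)) ** 0.5) ** 2) / mde ** 2
    return int(n) + 1

# Detecting a lift from 5% to 6% needs just over 8,000 users per variant.
print(sample_size_per_variant(baseline=0.05, mde=0.01))  # ~8158
```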
Document and share every result. Failed experiments are as valuable as successful ones. Create a shared experiment log that the entire team can reference.
Product Experimentation in Practice
Booking.com runs over 1,000 concurrent experiments. Their philosophy: every change is an experiment. This extreme approach has made them one of the highest-converting websites globally.
Netflix uses experimentation for everything from recommendation algorithms to thumbnail images. They discovered that personalized artwork (different thumbnails for different users) increased engagement by 20%. Without experimentation, they would have guessed at a single "best" image.
Common Pitfalls
- Only testing safe changes. If every experiment is a button color change, you are not learning enough. Test bold hypotheses.
- Peeking at results. Checking daily and stopping the moment results look good inflates the false positive rate. Wait until the pre-committed sample size is reached, then test once (see the significance-check sketch after this list).
- No experimentation culture. If experiments are seen as extra work rather than core practice, they get skipped when the team is busy.
- Ignoring qualitative context. An experiment tells you what happened, not why. Pair quantitative experiments with qualitative user research.
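To make the peeking pitfall concrete, here is what the single, fixed-horizon significance check looks like once the pre-committed sample is in. This two-proportion z-test is a standard-library sketch, not a substitute for your analysis tool.

```python
from statistics import NormalDist

def two_proportion_p_value(conv_a: int, n_a: int,
                           conv_b: int, n_b: int) -> float:
    """Two-sided p-value for the difference between two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = (p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b)) ** 0.5
    z = (p_b - p_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Run this ONCE at the pre-committed sample size. Running it every day
# and stopping at the first p < 0.05 is exactly the peeking pitfall.
print(two_proportion_p_value(conv_a=400, n_a=8000,
                             conv_b=480, n_b=8000))  # ~0.006
```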
Experimentation Methods: Beyond A/B Tests
A/B testing gets the most attention, but it is only one method. Match your method to your traffic, question, and risk level.
| Method | Traffic needed | Best for | Time to result |
|---|---|---|---|
| A/B test | 1,000+ users per variant | Measurable changes to existing flows | 1-4 weeks |
| Multivariate test | 10,000+ users per variant | Testing multiple variables simultaneously | 2-6 weeks |
| Fake door test | 500+ visitors | Validating demand before building | 1-2 weeks |
| Prototype test | 5-10 users | Usability and comprehension | 1-3 days |
| Beta rollout | 100+ users | Complex features needing real-world feedback | 2-4 weeks |
| Dogfooding | Internal team | Finding bugs and UX issues before external launch | 1-2 weeks |
Low-traffic products should not force A/B tests. With 200 daily active users, an A/B test takes months to reach significance. Use qualitative methods (prototype tests, user interviews) instead and save A/B testing for high-traffic flows like onboarding or checkout.
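The arithmetic behind that warning is straightforward. Using the hypothetical figures from the power sketch earlier:

```python
# Detecting a 5% -> 6% lift needs ~8,150 users per variant, ~16,300 total.
required_total = 2 * 8150
daily_active_users = 200
print(required_total / daily_active_users)
# 81.5 days -- nearly three months, and that assumes every DAU is enrolled
```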
How to Build an Experimentation Culture
The hardest part of experimentation is not the tools. It is the culture. Here is how to build a team that experiments by default.
Make experimentation the default, not the exception. Every feature launch should include a hypothesis and success metric. If a PM cannot articulate what they expect to change, the feature is not ready to ship. This does not mean every change needs a formal A/B test. It means every change needs a measurable expected outcome.
Celebrate learning, not just wins. Share failed experiment results with the same enthusiasm as successful ones. A failed experiment that prevents a bad feature from shipping saves weeks of engineering. At Booking.com, teams present failed experiments in weekly reviews with the same rigor as wins.
Set an experimentation velocity target. Track experiments per team per quarter. A product team running 2-3 experiments per sprint learns faster than one running 1 per quarter. The PM Benchmark tool can help you compare your shipping velocity against industry standards.
Invest in self-serve infrastructure. If running an experiment requires an engineer to set up feature flags and analytics, experiments will be bottlenecked by engineering capacity. Invest in tools that let PMs and designers configure experiments independently. The feature flag infrastructure should be as easy to use as a form.
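In practice, "as easy as a form" usually means an experiment is pure declarative configuration that a PM can write and review. The record below is a hypothetical sketch; the field names are made up, but most self-serve platforms capture something equivalent.

```python
# Hypothetical declarative experiment definition: everything needed to
# launch lives in one reviewable record, with no custom code per experiment.
EXPERIMENT = {
    "key": "new-checkout-flow",
    "hypothesis": "A one-step checkout will increase purchase conversion "
                  "for mobile users by at least 2 percentage points.",
    "audience": {"platform": "mobile", "rollout_percent": 50},
    "variants": {"control": "current_checkout",
                 "treatment": "one_step_checkout"},
    "primary_metric": "purchase_conversion",
    "min_sample_per_variant": 8150,  # pre-committed from the power calculation
    "max_duration_days": 28,         # kill switch if still underpowered
}
```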
Experimentation Metrics Cheat Sheet
Track these metrics to evaluate your experimentation program, not just individual experiments. A sketch of computing them from an experiment log follows the list.
- Experiment velocity: Number of experiments launched per team per quarter. Target: 8-12 for growth teams, 4-6 for platform teams.
- Win rate: Percentage of experiments that show statistically significant positive results. Healthy range: 15-30%. Below 10% means hypotheses are poorly grounded. Above 40% means you are only testing safe changes.
- Time to decision: Days from experiment launch to ship/kill decision. Target: under 3 weeks for most experiments. Experiments running longer than 4 weeks are usually underpowered.
- Impact per experiment: Average metric lift from winning experiments. Track this over time. If impact per experiment declines, you are running out of easy wins and need to test bolder ideas.
- Experiment coverage: Percentage of feature launches that include an experiment. Target: 80%+ for user-facing changes.
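These program-level metrics fall out of the shared experiment log directly. Here is a sketch of the computation over hypothetical log records:

```python
from datetime import date

# Hypothetical experiment-log records, one per concluded experiment.
log = [
    {"launched": date(2024, 1, 8), "decided": date(2024, 1, 26),
     "result": "win", "lift": 0.021},
    {"launched": date(2024, 1, 15), "decided": date(2024, 2, 2),
     "result": "flat", "lift": 0.0},
    {"launched": date(2024, 2, 1), "decided": date(2024, 2, 20),
     "result": "loss", "lift": -0.013},
]

wins = [e for e in log if e["result"] == "win"]
win_rate = len(wins) / len(log)
days_to_decision = sum((e["decided"] - e["launched"]).days for e in log) / len(log)
impact_per_win = sum(e["lift"] for e in wins) / len(wins)

print(f"velocity: {len(log)}, win rate: {win_rate:.0%}, "
      f"time to decision: {days_to_decision:.0f} days, "
      f"impact per win: {impact_per_win:.1%}")
```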
Use the RICE calculator to prioritize which experiment ideas to run first based on expected reach and impact.
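RICE itself is just (Reach × Impact × Confidence) / Effort. A minimal version follows; the scale conventions (impact from 0.25 to 3, confidence as a fraction, effort in person-months) follow Intercom's original framing, and the example numbers are invented.

```python
def rice_score(reach: float, impact: float,
               confidence: float, effort: float) -> float:
    """RICE = (Reach x Impact x Confidence) / Effort."""
    return reach * impact * confidence / effort

# Hypothetical backlog of experiment ideas.
ideas = {
    "one-step checkout": rice_score(reach=5000, impact=2, confidence=0.8, effort=2),
    "new empty state": rice_score(reach=800, impact=1, confidence=0.5, effort=0.5),
}
for name, score in sorted(ideas.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {score:,.0f}")  # one-step checkout: 4,000; new empty state: 800
```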
Related Concepts
Product experimentation uses A/B testing as its primary method, enabled by feature flags. It follows experiment design principles and is grounded in hypothesis-driven development. Results are analyzed through product analytics. Product discovery uses experimentation as one of its core methods for validating ideas before full development.