
A/B Test Plan Template for Product Analytics


Last updated 2026-02-19

What This Template Is For

Running A/B tests without a written plan is how teams ship inconclusive experiments and waste engineering cycles. The test runs for "a few weeks," someone checks the dashboard, the results are ambiguous, and the team debates what happened for another week before moving on. The problem is rarely the testing tool. The problem is that nobody agreed upfront on what success looks like, how long the test needs to run, or what they would do with the results.

This template forces clarity before a single line of test code is written. It captures the hypothesis, the primary and secondary metrics, the required sample size, the expected duration, a description of each variant, success criteria, and a structured results table. Every field exists to prevent a specific failure mode: unclear hypotheses that cannot be falsified, tests that run too short, metrics that conflict, and results that sit in a dashboard without driving a decision.

If you are building a culture of experimentation on your team, the Product Analytics Handbook covers how to design an experimentation program from the ground up. For calculating the sample size and statistical significance of a specific test, use the A/B Test Calculator. And if you are new to the concept, the A/B testing glossary entry covers the fundamentals.

When to Use This Template

  • Before any A/B test. Every test should have a written plan reviewed by the PM, engineer, and data analyst before implementation begins. This is not bureaucracy. It is the difference between a conclusive experiment and an expensive guess.
  • When the team disagrees on a design direction. Instead of debating which CTA color or pricing layout is better, test it. The plan template ensures you test the right thing for long enough with the right metric.
  • Before changing a high-traffic flow. Checkout, onboarding, signup, and pricing pages carry significant revenue risk. A structured test plan with guardrail metrics prevents you from shipping a change that improves one metric while silently degrading another.
  • When you need stakeholder buy-in for an experiment. A filled-out test plan makes it easy for a VP or CFO to say yes. It shows you have thought through the risks, the duration, and the decision criteria.
  • During post-experiment reviews. Use the completed plan (with results filled in) as the single artifact for the experiment retrospective. It answers "what did we test, what did we learn, and what did we decide?" in one document.

How to Use This Template

  1. Start with the hypothesis. Write it in the format: "If we [change], then [metric] will [improve/decrease] by [amount] because [reason]." A good hypothesis is specific and falsifiable. "If we change the CTA, conversions will go up" is too vague. "If we change the CTA from 'Start Free Trial' to 'See It In Action', trial starts will increase by 10% because the new copy reduces perceived commitment" is testable.
  2. Define success criteria before you see any data. Write down the minimum detectable effect (MDE), the confidence level (typically 95%), and the decision rule ("If the variant beats control by X%, we ship it. If it loses, we revert. If it is inconclusive, we [extend / kill / iterate]."). Deciding this after seeing partial results is confirmation bias dressed up as analysis.
  3. Calculate the required sample size and duration. Use the A/B Test Calculator or a power analysis tool (see the sketch after this list). Input your baseline conversion rate, your MDE, and your daily traffic to the test surface. Do not guess. Underpowered tests produce false negatives and waste everyone's time.
  4. Document both variants clearly. Describe the control and the variant in enough detail that an engineer can implement them without ambiguity and a designer can review them without confusion. Include screenshots or mockups if possible.
  5. Run the test for the full planned duration. Do not peek at results daily and call the test early when the dashboard looks good. Early stopping inflates false positive rates. If you calculated a 14-day test, run it for 14 days.
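
As a reference for step 3, here is a minimal sketch of the underlying power calculation, assuming a two-sided two-proportion z-test with the normal approximation. Dedicated tools such as the A/B Test Calculator may use different formulas or corrections, so treat the output as a planning estimate; the bracketed figures in the template tables below are placeholders, not outputs of this formula.

```python
from math import ceil
from statistics import NormalDist

def sample_size_per_variant(baseline, relative_mde, alpha=0.05, power=0.80):
    """Visitors needed per variant for a two-sided two-proportion z-test
    (normal approximation)."""
    p1 = baseline
    p2 = baseline * (1 + relative_mde)  # conversion rate if the MDE is hit
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)

# Illustrative inputs: 3.2% baseline, 10% relative MDE,
# 1,800 visitors/day split 50/50 -> about 49,774 per variant, ~56 days.
n = sample_size_per_variant(0.032, 0.10)
print(f"{n:,} per variant, ~{ceil(2 * n / 1800)} days at 1,800 visitors/day")
```

The same arithmetic produces the duration estimate: total sample needed divided by daily traffic to the test surface.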

The Template

Test Overview

Field | Details
Test Name | [Short, descriptive name, e.g. "Pricing Page CTA Copy Test"]
Test ID | [Internal tracking ID, e.g. EXP-042]
Owner | [PM name]
Engineer | [Engineer implementing the test]
Analyst | [Person responsible for results analysis]
Start Date | [YYYY-MM-DD]
Planned End Date | [YYYY-MM-DD]
Status | [Draft / In Review / Running / Complete / Killed]

Hypothesis

Format. If we [specific change], then [primary metric] will [direction] by [magnitude] because [rationale based on user behavior or prior data].

[Write your hypothesis here]

Confidence level. [Low / Medium / High] based on [prior data, qualitative signal, or intuition]


Metrics

Metric Type | Metric Name | Current Baseline | Target | Measurement Method
Primary | [The single metric this test is designed to move] | [Current value, e.g. 3.2%] | [Target value, e.g. 3.5%] | [Tool and event, e.g. "Mixpanel: button_clicked where page=pricing"]
Secondary | [Supporting metric that should also improve or stay flat] | [Value] | [Direction] | [Measurement method]
Secondary | [Another supporting metric] | [Value] | [Direction] | [Measurement method]
Guardrail | [Metric that must NOT degrade, e.g. bounce rate, support tickets, revenue per user] | [Value] | [Must stay within X%] | [Measurement method]
Guardrail | [Another guardrail] | [Value] | [Must stay within X%] | [Measurement method]

Sample Size and Duration

Field | Details
Baseline conversion rate | [e.g. 3.2%]
Minimum detectable effect (MDE) | [e.g. 10% relative lift = 3.52% absolute]
Statistical significance threshold | [e.g. 95% confidence / p < 0.05]
Statistical power | [e.g. 80%]
Required sample size per variant | [e.g. 12,400 visitors]
Total sample needed | [e.g. 24,800 visitors]
Daily traffic to test surface | [e.g. 1,800 visitors/day]
Estimated test duration | [e.g. 14 days]
Traffic allocation | [e.g. 50/50]

Variants

Control (A). [Describe the current experience. What does the user see and interact with today?]

Variant (B). [Describe the change. Be specific about what is different: copy, layout, color, flow, or logic. Include a screenshot or mockup link if available.]

Variant (C), if applicable. [Describe the additional variant. Note: each additional variant increases required sample size.]


Audience and Segmentation

Field | Details
Target audience | [e.g. All visitors to /pricing, or signed-in users on the Free plan]
Exclusions | [e.g. Internal users, existing Enterprise customers, users on mobile]
Segment for sub-analysis | [e.g. New vs. returning visitors, US vs. international]

Success Criteria and Decision Rules

Write these before the test starts. Do not change them while the test is running.

Outcome | Criteria | Action
Clear win | Variant beats control on primary metric by >= MDE with 95% confidence AND guardrail metrics are flat or improved | Ship the variant to 100%
Clear loss | Control beats variant on primary metric with 95% confidence OR a guardrail metric degrades by > [X%] | Revert to control. Document learnings.
Inconclusive | Neither variant reaches significance after the full planned duration | [Kill the test and move on / Extend by [N] days / Redesign with a bolder change]
Guardrail breach | Any guardrail metric degrades by more than [X%] at any point during the test | Stop the test immediately. Investigate root cause before deciding next steps.
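
Because these rules are mechanical by design, they can even be encoded directly, which helps keep post-test debates honest. A minimal sketch of the decision matrix above; the function name and signature are illustrative, not part of any testing library:

```python
def decide(relative_lift, confidence, mde, guardrails_ok, threshold=0.95):
    """Apply the pre-registered decision rules from the table above."""
    if not guardrails_ok:
        return "Stop immediately: guardrail breach. Investigate root cause."
    if confidence >= threshold:
        if relative_lift >= mde:
            return "Clear win: ship the variant to 100%."
        if relative_lift < 0:
            return "Clear loss: revert to control. Document learnings."
    # Significant-but-below-MDE results fall through to the inconclusive rule.
    return "Inconclusive: kill, extend, or iterate per the pre-agreed rule."

# Example with the filled SaaS test below: +14.6% lift at 97.2% confidence, 12% MDE.
print(decide(0.146, 0.972, 0.12, guardrails_ok=True))
```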

Risks and Mitigations

Risk | Likelihood | Impact | Mitigation
[e.g. Variant causes confusion, increasing support tickets] | [Low/Med/High] | [Low/Med/High] | [e.g. Monitor support queue daily. Kill switch if ticket volume spikes >20%]
[e.g. Test takes longer than expected due to lower traffic] | [Low/Med/High] | [Low/Med/High] | [Mitigation]
[e.g. Novelty effect inflates early results] | [Low/Med/High] | [Low/Med/High] | [Mitigation]

Results Table

Complete this section after the test ends. Do not fill it in while the test is running.

Metric | Control (A) | Variant (B) | Relative Difference | Confidence | Significant?
[Primary metric] | [Value] | [Value] | [+/- X%] | [X%] | [Yes / No]
[Secondary metric 1] | [Value] | [Value] | [+/- X%] | [X%] | [Yes / No]
[Secondary metric 2] | [Value] | [Value] | [+/- X%] | [X%] | [Yes / No]
[Guardrail metric 1] | [Value] | [Value] | [+/- X%] | [X%] | [Flat / Degraded]
[Guardrail metric 2] | [Value] | [Value] | [+/- X%] | [X%] | [Flat / Degraded]

Decision. [Ship variant / Revert to control / Iterate and retest]

Key learning. [One paragraph summarizing what you learned, even if the test was inconclusive]
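
For conversion-rate metrics, the Relative Difference and Confidence columns can be computed with a two-proportion z-test. A minimal sketch using the normal approximation; if your analytics tool reports different numbers, prefer its output, since it may apply corrections this sketch omits. The counts below are hypothetical, chosen to roughly reproduce the rates in the filled example that follows:

```python
from statistics import NormalDist

def ab_result(conversions_a, visitors_a, conversions_b, visitors_b):
    """Relative lift and confidence for the results table
    (two-sided two-proportion z-test, normal approximation)."""
    p_a = conversions_a / visitors_a
    p_b = conversions_b / visitors_b
    p_pool = (conversions_a + conversions_b) / (visitors_a + visitors_b)
    se = (p_pool * (1 - p_pool) * (1 / visitors_a + 1 / visitors_b)) ** 0.5
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_b / p_a - 1, 1 - p_value  # (relative lift, confidence)

# Hypothetical counts: 460/11,200 control vs 527/11,200 variant.
# Prints roughly: lift +14.6%, confidence 97.1%.
lift, conf = ab_result(460, 11_200, 527, 11_200)
print(f"lift {lift:+.1%}, confidence {conf:.1%}")
```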


Filled Example: SaaS Pricing Page CTA Test

Test Overview (Example)

Field | Details
Test Name | Pricing Page CTA Copy Test
Test ID | EXP-042
Owner | Jordan (PM, Growth)
Engineer | Sam (Frontend)
Analyst | Priya (Data)
Start Date | 2026-02-17
Planned End Date | 2026-03-03
Status | Complete

Hypothesis (Example)

If we change the primary CTA on the pricing page from "Start Free Trial" to "See PlanForge in Action" with a 14-day trial badge, then trial signups will increase by 12% because user research showed that prospects feel "Start Free Trial" implies they need to commit before they understand the product. The new copy emphasizes exploration over commitment.

Confidence level. Medium, based on 6 customer interviews where 4 of 6 prospects mentioned hesitation around the word "trial."

Metrics (Example)

Metric Type | Metric Name | Current Baseline | Target | Measurement Method
Primary | Trial signup rate (pricing page visitors who start a trial) | 4.1% | 4.6% (+12% relative) | Mixpanel: trial_started where source=pricing
Secondary | Pricing page engagement time | 48 seconds | Increase | Mixpanel: avg session duration on /pricing
Secondary | Pricing FAQ click rate | 11% | Stay flat or increase | Mixpanel: faq_clicked where page=pricing
Guardrail | Trial-to-paid conversion rate (7-day) | 18% | Must stay above 16% | Stripe events: subscription_created / trial_started
Guardrail | Support tickets from pricing page | 12/week | Must stay below 18/week | Intercom: tickets tagged "pricing"

Sample Size and Duration (Example)

Field | Details
Baseline conversion rate | 4.1%
Minimum detectable effect | 12% relative lift (4.59% absolute)
Statistical significance threshold | 95% confidence
Statistical power | 80%
Required sample size per variant | 11,200 visitors
Total sample needed | 22,400 visitors
Daily traffic to pricing page | 1,650 visitors/day
Estimated test duration | 14 days
Traffic allocation | 50/50

Variants (Example)

Control (A). Current pricing page. Green CTA button reads "Start Free Trial" in white text. No badge. Button links to the signup form with email and password fields.

Variant (B). Same pricing page layout. CTA button reads "See PlanForge in Action" in white text. A small badge below the button reads "14-day free trial. No credit card." Button links to the same signup form.

Results (Example)

Metric | Control (A) | Variant (B) | Relative Difference | Confidence | Significant?
Trial signup rate | 4.1% | 4.7% | +14.6% | 97.2% | Yes
Pricing page engagement time | 47s | 52s | +10.6% | 89% | No
Pricing FAQ click rate | 11.3% | 10.8% | -4.4% | 62% | No
Trial-to-paid conversion (7-day) | 18.1% | 17.4% | -3.9% | 71% | Flat (within guardrail)
Support tickets from pricing | 11/week | 13/week | +18% | N/A | Flat (within guardrail)

Decision. Ship Variant B. Primary metric exceeded MDE with 97.2% confidence. Guardrail metrics stayed within acceptable range. Trial-to-paid conversion dipped slightly but remained above the 16% floor.

Key learning. Reducing perceived commitment in CTA copy increased trial starts by ~15%. The "14-day free trial. No credit card." badge likely contributed. Engagement time and FAQ clicks were not significantly different, suggesting the change did not alter how users evaluated pricing. It primarily reduced friction at the moment of the click. Next test: apply the same "action-first" copy pattern to the homepage hero CTA.


Key Takeaways

  • Write the hypothesis, success criteria, and decision rules before the test starts. Changing them mid-test is not iteration. It is p-hacking.
  • Every test needs exactly one primary metric. Secondary metrics add context. Guardrail metrics prevent collateral damage. Conflating these roles leads to ambiguous results.
  • Calculate the required sample size and commit to the full test duration. Early stopping based on promising results inflates false positive rates. The A/B Test Calculator handles the math.
  • Document variants in enough detail that someone who was not in the planning meeting could understand exactly what changed and why.
  • Fill in the results table and key learning even when the test is inconclusive. Negative and null results are data. They prevent the team from rerunning the same failed experiment six months later.
  • For a structured approach to building an experimentation practice, see the product experimentation guide.

About This Template

Created by: Tim Adair

Last Updated: 2026-02-19

Version: 1.0.0

License: Free for personal and commercial use

Frequently Asked Questions

How long should I run an A/B test?
Run the test for the duration your power analysis calculated. For most mid-traffic SaaS pages (1,000-5,000 daily visitors), this is 2-4 weeks. Never run a test for less than one full week, to account for day-of-week effects. If your sample size calculation says 21 days, run it for 21 days, even if the results look significant on day 7.
What if my primary metric improves but a guardrail metric degrades?
This is exactly why guardrail metrics exist. If the degradation is within your pre-defined acceptable range, ship the variant and monitor the guardrail closely post-launch. If it exceeds the threshold, do not ship. Investigate whether the degradation is a direct consequence of the change (causal) or a coincidence (correlation). When in doubt, revert and redesign.
Can I test more than two variants at once?
Yes, but be aware that each additional variant increases the total sample size needed. A three-variant test (A/B/C) requires roughly 50% more traffic than a two-variant test to reach the same statistical power. For most teams, two variants (control + one change) is the right tradeoff between learning speed and statistical rigor. Reserve multi-variant tests for high-traffic surfaces where the extra duration cost is minimal.
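
The 50% figure is the arithmetic of splitting traffic three ways instead of two: the same per-variant n must be collected for three groups, so total traffic grows from 2n to 3n. A minimal sketch, using the same normal-approximation formula as the earlier sample-size sketch; the Bonferroni correction shown is an assumption this sketch adds (two comparisons against control), and it pushes the total beyond the uncorrected 50%:

```python
from math import ceil
from statistics import NormalDist

def n_per_variant(p1, rel_mde, alpha=0.05, power=0.80):
    # Same two-proportion z-test approximation as the earlier sketch.
    p2 = p1 * (1 + rel_mde)
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    return ceil(z ** 2 * (p1 * (1 - p1) + p2 * (1 - p2)) / (p2 - p1) ** 2)

n_ab = n_per_variant(0.041, 0.12)                   # A/B: two groups
n_abc = n_per_variant(0.041, 0.12, alpha=0.05 / 2)  # A/B/C with Bonferroni
# Prints roughly 54k vs 98k total visitors.
print(f"A/B total: {2 * n_ab:,}   A/B/C total: {3 * n_abc:,}")
```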
What is the difference between statistical significance and practical significance?
Statistical significance tells you the result is unlikely to be due to chance. Practical significance tells you the result is large enough to matter for your business. A test can be statistically significant at p < 0.01 but show only a 0.1% lift in conversions, which might not justify the engineering cost to maintain the variant. Always evaluate both. Define your MDE based on what would be practically meaningful, not just what would be statistically detectable.
How do I decide between A/B testing and just shipping the change?
Test when the stakes are high and reversibility is low. Pricing pages, checkout flows, onboarding sequences, and core activation loops deserve formal A/B tests because a wrong decision has measurable revenue impact. For low-risk changes on low-traffic pages (a help article layout, a settings page tweak), just ship it and monitor metrics. The cost of running a formal test should be proportional to the risk of getting it wrong.
