What This Template Is For
Building the wrong feature is the most expensive mistake a product team can make. Not because of the engineering cost alone, but because of the opportunity cost: every sprint spent on a feature nobody uses is a sprint not spent on one that would have moved a metric.
This template provides a structured process for validating feature ideas before they enter development. It covers four stages: defining the feature hypothesis (what you believe and why), selecting the right validation method (from lightweight to high-fidelity), designing the experiment (what to measure and what constitutes a pass), and documenting the outcome (what you learned and what to do next).
The methods here range from a fake-door test you can set up in a day or two to a concierge MVP run over several weeks. Choose based on the risk level of the feature: high-stakes features (large engineering investment, irreversible changes, bet-the-company decisions) deserve rigorous validation. Low-stakes features (small UI tweaks, easily reversible changes) can use lighter methods.
This template pairs with the assumption testing template for broader initiative-level validation, and with the Product Discovery Handbook for the full discovery methodology. If the feature involves a new workflow, consider running a usability test on the prototype.
When to Use This Template
- Before adding a feature to the sprint backlog. If the feature will take more than one sprint to build, validate it first.
- When the team disagrees about a feature's value. Replace opinion-based debates with evidence. "Let's test it" is a better response than "I think users want this."
- After receiving a customer request. One customer's request is an anecdote. Five customers describing the same problem is a pattern. This template helps you distinguish between the two.
- During roadmap planning. Use validation results to stack-rank competing feature ideas by evidence strength, not by stakeholder volume.
- When pivoting a feature's design. If user feedback on v1 of a feature was poor, validate the revised approach before rebuilding.
How to Use This Template
- Write the feature hypothesis. State what you believe, who it is for, and what outcome you expect.
- Choose a validation method. Match the method to the feature's risk level and the time you have.
- Design the experiment. Define pass/fail criteria, sample size, and timeline.
- Run the experiment and log results. Capture data, quotes, and observations.
- Decide: build, iterate, or kill. Use the evidence to make a clear recommendation.
The Template
Part 1: Feature Definition
| Field | Details |
|---|---|
| Feature Name | [Short, descriptive name] |
| One-Line Description | [What does this feature do?] |
| Target User | [Which user segment or persona?] |
| Problem It Solves | [What pain point or unmet need does this address?] |
| Evidence for the Problem | [Customer interviews, support tickets, analytics data, competitor analysis] |
| Estimated Build Effort | [T-shirt size: S/M/L/XL or sprint count] |
| Reversibility | [Easy to revert / Hard to revert / Irreversible] |
| Risk Level | [Low / Medium / High based on effort + reversibility] |
Part 2: Feature Hypothesis
Write a testable hypothesis using this format:
We believe that [feature description]
for [target users]
will result in [expected outcome or behavior change]
because [reasoning based on evidence].
We will know this is true when [measurable signal].
Example:
We believe that adding a bulk-action toolbar to the project list for power users will reduce the time to update 10+ project statuses by 70% because our analytics show that users with 15+ projects spend 4 minutes clicking through each project individually. We will know this is true when 6 of 8 test participants complete the bulk-update task in under 90 seconds.
Part 3: Validation Method Selection
Choose the method that matches your risk level and available time.
| Method | Best For | Time Required | Evidence Quality | Risk Level |
|---|---|---|---|---|
| Fake Door Test | Testing demand for a feature before building it | 1-2 days setup, 1-2 weeks data collection | Moderate | Low-Medium |
| Painted Door (UI Mockup) | Testing whether users understand and want a capability | 2-3 days | Moderate | Low-Medium |
| Wizard of Oz | Testing the value of a feature by delivering it manually | 1-2 weeks | High | Medium-High |
| Prototype Usability Test | Testing whether users can use the feature successfully | 3-5 days | High | Medium-High |
| Concierge MVP | Testing end-to-end value by performing the service manually for real users | 2-4 weeks | Very High | High |
| Beta/Feature Flag | Testing with real users on real data | 2-4 weeks | Very High | High |
Selected Method: [Your choice]
Rationale: [Why this method fits the risk level and timeline]
Part 4: Experiment Design
| Field | Details |
|---|---|
| Method | [From Part 3] |
| Participants | [Who and how many? Minimum 5 for qualitative, 100+ for quantitative.] |
| Recruitment | [How will you find participants? Existing users, panel, customer list?] |
| Stimulus | [What will participants see or interact with? Mockup, prototype, live feature?] |
| Task(s) | [What will participants try to do?] |
| Metrics | [What will you measure? Task completion rate, time-on-task, click-through rate, NPS?] |
| Pass Criteria | [Specific threshold, e.g., "70% task completion rate" or "200+ clicks on fake door in 1 week"] |
| Fail Criteria | [What result would kill the feature?] |
| Timeline | [Start date, end date, decision date] |
| Owner | [Who runs this experiment?] |
| Cost | [Participant incentives, tool costs, time investment] |
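Because the pass/fail criteria are pre-registered in the table above, the verdict can be made mechanical rather than argued after the fact. Here is a minimal sketch of that idea; the metric names, thresholds, and result values are hypothetical examples, not part of the template:

```python
def evaluate(results, criteria):
    """Compare each observed metric with its pre-registered threshold.

    criteria maps metric name -> (direction, threshold), where direction
    is ">=" (at least this value passes) or "<" (strictly below passes).
    """
    verdicts = {}
    for metric, (direction, threshold) in criteria.items():
        actual = results[metric]
        passed = actual >= threshold if direction == ">=" else actual < threshold
        verdicts[metric] = "Pass" if passed else "Fail"
    return verdicts

# Hypothetical pre-registered criteria, mirroring the kinds of targets
# used later in the filled example.
criteria = {
    "task_completion_rate": (">=", 0.75),  # 75%+ of participants succeed
    "time_on_task_sec":     ("<", 90),     # under 90 seconds
    "error_rate":           ("<", 0.25),   # under 25% mis-clicks
}
results = {"task_completion_rate": 0.875, "time_on_task_sec": 72, "error_rate": 0.25}

verdicts = evaluate(results, criteria)
```

Note that deciding up front whether the boundary itself counts (is exactly 25% a pass or a fail?) is part of pre-registration. Under the strict `<` above, an error rate sitting exactly on the threshold fails, which is precisely the kind of "borderline" result that invites goalpost-moving when the rule was not written down first.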
Part 5: Experiment Execution Log
Quantitative Data (if applicable)
| Metric | Target | Actual | Pass/Fail |
|---|---|---|---|
| [e.g., Click-through rate on fake door] | [5%+] | [%] | |
| [e.g., Task completion rate] | [70%+] | [%] | |
| [e.g., Time on task] | [< 90 sec] | [sec] | |
| [e.g., Error rate] | [< 20%] | [%] | |
Qualitative Data (if applicable)
| Participant | Task Success | Key Quote | Key Observation |
|---|---|---|---|
| P1 | Yes / No / Partial | | |
| P2 | | | |
| P3 | | | |
| P4 | | | |
| P5 | | | |
Patterns Observed
| Pattern | # Participants | Significance |
|---|---|---|
| [e.g., "Users looked for bulk select in the header, not the sidebar"] | /5 | [High / Medium / Low] |
Part 6: Verdict and Recommendation
| Field | Details |
|---|---|
| Overall Result | Pass / Fail / Inconclusive |
| Evidence Summary | [2-3 sentences summarizing what you learned] |
| Recommendation | Build as designed / Build with modifications / Do not build / Test further |
| Modifications (if applicable) | [What changes are needed based on what you learned?] |
| Remaining Risks | [What uncertainties remain even after this test?] |
| Next Step | [Specific action with owner and date] |
Filled Example: Bulk-Action Toolbar Validation
Context. A project management SaaS has heard from several customers that managing large numbers of projects is tedious. The PM proposes adding a bulk-action toolbar. Before building it (estimated 2 sprints), they validate with a prototype usability test.
Feature Hypothesis (Example)
We believe that adding a bulk-action toolbar to the project list for power users (15+ active projects) will reduce the time to update multiple project statuses by 70% because our analytics show these users spend 4+ minutes clicking through projects individually. We will know this is true when 6 of 8 test participants complete a 10-project bulk-update task in under 90 seconds.
Experiment Design (Example)
| Field | Details |
|---|---|
| Method | Prototype usability test (Figma interactive prototype) |
| Participants | 8 existing users with 15+ active projects |
| Task | "You need to mark these 10 projects as 'On Hold' and reassign them to your team lead. Show me how you would do that." |
| Metrics | Task completion rate (target: 75%+), time on task (target: < 90 sec), error rate (target: < 25%) |
| Pass Criteria | 6 of 8 participants complete the task in < 90 seconds |
| Fail Criteria | Fewer than 4 of 8 complete the task, or average time exceeds 3 minutes |
Results (Example)
| Metric | Target | Actual | Pass/Fail |
|---|---|---|---|
| Task completion rate | 75%+ | 87.5% (7/8) | Pass |
| Average time on task | < 90 sec | 72 seconds | Pass |
| Error rate | < 25% | 25% (2/8 initially selected wrong action) | Borderline |
Key finding: 7 of 8 participants completed the task successfully. Two participants initially looked for the bulk-select checkbox in the row hover state (like Gmail) rather than the header row. After finding the header checkbox, both completed the task quickly. The one failure was a participant who did not notice the toolbar appeared after selecting items.
Verdict (Example)
| Field | Details |
|---|---|
| Overall Result | Pass |
| Recommendation | Build with two design changes: (1) Add row-level checkboxes on hover for discoverability, (2) Add a subtle animation when the toolbar first appears to draw attention. |
| Remaining Risks | Untested on mobile. The prototype only covered desktop. Add mobile validation before launch if mobile usage is >10% for power users. |
Key Takeaways
- Write the hypothesis before choosing the method. The hypothesis determines what you need to measure, which determines which method produces the right evidence. Working backwards from a method you like leads to weak experiments.
- Set pass/fail criteria before running the test. Deciding what "success" looks like after seeing the data introduces confirmation bias. If your threshold was 70% and you got 68%, that is a fail. Do not move the goalposts.
- Match the method to the risk. A one-sprint feature can be validated with a two- or three-day painted door test. A 6-sprint feature with irreversible architecture changes deserves a high-evidence method such as a Wizard of Oz study or a multi-week concierge MVP. Use the RICE framework to quantify the stakes.
- Five participants is enough for qualitative usability validation. Nielsen Norman Group research consistently shows that 5 users surface ~85% of usability problems. For quantitative tests (conversion rates, click-through rates), you need 100+ data points for statistical significance.
- "Inconclusive" means your experiment was not sharp enough. It does not mean the feature is safe to build. Redesign the test with clearer tasks, better pass/fail criteria, or a different method.
- Document every validation, including the ones that kill features. A validated "do not build" decision saves the team from revisiting the same bad idea six months later. Store findings in your research repository.
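The two sample-size rules of thumb above follow from standard formulas, and it can help to see the arithmetic. The sketch below assumes the ~31% per-user problem-discovery rate commonly cited in the Nielsen/Landauer model, plus a 95% confidence level and a ±10-point margin of error for the quantitative case; none of these parameter choices comes from the template itself:

```python
import math

# Nielsen/Landauer model: share of usability problems found by n users,
# assuming each user independently surfaces a given problem with probability L.
L = 0.31          # per-user discovery rate commonly assumed in NN/g's analysis
n_users = 5
found = 1 - (1 - L) ** n_users   # works out to roughly 84%

# Normal-approximation sample size for estimating a proportion
# (e.g., a click-through rate): n = z^2 * p * (1 - p) / e^2,
# using the worst case p = 0.5.
z = 1.96          # 95% confidence level
e = 0.10          # +/-10 percentage-point margin of error
n_quant = math.ceil(z**2 * 0.5 * 0.5 / e**2)   # roughly 100 data points
```

So 5 qualitative participants recover most usability problems, while even a coarse ±10-point read on a conversion metric needs on the order of 100 observations, which is why the two thresholds in the template differ so sharply.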
About This Template
Created by: Tim Adair
Last Updated: 3/5/2026
Version: 1.0.0
License: Free for personal and commercial use
