What This Template Is For
Capacity planning is the process of matching infrastructure resources to expected demand before you run out of headroom. Without a written plan, teams discover capacity limits during traffic spikes, scale reactively at premium cost, and over-provision "just in case" with no data to justify the spend.
This template structures the conversation between product, engineering, and finance. It forces you to document current utilization, forecast growth, define scaling triggers, and estimate cost. The result is a plan that prevents outages and controls spend at the same time.
Use this template before a major product launch, at the start of each quarter during infrastructure planning, or after a capacity-related incident. If you need to evaluate the financial impact of scaling decisions, the TAM calculator can help size the market opportunity that justifies the infrastructure investment. For a broader view of how infrastructure planning fits into PM work, see the Technical PM Handbook.
How to Use This Template
- Gather baseline metrics for all critical resources: compute, storage, database connections, network bandwidth, and queue depth.
- Document current utilization as a percentage of maximum capacity. If you do not know the maximum, run a load test first.
- Project growth using historical data and product roadmap inputs. Work with the PM to identify planned features or launches that will change traffic patterns.
- Set scaling triggers with specific thresholds that initiate either automatic or manual scaling actions.
- Calculate cost projections for each scaling scenario. Include both the infrastructure cost and the engineering time required to implement the change.
- Review with engineering, product, and finance. The PM validates the growth assumptions. Finance approves the budget. Engineering owns the execution.
The Template
Plan Overview
| Field | Details |
|---|---|
| System/Service | [Name of system or service being planned] |
| Owner | [Team or individual] |
| Planning Horizon | [e.g., Q2 2026 / Next 6 months] |
| Date | [Date] |
| Status | Draft / Reviewed / Approved |
| Last Capacity Review | [Date of previous review] |
Current Resource Utilization
| Resource | Current Usage | Maximum Capacity | Utilization % | Headroom |
|---|---|---|---|---|
| CPU (compute) | [e.g., 340 cores] | [e.g., 500 cores] | [e.g., 68%] | [e.g., 160 cores] |
| Memory | [e.g., 256 GB] | [e.g., 384 GB] | [e.g., 67%] | [e.g., 128 GB] |
| Storage | [e.g., 2.1 TB] | [e.g., 5 TB] | [e.g., 42%] | [e.g., 2.9 TB] |
| Database connections | [e.g., 180] | [e.g., 300] | [e.g., 60%] | [e.g., 120] |
| Network bandwidth | [e.g., 4 Gbps] | [e.g., 10 Gbps] | [e.g., 40%] | [e.g., 6 Gbps] |
| Queue depth (peak) | [e.g., 5K messages] | [e.g., 50K messages] | [e.g., 10%] | [e.g., 45K] |
Bottleneck analysis. [Which resource will hit capacity first at current growth rates? What is the estimated time to exhaustion?]
Growth Projections
| Metric | Current | +3 Months | +6 Months | +12 Months | Assumptions |
|---|---|---|---|---|---|
| Daily active users | [Value] | [Value] | [Value] | [Value] | [Source] |
| Requests per second (peak) | [Value] | [Value] | [Value] | [Value] | [Source] |
| Data storage growth | [Value] | [Value] | [Value] | [Value] | [Rate/month] |
| Background job volume | [Value] | [Value] | [Value] | [Value] | [Source] |
Growth drivers.
- ☐ Organic user growth ([rate]% month-over-month)
- ☐ Planned product launch: [feature name, expected date, estimated traffic impact]
- ☐ Marketing campaign: [campaign name, expected date, estimated traffic spike]
- ☐ Seasonal pattern: [description of seasonal traffic changes]
- ☐ Customer expansion: [large customer onboarding, estimated resource impact]
Scaling Triggers and Actions
| Resource | Warning Threshold | Critical Threshold | Scaling Action | Lead Time |
|---|---|---|---|---|
| CPU | [e.g., 70%] | [e.g., 85%] | [e.g., Add 2 nodes to ASG] | [e.g., 5 min auto] |
| Memory | [e.g., 75%] | [e.g., 90%] | [e.g., Increase instance size] | [e.g., 15 min] |
| Storage | [e.g., 70%] | [e.g., 85%] | [e.g., Expand volume] | [e.g., 30 min] |
| Database | [e.g., 65%] | [e.g., 80%] | [e.g., Add read replica] | [e.g., 2 hours] |
| Network | [e.g., 60%] | [e.g., 80%] | [e.g., Enable CDN / upgrade tier] | [e.g., 1 day] |
Cost Projections
| Scenario | Monthly Cost (Current) | Monthly Cost (+6 Mo) | Monthly Cost (+12 Mo) | Annual Delta |
|---|---|---|---|---|
| Baseline (no action) | $[Amount] | $[Amount] | $[Amount] | $[Amount] |
| Scale vertically | $[Amount] | $[Amount] | $[Amount] | $[Amount] |
| Scale horizontally | $[Amount] | $[Amount] | $[Amount] | $[Amount] |
| Re-architect | $[Amount] | $[Amount] | $[Amount] | $[Amount] |
Cost optimization opportunities.
- ☐ Right-size underutilized instances ([estimated savings])
- ☐ Purchase reserved instances for predictable workloads ([estimated savings])
- ☐ Implement caching layer to reduce database load ([estimated savings])
- ☐ Archive cold data to cheaper storage tier ([estimated savings])
- ☐ Review and eliminate unused resources ([estimated savings])
Risk Assessment
| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| [Traffic spike exceeds projections] | [High/Med/Low] | [Service degradation] | [Auto-scaling policy, CDN, rate limiting] |
| [Database hits connection limit] | [High/Med/Low] | [Request failures] | [Connection pooling, read replicas] |
| [Storage exhaustion] | [High/Med/Low] | [Write failures] | [Automated volume expansion, data archival] |
| [Vendor outage] | [High/Med/Low] | [Service unavailable] | [Multi-region failover, circuit breakers] |
Action Items
- ☐ [Action 1: description, owner, deadline]
- ☐ [Action 2: description, owner, deadline]
- ☐ [Action 3: description, owner, deadline]
- ☐ [Action 4: description, owner, deadline]
- ☐ [Action 5: description, owner, deadline]
Filled Example: SaaS Analytics Platform
Plan Overview
| Field | Details |
|---|---|
| System/Service | Analytics Ingestion Pipeline |
| Owner | Platform Team (Lead: Jordan Lee) |
| Planning Horizon | Q2-Q3 2026 |
| Date | March 2026 |
| Status | In Review |
| Last Capacity Review | December 2025 |
Current Resource Utilization
| Resource | Current Usage | Maximum Capacity | Utilization % | Headroom |
|---|---|---|---|---|
| CPU (Kafka brokers) | 48 cores | 64 cores | 75% | 16 cores |
| Memory (ClickHouse) | 384 GB | 512 GB | 75% | 128 GB |
| Storage (ClickHouse) | 8.2 TB | 12 TB | 68% | 3.8 TB |
| Database connections (PG) | 210 | 300 | 70% | 90 |
| Network bandwidth | 6.2 Gbps | 10 Gbps | 62% | 3.8 Gbps |
Bottleneck analysis. CPU on Kafka brokers will hit 85% within 8 weeks at current growth (3.2% week-over-week increase in event volume). Database connections will reach the warning threshold in approximately 12 weeks due to a new real-time dashboard feature shipping in April.
Growth Projections
| Metric | Current | +3 Months | +6 Months | +12 Months | Assumptions |
|---|---|---|---|---|---|
| Events/second (peak) | 45K | 62K | 85K | 140K | 3.2% WoW, plus Q3 enterprise launch |
| Storage growth | 8.2 TB | 10.5 TB | 13.8 TB | 21 TB | 800 GB/month current, rising to 1.2 TB |
| Daily active dashboards | 2,400 | 3,200 | 4,500 | 7,000 | New real-time feature drives +40% usage |
Scaling Actions
| Priority | Action | Owner | Cost Impact | Deadline |
|---|---|---|---|---|
| P0 | Add 2 Kafka brokers (32 cores each) | Jordan | +$2,400/mo | April 1 |
| P1 | Implement PgBouncer connection pooling | Backend team | +$200/mo | April 15 |
| P1 | Expand ClickHouse storage to 20 TB | Jordan | +$1,800/mo | May 1 |
| P2 | Evaluate horizontal ClickHouse sharding | Data team | TBD | Q3 planning |
Cost Projections
| Scenario | Monthly (Current) | Monthly (+6 Mo) | Monthly (+12 Mo) | Annual Delta |
|---|---|---|---|---|
| Baseline (no action) | $18,400 | $18,400 | $18,400 | $0 |
| Recommended plan | $18,400 | $22,800 | $28,600 | +$67,200 |
| Over-provision (safe) | $18,400 | $31,200 | $31,200 | +$112,800 |
Key Takeaways
- Measure current utilization before projecting future needs
- Plan for three scenarios: pessimistic, expected, and optimistic
- Set scaling triggers with specific thresholds, not vague guidelines
- Include cost projections alongside technical plans to get finance buy-in
- Review capacity quarterly and after any capacity-related incident
About This Template
Created by: Tim Adair
Last Updated: 3/4/2026
Version: 1.0.0
License: Free for personal and commercial use
