Template | FREE | ⏱️ 60-90 minutes

Infrastructure Capacity Planning Template

A structured capacity planning template for scaling infrastructure. Covers current resource utilization, growth projections, scaling triggers, cost...

Last updated 2026-03-04

What This Template Is For

Capacity planning is the process of matching infrastructure resources to expected demand before you run out of headroom. Without a written plan, teams discover capacity limits during traffic spikes, scale reactively at premium cost, and over-provision "just in case" with no data to justify the spend.

This template structures the conversation between product, engineering, and finance. It forces you to document current utilization, forecast growth, define scaling triggers, and estimate cost. The result is a plan that prevents outages and controls spend at the same time.

Use this template before a major product launch, at the start of each quarter during infrastructure planning, or after a capacity-related incident. If you need to evaluate the financial impact of scaling decisions, the TAM calculator can help size the market opportunity that justifies the infrastructure investment. For a broader view of how infrastructure planning fits into PM work, see the Technical PM Handbook.


How to Use This Template

  1. Gather baseline metrics for all critical resources: compute, storage, database connections, network bandwidth, and queue depth.
  2. Document current utilization as a percentage of maximum capacity. If you do not know the maximum, run a load test first.
  3. Project growth using historical data and product roadmap inputs. Work with the PM to identify planned features or launches that will change traffic patterns.
  4. Set scaling triggers with specific thresholds that initiate either automatic or manual scaling actions.
  5. Calculate cost projections for each scaling scenario. Include both the infrastructure cost and the engineering time required to implement the change.
  6. Review with engineering, product, and finance. The PM validates the growth assumptions. Finance approves the budget. Engineering owns the execution.

The Template

Plan Overview

| Field | Details |
|---|---|
| System/Service | [Name of system or service being planned] |
| Owner | [Team or individual] |
| Planning Horizon | [e.g., Q2 2026 / Next 6 months] |
| Date | [Date] |
| Status | Draft / Reviewed / Approved |
| Last Capacity Review | [Date of previous review] |

Current Resource Utilization

| Resource | Current Usage | Maximum Capacity | Utilization % | Headroom |
|---|---|---|---|---|
| CPU (compute) | [e.g., 340 cores] | [e.g., 500 cores] | [e.g., 68%] | [e.g., 160 cores] |
| Memory | [e.g., 256 GB] | [e.g., 384 GB] | [e.g., 67%] | [e.g., 128 GB] |
| Storage | [e.g., 2.1 TB] | [e.g., 5 TB] | [e.g., 42%] | [e.g., 2.9 TB] |
| Database connections | [e.g., 180] | [e.g., 300] | [e.g., 60%] | [e.g., 120] |
| Network bandwidth | [e.g., 4 Gbps] | [e.g., 10 Gbps] | [e.g., 40%] | [e.g., 6 Gbps] |
| Queue depth (peak) | [e.g., 5K messages] | [e.g., 50K messages] | [e.g., 10%] | [e.g., 45K] |
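
The Utilization % and Headroom columns follow directly from the first two columns. A minimal Python sketch of that arithmetic, using the placeholder "e.g." figures from the table (the `Resource` class and function names are illustrative, not part of the template):

```python
from dataclasses import dataclass


@dataclass
class Resource:
    name: str
    used: float      # current usage, in the resource's own unit
    capacity: float  # maximum capacity, same unit as `used`


def utilization_pct(r: Resource) -> float:
    """Utilization as a percentage of maximum capacity."""
    return 100.0 * r.used / r.capacity


def headroom(r: Resource) -> float:
    """Remaining capacity, in the resource's own unit."""
    return r.capacity - r.used


# Placeholder figures copied from the "e.g." values in the table.
resources = [
    Resource("CPU (cores)", 340, 500),
    Resource("Memory (GB)", 256, 384),
    Resource("Database connections", 180, 300),
]

for r in resources:
    print(f"{r.name}: {utilization_pct(r):.0f}% used, headroom {headroom(r):g}")
```

Keeping usage and capacity in the same unit per row is the only requirement; the percentage makes rows comparable across resources.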

Bottleneck analysis. [Which resource will hit capacity first at current growth rates? What is the estimated time to exhaustion?]
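
One way to answer the bottleneck question is to estimate, per resource, how long it takes to reach 100% utilization and pick the smallest value. The sketch below assumes utilization compounds at a constant weekly rate, which is a simplification; the resource names and growth rates are hypothetical inputs.

```python
import math


def weeks_until(current_pct: float, limit_pct: float, weekly_growth: float) -> float:
    """Weeks until utilization reaches limit_pct under compound weekly growth."""
    if current_pct >= limit_pct:
        return 0.0
    if weekly_growth <= 0:
        return math.inf  # flat or shrinking usage never reaches the limit
    return math.log(limit_pct / current_pct) / math.log(1 + weekly_growth)


# Hypothetical inputs: (current utilization %, assumed weekly growth rate).
resources = {
    "cpu": (68.0, 0.03),
    "storage": (42.0, 0.02),
    "db_connections": (60.0, 0.05),
}

time_left = {name: weeks_until(pct, 100.0, g) for name, (pct, g) in resources.items()}
bottleneck = min(time_left, key=time_left.get)
print(f"{bottleneck} exhausts first, in about {time_left[bottleneck]:.1f} weeks")
```

Note that the resource with the highest utilization today is not necessarily the bottleneck; a lower-utilization resource with faster growth can run out sooner.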


Growth Projections

| Metric | Current | +3 Months | +6 Months | +12 Months | Assumptions |
|---|---|---|---|---|---|
| Daily active users | [Value] | [Value] | [Value] | [Value] | [Source] |
| Requests per second (peak) | [Value] | [Value] | [Value] | [Value] | [Source] |
| Data storage growth | [Value] | [Value] | [Value] | [Value] | [Rate/month] |
| Background job volume | [Value] | [Value] | [Value] | [Value] | [Source] |

Growth drivers.

  • Organic user growth ([rate]% month-over-month)
  • Planned product launch: [feature name, expected date, estimated traffic impact]
  • Marketing campaign: [campaign name, expected date, estimated traffic spike]
  • Seasonal pattern: [description of seasonal traffic changes]
  • Customer expansion: [large customer onboarding, estimated resource impact]
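
To turn these drivers into the +3, +6, and +12 month columns, one simple approach is to compound the organic month-over-month rate and add one-off steps for launches or campaigns. This is a sketch under that assumption; the `project` helper and every number in it are hypothetical.

```python
def project(current: float, monthly_rate: float, months: int, one_off: float = 0.0) -> float:
    """Compound `current` at `monthly_rate` for `months`, then add a one-off step."""
    return current * (1 + monthly_rate) ** months + one_off


# Hypothetical inputs: 1,000 peak requests/second today, 5% month-over-month
# organic growth, and a launch that adds 200 requests/second by month 6.
baseline_rps = 1_000
for months in (3, 6, 12):
    launch_bump = 200 if months >= 6 else 0
    print(f"+{months} months: {project(baseline_rps, 0.05, months, launch_bump):.0f} rps")
```

Seasonal patterns and large customer onboardings can be layered on the same way, as additional one-off or time-limited terms recorded in the Assumptions column.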

Scaling Triggers and Actions

| Resource | Warning Threshold | Critical Threshold | Scaling Action | Lead Time |
|---|---|---|---|---|
| CPU | [e.g., 70%] | [e.g., 85%] | [e.g., Add 2 nodes to ASG] | [e.g., 5 min auto] |
| Memory | [e.g., 75%] | [e.g., 90%] | [e.g., Increase instance size] | [e.g., 15 min] |
| Storage | [e.g., 70%] | [e.g., 85%] | [e.g., Expand volume] | [e.g., 30 min] |
| Database | [e.g., 65%] | [e.g., 80%] | [e.g., Add read replica] | [e.g., 2 hours] |
| Network | [e.g., 60%] | [e.g., 80%] | [e.g., Enable CDN / upgrade tier] | [e.g., 1 day] |
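
A trigger table like this maps naturally onto a small classification function that an alerting job could call. The following is a minimal sketch, not a monitoring integration; the thresholds are the "e.g." values from the table and the readings are hypothetical.

```python
def classify(utilization_pct: float, warning_pct: float, critical_pct: float) -> str:
    """Return 'ok', 'warning', or 'critical' for a utilization reading."""
    if warning_pct >= critical_pct:
        raise ValueError("warning threshold must be below critical threshold")
    if utilization_pct >= critical_pct:
        return "critical"
    if utilization_pct >= warning_pct:
        return "warning"
    return "ok"


# Thresholds copied from the "e.g." values in the table: (warning %, critical %).
THRESHOLDS = {
    "cpu": (70, 85),
    "memory": (75, 90),
    "storage": (70, 85),
    "database": (65, 80),
    "network": (60, 80),
}

readings = {"cpu": 68, "database": 72, "network": 81}  # hypothetical readings
for name, pct in readings.items():
    print(name, classify(pct, *THRESHOLDS[name]))
```

Resources with long lead times (database, network) carry lower thresholds in the table so that the warning fires early enough for the scaling action to complete.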

Cost Projections

| Scenario | Monthly Cost (Current) | Monthly Cost (+6 Mo) | Monthly Cost (+12 Mo) | Annual Delta |
|---|---|---|---|---|
| Baseline (no action) | $[Amount] | $[Amount] | $[Amount] | $[Amount] |
| Scale vertically | $[Amount] | $[Amount] | $[Amount] | $[Amount] |
| Scale horizontally | $[Amount] | $[Amount] | $[Amount] | $[Amount] |
| Re-architect | $[Amount] | $[Amount] | $[Amount] | $[Amount] |

Cost optimization opportunities.

  • Right-size underutilized instances ([estimated savings])
  • Purchase reserved instances for predictable workloads ([estimated savings])
  • Implement caching layer to reduce database load ([estimated savings])
  • Archive cold data to cheaper storage tier ([estimated savings])
  • Review and eliminate unused resources ([estimated savings])

Risk Assessment

| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| [Traffic spike exceeds projections] | [High/Med/Low] | [Service degradation] | [Auto-scaling policy, CDN, rate limiting] |
| [Database hits connection limit] | [High/Med/Low] | [Request failures] | [Connection pooling, read replicas] |
| [Storage exhaustion] | [High/Med/Low] | [Write failures] | [Automated volume expansion, data archival] |
| [Vendor outage] | [High/Med/Low] | [Service unavailable] | [Multi-region failover, circuit breakers] |

Action Items

  • [Action 1: description, owner, deadline]
  • [Action 2: description, owner, deadline]
  • [Action 3: description, owner, deadline]
  • [Action 4: description, owner, deadline]
  • [Action 5: description, owner, deadline]

Filled Example: SaaS Analytics Platform

Plan Overview

| Field | Details |
|---|---|
| System/Service | Analytics Ingestion Pipeline |
| Owner | Platform Team (Lead: Jordan Lee) |
| Planning Horizon | Q2-Q3 2026 |
| Date | March 2026 |
| Status | In Review |
| Last Capacity Review | December 2025 |

Current Resource Utilization

| Resource | Current Usage | Maximum Capacity | Utilization % | Headroom |
|---|---|---|---|---|
| CPU (Kafka brokers) | 48 cores | 64 cores | 75% | 16 cores |
| Memory (ClickHouse) | 384 GB | 512 GB | 75% | 128 GB |
| Storage (ClickHouse) | 8.2 TB | 12 TB | 68% | 3.8 TB |
| Database connections (PG) | 210 | 300 | 70% | 90 |
| Network bandwidth | 6.2 Gbps | 10 Gbps | 62% | 3.8 Gbps |

Bottleneck analysis. CPU on Kafka brokers will hit 85% within 8 weeks at current growth (3.2% week-over-week increase in event volume). Database connections will reach the warning threshold in approximately 12 weeks due to a new real-time dashboard feature shipping in April.
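
As a rough cross-check of the CPU estimate, assume that broker CPU utilization grows in proportion to event volume and that the 3.2% weekly growth compounds. Both are simplifying assumptions that the example does not state. Under them, the time for utilization to move from the current level $u_0 = 75\%$ to the limit $u^\ast = 85\%$ at weekly growth rate $g = 0.032$ is:

```latex
u(t) = u_0 (1 + g)^{t}
\quad\Longrightarrow\quad
t^\ast = \frac{\ln\left(u^\ast / u_0\right)}{\ln(1 + g)}
       = \frac{\ln(0.85 / 0.75)}{\ln(1.032)}
       \approx \frac{0.1252}{0.0315}
       \approx 4.0 \text{ weeks}
```

That result sits inside the 8-week window quoted above, and it suggests that, if CPU really does track event volume one-to-one, the 85% mark could arrive close to the April 1 deadline set for the Kafka broker expansion.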

Growth Projections

| Metric | Current | +3 Months | +6 Months | +12 Months | Assumptions |
|---|---|---|---|---|---|
| Events/second (peak) | 45K | 62K | 85K | 140K | 3.2% WoW, plus Q3 enterprise launch |
| Storage growth | 8.2 TB | 10.5 TB | 13.8 TB | 21 TB | 800 GB/month current, rising to 1.2 TB |
| Daily active dashboards | 2,400 | 3,200 | 4,500 | 7,000 | New real-time feature drives +40% usage |

Scaling Actions

| Priority | Action | Owner | Cost Impact | Deadline |
|---|---|---|---|---|
| P0 | Add 2 Kafka brokers (32 cores each) | Jordan | +$2,400/mo | April 1 |
| P1 | Implement PgBouncer connection pooling | Backend team | +$200/mo | April 15 |
| P1 | Expand ClickHouse storage to 20 TB | Jordan | +$1,800/mo | May 1 |
| P2 | Evaluate horizontal ClickHouse sharding | Data team | TBD | Q3 planning |

Cost Projections

| Scenario | Monthly (Current) | Monthly (+6 Mo) | Monthly (+12 Mo) | Annual Delta |
|---|---|---|---|---|
| Baseline (no action) | $18,400 | $18,400 | $18,400 | $0 |
| Recommended plan | $18,400 | $22,800 | $28,600 | +$67,200 |
| Over-provision (safe) | $18,400 | $31,200 | $31,200 | +$112,800 |
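
The +6 Mo figure for the recommended plan can be reproduced by adding the priced scaling actions to the current monthly cost. The short check below shows that arithmetic. The +12 Mo and Annual Delta figures depend on ramp assumptions the example does not spell out, so they are not derived here.

```python
# Monthly cost impact of the priced actions from the Scaling Actions table.
# The P2 sharding evaluation is excluded because its cost is still TBD.
scaling_actions = {
    "Add 2 Kafka brokers": 2_400,
    "PgBouncer connection pooling": 200,
    "Expand ClickHouse storage to 20 TB": 1_800,
}

current_monthly = 18_400
added_monthly = sum(scaling_actions.values())
projected_monthly = current_monthly + added_monthly

print(f"Added per month: ${added_monthly:,}")        # $4,400
print(f"Projected monthly: ${projected_monthly:,}")  # $22,800
```

When filling in your own plan, stating how each column is derived (for example, which actions are live by which month) makes the Annual Delta easier for finance to audit.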

Key Takeaways

  • Measure current utilization before projecting future needs
  • Plan for three scenarios: pessimistic, expected, and optimistic
  • Set scaling triggers with specific thresholds, not vague guidelines
  • Include cost projections alongside technical plans to get finance buy-in
  • Review capacity quarterly and after any capacity-related incident

About This Template

Created by: Tim Adair

Last Updated: 3/4/2026

Version: 1.0.0

License: Free for personal and commercial use

Frequently Asked Questions

How often should we run capacity planning?
Quarterly for stable services, monthly for high-growth services. Trigger an off-cycle review after any capacity-related incident, before a major product launch, or when growth rate changes by more than 20%. The [Technical PM Handbook](/technical-pm-guide) covers how to integrate capacity reviews into your planning cadence.
What utilization percentage should trigger scaling?
Start scaling preparations at 70% utilization. Initiate scaling actions at 80%. Never let production workloads exceed 85% sustained utilization. These thresholds give you enough headroom to scale before users are affected. Some resources (databases, queues) need more headroom than stateless compute because they take longer to scale.
How do I forecast growth for a new product with no historical data?
Use analogues from similar products, industry benchmarks, and conservative estimates. Build three scenarios: pessimistic (50% of target), expected, and optimistic (200% of target). Plan infrastructure for the expected case with the ability to scale to the optimistic case within your lead time window. Track actual versus projected weekly and adjust.
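
A small sketch of the three-scenario approach and the weekly actual-versus-projected check, using the 50% and 200% multipliers from the answer above. The function names and the traffic figures are hypothetical.

```python
def scenarios(expected: float) -> dict[str, float]:
    """Pessimistic, expected, and optimistic demand using 50% / 200% multipliers."""
    return {
        "pessimistic": 0.5 * expected,
        "expected": expected,
        "optimistic": 2.0 * expected,
    }


def forecast_error_pct(actual: float, projected: float) -> float:
    """Signed percentage by which actual demand differs from the projection."""
    return 100.0 * (actual - projected) / projected


target_rps = 400  # hypothetical expected peak requests/second
print(scenarios(target_rps))
print(f"{forecast_error_pct(actual=460, projected=target_rps):+.0f}% vs. projection")
```

A persistent positive error week after week is a signal to re-base the expected case rather than to keep absorbing the gap with headroom.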
Should we auto-scale or manually scale?
Auto-scale stateless compute (web servers, workers) where scaling is fast and low-risk. Manually scale stateful systems (databases, queues, caches) where incorrect scaling can cause data issues. Hybrid approaches work well: auto-scale within a pre-approved range, alert humans when auto-scaling hits the upper boundary.
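
The hybrid approach can be sketched as a clamp plus an escalation flag: the autoscaler acts freely inside the approved range and signals for a human when it reaches the top. This is an illustration only; `plan_scale` and the 4-to-12 node range are hypothetical and not tied to any particular autoscaler.

```python
def plan_scale(desired: int, min_nodes: int, max_nodes: int) -> tuple[int, bool]:
    """Clamp the desired node count to the approved range.

    Returns (nodes_to_run, needs_human). needs_human is True when demand has
    reached or exceeded the upper boundary of the approved range.
    """
    nodes = max(min_nodes, min(desired, max_nodes))
    needs_human = desired >= max_nodes
    return nodes, needs_human


# Hypothetical pre-approved range of 4 to 12 nodes.
print(plan_scale(desired=9, min_nodes=4, max_nodes=12))   # (9, False)
print(plan_scale(desired=15, min_nodes=4, max_nodes=12))  # (12, True)
```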
How do I justify capacity spending to finance?
Calculate the cost of downtime per minute (lost revenue, SLA penalties, customer churn risk) and compare it to the cost of the infrastructure investment. If 10 minutes of downtime costs $50,000 and the scaling investment is $3,000/month, the payback period is measured in days, not months. Use the [TAM calculator](/tools/tam-calculator) to size the revenue at risk.
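
The comparison above can be made concrete with a break-even calculation: how many minutes of avoided downtime per year pay for the scaling investment. The sketch uses the figures from the answer and assumes downtime cost scales linearly per minute, which is a simplification.

```python
downtime_cost = 50_000          # dollars lost in the example outage
downtime_minutes = 10
scaling_cost_per_month = 3_000  # dollars

cost_per_minute = downtime_cost / downtime_minutes          # 5000.0
annual_scaling_cost = 12 * scaling_cost_per_month           # 36000
break_even_minutes = annual_scaling_cost / cost_per_minute  # 7.2

print(f"Downtime costs ${cost_per_minute:,.0f} per minute")
print(f"Break-even: {break_even_minutes:.1f} minutes of avoided downtime per year")
```

In this example, a full year of the scaling investment costs less than a single 10-minute outage.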
