## What This Template Is For
AI costs scale differently from traditional software costs. A feature that costs $50/month during beta can cost $50,000/month at scale if token usage is not tracked and optimized. Most teams discover cost problems too late because they track API spend as a single line item rather than breaking it down by feature, model, and usage pattern.
This template provides a structured cost model for LLM-powered products. It helps you track token usage by feature, compare model costs, identify optimization opportunities, and forecast monthly budgets. The goal is to make AI costs as visible and manageable as any other infrastructure cost.
Use the LLM Cost Estimator to model costs for different model and usage scenarios interactively. The AI ROI Calculator helps you validate that the business value justifies the AI costs. The AI PM Handbook covers AI cost strategy and optimization in depth.
## How to Use This Template
- Inventory your AI features and map each one to the model it uses, the average tokens per request, and the current request volume.
- Calculate baseline costs using current pricing. This is your starting point for optimization.
- Identify optimization opportunities using the optimization checklist. Caching, prompt compression, model routing, and batching can reduce costs by 30-70%.
- Build a forecast model that projects costs at 2x, 5x, and 10x current volume. If costs are unsustainable at scale, optimize now.
- Set up cost monitoring with alerts for spending spikes. A runaway prompt loop or unexpected traffic spike can burn through budgets overnight.
- Review monthly and adjust the model selection, caching strategy, and budget allocation based on actual spend.
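The per-request and monthly arithmetic behind these steps reduces to one formula. A minimal sketch; the token counts and the $3/$15-per-1M prices below are placeholders, not any provider's current rates:

```python
def monthly_cost(daily_requests, avg_input_tokens, avg_output_tokens,
                 input_price_per_m, output_price_per_m, days=30):
    """Estimate one feature's monthly cost from token averages and list prices."""
    cost_per_request = (avg_input_tokens * input_price_per_m
                        + avg_output_tokens * output_price_per_m) / 1_000_000
    return daily_requests * days * cost_per_request

# Hypothetical feature: 1,000 input / 200 output tokens, 10,000 requests/day,
# priced at $3 per 1M input tokens and $15 per 1M output tokens.
print(round(monthly_cost(10_000, 1_000, 200, 3.00, 15.00), 2))  # → 1800.0
```

Re-run this with each provider's current price sheet whenever pricing changes; stale rates are a common source of forecast drift.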
## The Template
### Feature Cost Inventory
- ☐ List every AI feature with its model, average tokens, and request volume
- ☐ Calculate current monthly cost per feature
- ☐ Identify the top 3 features by cost
- ☐ Flag features where cost per request exceeds target
## AI Feature Cost Inventory
### Feature Breakdown
| Feature | Model | Avg Input Tokens | Avg Output Tokens | Daily Requests | Cost/Request | Monthly Cost |
|---------|-------|-----------------|-------------------|----------------|-------------|-------------|
| [Feature 1] | [Model name] | [N] | [M] | [K] | [$X.XXX] | [$Y,YYY] |
| [Feature 2] | [Model name] | [N] | [M] | [K] | [$X.XXX] | [$Y,YYY] |
| [Feature 3] | [Model name] | [N] | [M] | [K] | [$X.XXX] | [$Y,YYY] |
| **Total** | | | | | | **[$Z,ZZZ]** |
### Cost Per User Metrics
- **Total AI cost / MAU**: [$X.XX]
- **AI cost as % of revenue per user**: [X%]
- **Target AI cost per user**: [$X.XX]
- **Cost efficiency trend**: [Improving / Stable / Degrading]
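The per-user metrics above follow directly from the inventory totals. A small sketch with hypothetical spend, MAU, and ARPU figures:

```python
def cost_per_user_metrics(total_monthly_ai_cost, mau, arpu):
    """Derive AI cost per user and its share of revenue per user."""
    cost_per_user = total_monthly_ai_cost / mau
    pct_of_revenue = cost_per_user / arpu * 100
    return cost_per_user, pct_of_revenue

# Hypothetical: $7,452/month AI spend, 50,000 MAU, $12 monthly revenue per user.
cost, pct = cost_per_user_metrics(7_452, 50_000, 12.0)
print(f"${cost:.2f}/user, {pct:.1f}% of revenue")
```

Tracking the trend of these two numbers month over month is what reveals whether optimization is keeping pace with growth.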
### Model Pricing Comparison
- ☐ List candidate models with current pricing
- ☐ Calculate cost per feature for each model option
- ☐ Identify cost savings from model switching (quality vs. cost tradeoff)
- ☐ Document model routing strategy (which model for which task)
## Model Pricing Comparison
### Current Model Pricing (as of [YYYY-MM-DD])
| Model | Input $/1M tokens | Output $/1M tokens | Context Window | Best For |
|-------|-------------------|-------------------|----------------|----------|
| [Model A] | [$X.XX] | [$Y.YY] | [N tokens] | [Complex reasoning] |
| [Model B] | [$X.XX] | [$Y.YY] | [N tokens] | [General tasks] |
| [Model C] | [$X.XX] | [$Y.YY] | [N tokens] | [Simple classification] |
### Cost Comparison by Feature
| Feature | Current Model (Cost) | Alternative A (Cost) | Alternative B (Cost) | Quality Impact |
|---------|---------------------|---------------------|---------------------|----------------|
| [Feature 1] | [$X/mo] | [$Y/mo] | [$Z/mo] | [Acceptable / Degraded] |
| [Feature 2] | [$X/mo] | [$Y/mo] | [$Z/mo] | [Acceptable / Degraded] |
### Model Routing Strategy
| Request Type | Routed To | Rationale |
|-------------|-----------|-----------|
| [Simple queries] | [Cheaper model] | [Quality sufficient, 80% cost savings] |
| [Complex queries] | [Capable model] | [Quality requires top-tier model] |
| [Safety-critical] | [Capable model] | [Cannot risk quality degradation] |
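A routing table like the one above can start life as a plain lookup keyed on a request classifier's label; real routers usually add that classifier as a first step. A sketch with placeholder model names:

```python
def route_model(request_type: str) -> str:
    """Pick a model tier per the routing table; model names are placeholders."""
    routes = {
        "simple": "cheap-model",             # quality sufficient, large savings
        "complex": "capable-model",          # needs top-tier reasoning
        "safety_critical": "capable-model",  # never downgrade quality here
    }
    return routes.get(request_type, "capable-model")  # default to the safe choice

print(route_model("simple"))  # → cheap-model
```

Defaulting unknown request types to the capable model trades a little cost for safety; log every routing decision so the table can be tuned against observed quality.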
### Optimization Checklist
- ☐ Prompt compression: Shorten system prompts without losing quality. Track token savings.
- ☐ Response caching: Cache identical or similar requests. Measure cache hit rate.
- ☐ Model routing: Route simple requests to cheaper models. Track routing decisions.
- ☐ Batching: Batch multiple small requests into single API calls where possible.
- ☐ Output length limits: Set max_tokens to prevent runaway generation.
- ☐ Streaming: Stream responses to improve perceived latency (time-to-first-token) at no additional token cost.
- ☐ Few-shot reduction: Reduce few-shot examples in prompts if zero-shot works.
- ☐ RAG optimization: Reduce context window by improving retrieval precision.
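The response-caching item above can be prototyped as an exact-match cache keyed on model plus prompt; production systems typically add TTLs and semantic matching for near-duplicate prompts. A minimal sketch:

```python
import hashlib

class ResponseCache:
    """Exact-match response cache that also tracks its own hit rate."""

    def __init__(self):
        self._store, self.hits, self.misses = {}, 0, 0

    def _key(self, model, prompt):
        return hashlib.sha256(f"{model}:{prompt}".encode()).hexdigest()

    def get(self, model, prompt):
        k = self._key(model, prompt)
        if k in self._store:
            self.hits += 1
            return self._store[k]
        self.misses += 1
        return None  # caller falls through to a real API call

    def put(self, model, prompt, response):
        self._store[self._key(model, prompt)] = response

    @property
    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0

cache = ResponseCache()
if cache.get("model-x", "hello") is None:      # miss: call the API instead
    cache.put("model-x", "hello", "cached answer")
print(cache.get("model-x", "hello"), cache.hit_rate)  # → cached answer 0.5
```

The hit rate this exposes feeds the Cache Performance section below; keying on model plus prompt prevents a cheaper model's answer from being served for a request routed to a capable one.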
## Optimization Log
| Date | Optimization | Tokens Saved/Request | Monthly Savings | Quality Impact |
|------|-------------|---------------------|-----------------|----------------|
| [Date] | [What you changed] | [N tokens] | [$X] | [None / Minimal / Acceptable] |
| [Date] | [What you changed] | [N tokens] | [$X] | [None / Minimal / Acceptable] |
### Cache Performance
- **Cache hit rate**: [X%]
- **Monthly cost avoided by caching**: [$Y]
- **Cache storage cost**: [$Z]
- **Net savings**: [$Y - $Z]
### Budget Forecast
- ☐ Project costs at current growth rate for next 3 months
- ☐ Model costs at 2x, 5x, and 10x current volume
- ☐ Identify the volume threshold where costs become unsustainable
- ☐ Set monthly budget alerts and spending caps
## Budget Forecast
### Growth Projections
| Scenario | Daily Requests | Monthly Cost | Cost/User | Sustainable? |
|----------|---------------|-------------|-----------|-------------|
| Current | [N] | [$X,XXX] | [$X.XX] | [Yes/No] |
| 2x growth | [2N] | [$Y,YYY] | [$Y.YY] | [Yes/No] |
| 5x growth | [5N] | [$Z,ZZZ] | [$Z.ZZ] | [Yes/No] |
| 10x growth | [10N] | [$W,WWW] | [$W.WW] | [Yes/No] |
### Budget Controls
- **Monthly budget cap**: [$X,XXX]
- **Alert at**: [80% of budget]
- **Auto-throttle at**: [95% of budget]
- **Emergency contact**: [Name, for budget override approval]
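The alert and throttle thresholds above map naturally to a small guard evaluated on every spend update. A sketch; the 80%/95% defaults mirror the controls listed here:

```python
def budget_action(month_to_date_spend, monthly_cap, alert_at=0.80, throttle_at=0.95):
    """Map month-to-date spend against the cap to an action."""
    ratio = month_to_date_spend / monthly_cap
    if ratio >= throttle_at:
        return "throttle"   # auto-throttle non-critical AI features
    if ratio >= alert_at:
        return "alert"      # notify the budget owner before the cap is hit
    return "ok"

print(budget_action(4_900, 5_000))  # → throttle (98% of cap)
```

Wire "throttle" to degrade gracefully (queue requests, fall back to a cheaper model) rather than hard-failing user-facing features.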
### Cost Monitoring
- ☐ Set up daily cost tracking dashboard
- ☐ Configure alerts for cost spikes (>2x daily average)
- ☐ Monitor token cost per interaction as a key metric
- ☐ Track cost per feature over time
- ☐ Review cost trends weekly with engineering
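The spike alert in this checklist (>2x daily average) can be expressed as a one-line check against a trailing window of daily costs. A sketch:

```python
def is_cost_spike(today_cost, trailing_daily_costs, threshold=2.0):
    """Flag a spike when today's spend exceeds `threshold`x the trailing average."""
    avg = sum(trailing_daily_costs) / len(trailing_daily_costs)
    return today_cost > threshold * avg

print(is_cost_spike(520, [240, 250, 245, 260, 255]))  # → True (avg is 250)
```

A short trailing window (5 to 7 days) reacts quickly; lengthen it if weekday/weekend traffic swings trigger false alarms.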
## Filled Example
**Product**: An AI writing assistant with three features: autocomplete, document summarization, and rewrite suggestions.
**Feature Cost Inventory:**
| Feature | Model | Avg Input | Avg Output | Daily Reqs | Cost/Req | Monthly Cost |
|---|---|---|---|---|---|---|
| Autocomplete | GPT-4o-mini | 800 | 50 | 45,000 | $0.0004 | $540 |
| Summarization | Claude 3.5 Sonnet | 3,500 | 400 | 8,000 | $0.0144 | $3,456 |
| Rewrite | Claude 3.5 Sonnet | 1,200 | 800 | 12,000 | $0.0096 | $3,456 |
| **Total** | | | | 65,000 | | **$7,452** |
**Optimization Applied:**
- Cached autocomplete responses for repeated patterns. Cache hit rate: 35%. Savings: $189/mo.
- Routed simple rewrites (< 200 words) to GPT-4o-mini. 40% of rewrite traffic rerouted. Savings: $1,175/mo.
- Compressed summarization prompts from 800 to 450 tokens. Savings: $276/mo.
Total monthly savings: $1,640/mo (22% reduction). New monthly cost: $5,812.
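The example's totals can be re-derived from the per-request costs in the table above; a quick check:

```python
features = {  # (cost per request, daily requests) from the inventory table
    "autocomplete":  (0.0004, 45_000),
    "summarization": (0.0144, 8_000),
    "rewrite":       (0.0096, 12_000),
}
monthly = {name: cost * reqs * 30 for name, (cost, reqs) in features.items()}
total = sum(monthly.values())
savings = 189 + 1_175 + 276  # caching + routing + prompt compression
print(round(total), savings, round(total - savings))  # → 7452 1640 5812
```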
