Caching is the single most effective performance optimization available to most product teams. A well-placed cache can reduce database load by 90%, cut API latency from 200ms to 5ms, and handle 10x traffic without adding a single server. But caching done poorly introduces a class of bugs that are difficult to reproduce and even harder to debug: stale data, cache stampedes, inconsistent reads, and silent failures that only surface at scale.
This template provides a structured approach to designing a caching strategy. It covers cache tiers (CDN, application, database), invalidation policies (TTL, event-driven, manual), key design, and monitoring. The goal is to make caching decisions explicit and reviewable instead of ad-hoc and scattered across codebases.
For related infrastructure planning, use the CDN Optimization Template for edge caching decisions and the API Performance Template to track the latency improvements caching delivers. The Technical PM Handbook covers how PMs should evaluate the trade-off between data freshness and performance. The glossary entry on technical debt explains why poorly maintained caches become a major source of hidden complexity.
When to Use This Template
- You are designing a new feature and need to decide what to cache and where
- Database queries are the bottleneck and you need to reduce read load
- An existing cache layer has grown organically and needs rationalization
- You are experiencing stale data bugs related to cache invalidation
- A scaling milestone (10x users, new geographic region) requires rethinking your cache architecture
How to Use This Template
1. **Start with the Data Inventory.** List every data type in your system and its freshness requirements.
2. **Assign cache tiers.** Decide where each data type is cached: a single tier, several (e.g., CDN plus application), or none if real-time freshness is required.
3. **Define invalidation policies.** Every cached item must have a clear invalidation trigger.
4. **Design cache keys.** Consistent key naming prevents duplication and makes debugging easier.
5. **Set monitoring targets.** Cache hit rate, eviction rate, and memory usage are your primary health signals.
6. **Document failure modes.** What happens when the cache goes down? Every cache tier needs a fallback behavior.
The Template
Data Inventory
| Data Type | Read Frequency | Write Frequency | Freshness Tolerance | Current Source | Avg Response |
| --- | --- | --- | --- | --- | --- |
| [e.g., User profile] | [X reads/sec] | [X writes/hour] | [Real-time / 30s / 5 min / 1 hr] | [Postgres] | [X ms] |
| [e.g., Product listing] | [X reads/sec] | [X writes/day] | [Real-time / 30s / 5 min / 1 hr] | [Postgres] | [X ms] |
Cache Tier Assignment
| Data Type | Cache Tier | Technology | TTL | Invalidation Method | Fallback |
| --- | --- | --- | --- | --- | --- |
| [data type] | CDN / Application / Database / None | [CloudFront / Redis / Memcached / Query cache] | [X seconds] | [TTL / Event / Manual] | [Pass-through to origin] |
Cache tier definitions.
- **CDN (Edge).** Static assets, public API responses, HTML pages. Cached closest to the user. High hit rate, long TTL. Invalidation is slow (propagation delay).
- **Application (In-Memory or Redis).** Session data, user preferences, computed aggregations. Cached in the application layer. Fast reads, flexible invalidation.
- **Database (Query Cache).** Expensive query results, materialized views. Cached at the database layer. Reduces query execution time but does not reduce connection load.
- **None.** Data that must always be real-time (account balances, inventory counts during checkout, auth tokens). Never cache these.
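To make the application tier concrete, here is a minimal cache-aside sketch in Python using the redis-py client. The key pattern and 300-second TTL echo the filled example later in this template; `load_product_from_db` is a hypothetical stand-in for the origin query, and the exception handler shows the pass-through fallback the assignment table calls for.

```python
import json

import redis

r = redis.Redis(host="localhost", port=6379)

def load_product_from_db(product_id: int) -> dict:
    # Hypothetical stand-in for the real Postgres query.
    return {"id": product_id, "name": "example"}

def get_product(product_id: int) -> dict:
    """Cache-aside read: try the cache first, fall back to the origin on a miss."""
    key = f"catalog:product:{product_id}:v1"
    try:
        cached = r.get(key)
    except redis.RedisError:
        # Cache tier is down: degrade to pass-through so the feature keeps working.
        return load_product_from_db(product_id)

    if cached is not None:
        return json.loads(cached)  # hit: no database work at all

    product = load_product_from_db(product_id)  # miss: fetch from the origin
    r.set(key, json.dumps(product), ex=300)     # populate with a 300s TTL
    return product
```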
Invalidation Policies
| Data Type | Invalidation Trigger | Method | Delay | Consistency Model |
| --- | --- | --- | --- | --- |
| [data type] | [On write / On schedule / On TTL expiry] | [Delete key / Publish event / Purge CDN] | [Immediate / <5s / <1 min] | [Eventual / Strong] |
Invalidation methods.
- **TTL (Time-to-Live).** Simplest approach. Data expires after a fixed duration. No active invalidation needed. Best for data with predictable freshness tolerance. Risk: data is stale up to the full TTL duration.
- **Event-driven.** Publish a cache invalidation event when the source data changes. Subscribers delete or update the cached value. Best for data that must be fresh within seconds. Risk: event delivery failures leave stale data.
- **Write-through.** Update the cache and the source simultaneously on every write. Best for data with a high read-to-write ratio and strong consistency requirements. Risk: write latency increases because two systems must be updated.
- **Manual purge.** Human-initiated cache clear via an admin tool or CLI. Best for emergency fixes and content updates. Risk: human error, incomplete purges.
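As a sketch of the event-driven method, the writer below updates the source of truth and then publishes the stale key on a Redis pub/sub channel; each app instance runs a small worker that deletes whatever key the event names. The channel name, key pattern, and `write_price_to_db` helper are illustrative, and note the risk above: if event delivery fails, the stale entry survives until its TTL.

```python
import redis

r = redis.Redis()

def write_price_to_db(product_id: int, new_price_cents: int) -> None:
    ...  # hypothetical stand-in for the source-of-truth write

def update_price(product_id: int, new_price_cents: int) -> None:
    """Write to the source of truth, then announce which cached key is now stale."""
    write_price_to_db(product_id, new_price_cents)
    r.publish("cache-invalidation", f"catalog:price:{product_id}:usd")

def run_invalidation_worker() -> None:
    """Run in each app instance: delete the keys named in invalidation events."""
    pubsub = r.pubsub()
    pubsub.subscribe("cache-invalidation")
    for message in pubsub.listen():
        if message["type"] == "message":
            r.delete(message["data"])
```

Pub/sub earns its keep when instances also hold in-process caches; with a single shared Redis, the writer could simply delete the key directly.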
Cache Key Design
- Include a version suffix (:v1, :v2) so you can roll out schema changes without flushing the entire cache.
- Avoid embedding timestamps or request-specific data in keys (creates cache fragmentation).
- Use consistent hashing for key distribution across cache nodes.
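A small helper keeps these conventions in one place so no caller hand-builds keys. This is a sketch; the namespace parts and version constant are placeholders for your own.

```python
CATALOG_KEY_VERSION = "v3"  # bump on schema changes; old entries simply age out via TTL

def product_key(product_id: int) -> str:
    """Hierarchical, versioned key: namespace:entity:id:version."""
    return f"catalog:product:{product_id}:{CATALOG_KEY_VERSION}"

# product_key(98765) -> "catalog:product:98765:v3"
```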
Capacity Planning
| Cache Tier | Max Memory | Current Usage | Eviction Policy | Avg Object Size | Est. Object Count |
| --- | --- | --- | --- | --- | --- |
| [Redis cluster] | [X GB] | [X GB (Y%)] | [LRU / LFU / TTL] | [X KB] | [X objects] |
Monitoring Targets
| Metric | Target | Alert Threshold | Current |
| --- | --- | --- | --- |
| Cache hit rate | [>90%] | [<80%] | [X%] |
| Cache miss latency (origin fetch) | [<200ms] | [>500ms] | [X ms] |
| Eviction rate | [<1% of total keys/hr] | [>5%/hr] | [X%] |
| Memory utilization | [<75%] | [>85%] | [X%] |
| Cache error rate | [<0.01%] | [>0.1%] | [X%] |
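For Redis-backed tiers, hit rate can be read straight from the server's own counters rather than instrumented by hand. A sketch using redis-py's INFO command; the 80% threshold mirrors the alert column above:

```python
import redis

r = redis.Redis()

def cache_hit_rate() -> float:
    """Hit rate from Redis' cumulative keyspace counters (since the last restart)."""
    stats = r.info("stats")
    hits, misses = stats["keyspace_hits"], stats["keyspace_misses"]
    return hits / (hits + misses) if (hits + misses) else 1.0

if cache_hit_rate() < 0.80:  # the alert threshold from the table above
    print("ALERT: cache hit rate below 80%")
```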
Failure Modes
| Scenario | Impact | Mitigation | Recovery |
| --- | --- | --- | --- |
| Cache node failure | [Cache miss storm hits origin DB] | [Read replica, circuit breaker, fallback to stale] | [Auto-failover to replica node] |
| Cache full (eviction storm) | [Increased latency, higher DB load] | [Scale cache, review TTLs, purge cold data] | [Add memory, lower TTL on low-value data] |
| Thundering herd (popular key expires) | [Hundreds of simultaneous origin fetches] | [Cache stampede protection: lock + single-flight] | [Automatic once first request populates cache] |
| Stale data served after write | [User sees outdated information] | [Event-driven invalidation, shorter TTL] | [Manual purge, then fix invalidation path] |
Filled Example: E-Commerce Product Catalog
Data Inventory
| Data Type | Read Freq | Write Freq | Freshness Tolerance | Source | Avg Response |
| --- | --- | --- | --- | --- | --- |
| Product details (name, description, images) | 12,000/sec | 50/day | 5 minutes | Postgres | 45ms |
| Product price | 12,000/sec | 200/day | 30 seconds | Postgres | 12ms |
| Product inventory count | 8,000/sec | 5,000/hour | Real-time during checkout, 60s on browse | Postgres | 8ms |
| Category listings (product IDs per category) | 3,000/sec | 20/day | 10 minutes | Postgres (aggregation) | 180ms |
| Search results | 2,000/sec | N/A (index rebuild hourly) | 1 hour | Elasticsearch | 120ms |
| User cart | 1,500/sec | 1,500/sec | Real-time | Redis (primary store) | 2ms |
Cache Tier Assignment
| Data Type | Tier | Tech | TTL | Invalidation | Fallback |
| --- | --- | --- | --- | --- | --- |
| Product details | CDN + Application | CloudFront (5 min) + Redis (5 min) | 300s | Event on product update | Pass-through to Postgres |
| Product price | Application | Redis | 30s | Event on price change | Pass-through to Postgres |
| Inventory count (browse) | Application | Redis | 60s | TTL only | Pass-through to Postgres |
| Inventory count (checkout) | None | N/A | N/A | N/A | Direct Postgres read |
| Category listings | CDN + Application | CloudFront (10 min) + Redis (10 min) | 600s | Event on category change | Pass-through to Postgres |
| Search results | Application | Redis | 3600s | Rebuild on index refresh | Pass-through to Elasticsearch |
| User cart | None (Redis is primary) | Redis | 7 days (session expiry) | N/A | N/A (Redis is source of truth) |
Cache Key Design
| Data Type | Key Pattern | Example |
| --- | --- | --- |
| Product details | `catalog:product:{id}:v3` | `catalog:product:98765:v3` |
| Product price | `catalog:price:{id}:{currency}` | `catalog:price:98765:usd` |
| Inventory count | `catalog:stock:{id}:{warehouse}` | `catalog:stock:98765:us-east` |
| Category listing | `catalog:category:{slug}:page:{n}` | `catalog:category:electronics:page:1` |
| Search results | `search:{hash(query+filters)}` | `search:a3f8b2c1` |
| User cart | `cart:{user_id}` | `cart:user_42891` |
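The search pattern above depends on the hash being deterministic: the same query and filters must map to the same key on every instance, or the cache fragments. A sketch using a truncated SHA-256 over a canonical JSON serialization (the eight-character truncation matches the example, but the exact scheme is an assumption):

```python
import hashlib
import json

def search_key(query: str, filters: dict) -> str:
    """Same query + filters always maps to the same key, on every instance."""
    canonical = json.dumps({"q": query, "f": filters}, sort_keys=True)
    return "search:" + hashlib.sha256(canonical.encode()).hexdigest()[:8]

# search_key("headphones", {"brand": "acme", "max_price": 100}) and
# search_key("headphones", {"max_price": 100, "brand": "acme"})
# produce the identical key (digest value depends on the serialization).
```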
Monitoring Targets
| Metric | Target | Alert | Current |
| --- | --- | --- | --- |
| Product cache hit rate | >95% | <85% | 93% |
| Category cache hit rate | >98% | <90% | 97% |
| Search cache hit rate | >80% | <60% | 74% |
| Redis memory utilization | <70% | >85% | 58% |
| Cache-related error rate | <0.01% | >0.1% | 0.003% |
Key Takeaways
- **Cache at the right tier.** CDN for public, static-ish content. Redis for user-specific or frequently changing data. Never cache data that must be real-time during critical flows (checkout inventory, account balances).
- **Every cached item needs an invalidation strategy.** "We will figure out invalidation later" is how stale data bugs enter production. Decide on TTL, event-driven, or write-through before writing the caching code.
- **Design cache keys deliberately.** Consistent naming conventions make debugging possible. Include version suffixes so schema changes do not require full cache flushes.
- **Plan for cache failure.** Your system must function (at degraded performance) when the cache is unavailable. If the cache is a single point of failure, that is architecture debt.
- **Monitor hit rate as the primary health metric.** A cache with a 60% hit rate is wasting memory and adding latency for 40% of requests (the origin fetch plus cache-write overhead). Target 90%+ for frequently accessed data.
Frequently Asked Questions
What cache hit rate should I target?
For product catalog data, 90-95% is a good target. For user session data, 95%+ is typical because sessions are read far more than they are written. For search results, 60-80% is realistic because query diversity is high. A hit rate below 50% suggests the data is not a good caching candidate, the TTL is too short, or the key design is creating too many unique entries.
When should I use Redis vs Memcached?
Use Redis when you need data structures beyond simple key-value (sorted sets, lists, hashes, pub/sub) or when you need persistence. Use Memcached for pure key-value caching where simplicity and multi-threaded performance matter. For most product teams, Redis is the default choice because its data structure support enables more caching patterns.
How do I prevent cache stampedes?
A cache stampede occurs when a popular key expires and hundreds of requests simultaneously hit the origin. Three mitigations: (1) Add jitter to TTLs so keys do not all expire at the same time. (2) Use a lock or single-flight pattern so only one request fetches from origin while others wait for the cache to be repopulated. (3) Use "stale-while-revalidate" to serve the expired value while a background process refreshes it.
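A sketch of mitigation (2), using a Redis SET NX lock as the single-flight gate; the lock timeout and retry loop are illustrative, and real code would layer on jitter (1) or stale-while-revalidate (3) as well:

```python
import json
import time

import redis

r = redis.Redis()

def get_with_single_flight(key: str, fetch_origin, ttl: int = 300):
    """Only one caller refreshes an expired key; the rest wait briefly, then retry."""
    cached = r.get(key)
    if cached is not None:
        return json.loads(cached)

    # SET NX: only the first request past expiry acquires the 10s lock.
    if r.set(f"{key}:lock", "1", nx=True, ex=10):
        try:
            value = fetch_origin()                 # the lone origin fetch
            r.set(key, json.dumps(value), ex=ttl)  # repopulate for everyone else
            return value
        finally:
            r.delete(f"{key}:lock")

    # Lost the race: poll until the winner repopulates the key.
    for _ in range(50):
        time.sleep(0.1)
        cached = r.get(key)
        if cached is not None:
            return json.loads(cached)
    return fetch_origin()  # waited ~5s with no luck; fall through to origin
```

Callers pass the origin fetch as a callable, e.g. `get_with_single_flight("catalog:price:98765:usd", fetch_price)` for a hypothetical `fetch_price` function.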
Should PMs care about caching decisions?
PMs should care about the user-facing trade-offs. Caching makes data slightly stale. If a user updates their profile and the change takes 30 seconds to appear, that is a product decision, not just a technical one. PMs should define freshness requirements per data type (this template's Data Inventory section) and let engineering choose the caching implementation.
How do I handle cache warming after a deployment?
Cache warming pre-populates the cache before traffic hits cold instances. Run a warmup script that fetches the top 1,000 most-accessed keys from the origin and writes them to the cache. Alternatively, route a small percentage of traffic to new instances for 2-3 minutes before shifting full load. The [Load Testing Template](/templates) includes a warmup phase configuration for this purpose.
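A minimal version of that warmup script, reusing the key scheme from the filled example; `load_product_from_db` and the ID list are hypothetical placeholders:

```python
import json

import redis

r = redis.Redis()

def load_product_from_db(product_id: int) -> dict:
    ...  # hypothetical stand-in for the origin fetch

def warm_cache(top_product_ids: list[int]) -> None:
    """Pre-populate the hottest keys before a cold instance takes traffic."""
    for product_id in top_product_ids:
        product = load_product_from_db(product_id)
        r.set(f"catalog:product:{product_id}:v3", json.dumps(product), ex=300)

# warm_cache(top_ids_from_analytics)  # e.g., the 1,000 most-accessed product IDs
```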