What This Template Is For
A message queue sits between producers and consumers, buffering work so that the producer does not need to wait for the consumer to finish processing. This decoupling is the foundation of reliable async processing. Without a queue, a spike in traffic overwhelms downstream services directly. With a queue, the spike is absorbed, and consumers process work at their own pace.
This template helps engineering and product teams design a message queue system. It covers queue topology, message routing, consumer scaling, backpressure management, retry and dead letter strategies, and operational monitoring. Whether you are introducing queues for the first time or redesigning an existing system, this template ensures you address every critical design decision before writing code.
For the broader context of how asynchronous patterns fit into product architecture, the Technical PM Handbook covers engineering collaboration for infrastructure decisions. If your queue design involves evaluating technologies (RabbitMQ vs SQS vs Kafka), document that choice in an architecture decision record. Teams working with microservices will find queues essential for inter-service communication. For understanding the performance characteristics of your queue system, the technical spec template provides a complementary structure.
How to Use This Template
- Start by identifying every producer and consumer in your system. List what messages each produces and what processing each consumer performs.
- Choose your queue topology: point-to-point (one consumer per message), fan-out (multiple consumers per message), or topic-based routing. The right topology depends on your use case.
- Define message schemas before implementation. Include an envelope with metadata (messageId, timestamp, type, correlationId) and a typed payload.
- Set delivery guarantees per queue. Not every queue needs exactly-once semantics. Analytics queues can tolerate at-most-once. Payment queues need at-least-once with idempotent consumers.
- Design your error handling strategy. Every message that cannot be processed needs a destination: retry queue, dead letter queue, or alert.
- Plan capacity and scaling. How many messages per second at peak? How many consumers do you need to keep up? What happens when consumers fall behind?
The Template
System Overview
| Field | Details |
|---|---|
| System Name | [Name of the queue system or domain] |
| Author | [Name] |
| Date | [Date] |
| Status | Draft / In Review / Approved |
| Queue Technology | [e.g., RabbitMQ, AWS SQS, Redis Streams, Kafka, NATS JetStream] |
| Related Docs | [Links to ADRs, tech specs, PRDs] |
Business Context
[2-3 paragraphs describing:]
- [What business problem the queue system solves]
- [Current architecture and its limitations (e.g., synchronous bottlenecks, cascading failures)]
- [Expected message volume: messages/second at steady state and peak]
- [Latency requirements: how quickly must messages be processed?]
Queue Topology
| Queue Name | Type | Producers | Consumers | Messages/sec (peak) | Retention |
|---|---|---|---|---|---|
| [e.g., notifications.email] | Point-to-point | [Service name] | [Service name] | [number] | [e.g., 7 days] |
| [e.g., orders.processing] | Point-to-point | [Service name] | [Service name] | [number] | [e.g., 14 days] |
| [e.g., events.analytics] | Fan-out | [Multiple services] | [Analytics, Reporting] | [number] | [e.g., 30 days] |
Topology type: [Point-to-point / Fan-out / Topic-based routing / Combination]
Routing strategy: [Direct queue names / Topic exchange with routing keys / Content-based routing / Header-based routing]
Message Schema
Envelope (common to all messages):
{
  "messageId": "uuid-v4 (unique per message)",
  "type": "[domain].[action] (e.g., notification.send_email)",
  "version": "1.0",
  "timestamp": "ISO-8601",
  "producer": "[service-name]",
  "correlationId": "uuid-v4 (traces business transaction)",
  "priority": "[0-9, where 9 is highest]",
  "payload": {
    // Message-specific data
  }
}
Message types and payloads:
| Message Type | Payload Fields | Size (avg) | Priority |
|---|---|---|---|
| [e.g., notification.send_email] | [to, subject, template, variables] | [~2 KB] | [5] |
| [e.g., order.process_payment] | [orderId, amount, paymentMethod] | [~1 KB] | [9] |
| [e.g., report.generate] | [reportType, dateRange, userId] | [~500 B] | [3] |
Consumer Design
| Consumer | Queue | Concurrency | Processing Time (P50 / P99) | Idempotent? | Scaling Strategy |
|---|---|---|---|---|---|
| [Service name] | [Queue name] | [e.g., 5 workers] | [e.g., 50ms / 500ms] | Yes / No | [Horizontal / Vertical] |
| [Service name] | [Queue name] | [e.g., 10 workers] | [e.g., 200ms / 2s] | Yes / No | [Horizontal / Vertical] |
Acknowledgment strategy:
- [When does the consumer ack? After processing? After writing to database?]
- [What happens on consumer crash before ack? (Message redelivered)]
- [Visibility timeout / ack deadline: e.g., 30 seconds]
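The ack-after-processing pattern above can be sketched as follows. The `InMemoryQueue` is a toy stand-in for illustration only; real broker clients (SQS, RabbitMQ) expose equivalent receive/ack/nack operations:

```python
class InMemoryQueue:
    """Toy stand-in for a broker client, for illustration only."""
    def __init__(self, messages):
        self.pending = list(messages)   # not yet delivered
        self.inflight = []              # delivered, awaiting ack

    def receive(self, visibility_timeout=30):
        if not self.pending:
            return None
        msg = self.pending.pop(0)
        self.inflight.append(msg)
        return msg

    def ack(self, msg):
        self.inflight.remove(msg)       # permanently done

    def nack(self, msg):
        self.inflight.remove(msg)
        self.pending.append(msg)        # make visible again for redelivery

def consume_one(queue, handler) -> bool:
    """Receive one message; ack only after the handler succeeds."""
    msg = queue.receive(visibility_timeout=30)
    if msg is None:
        return False
    try:
        handler(msg["payload"])
    except Exception:
        queue.nack(msg)   # failure before ack -> message is redelivered
        return True
    queue.ack(msg)        # delete only after successful processing
    return True
```

Because a crash between processing and ack leads to redelivery, this pattern gives at-least-once delivery, which is why the consumers it feeds must be idempotent.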
Consumer group management:
- [How are consumers grouped?]
- [How is work distributed across consumers? (Round-robin, consistent hashing, partition assignment)]
- [How do you add/remove consumers without message loss?]
Delivery Guarantees
| Queue | Guarantee | Idempotency Strategy |
|---|---|---|
| [Queue name] | At-least-once | [e.g., Dedup by messageId in Redis with 24h TTL] |
| [Queue name] | At-most-once | [N/A, acceptable for analytics] |
| [Queue name] | At-least-once | [e.g., Database upsert with messageId as unique constraint] |
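A minimal sketch of the dedup-by-messageId strategy from the table. The in-memory store is an assumption for illustration; in production this would be Redis (`SET key NX EX 86400`) or a database unique constraint on messageId:

```python
import time

class DedupStore:
    """In-memory stand-in for a dedup store (illustrative only)."""
    def __init__(self, ttl_seconds: float = 86400):  # 24h TTL
        self.ttl = ttl_seconds
        self.seen = {}  # messageId -> expiry time

    def first_time(self, message_id: str) -> bool:
        now = time.monotonic()
        expiry = self.seen.get(message_id)
        if expiry is not None and expiry > now:
            return False                   # duplicate within the TTL window
        self.seen[message_id] = now + self.ttl
        return True

def handle_idempotently(store, envelope, handler):
    """Run the handler at most once per messageId (within the TTL)."""
    if store.first_time(envelope["messageId"]):
        handler(envelope["payload"])
```

Paired with an at-least-once queue, this turns duplicate deliveries into no-ops instead of double-processed payments.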
Backpressure and Rate Limiting
- Queue depth threshold: [e.g., Alert at 10,000 messages, scale consumers at 50,000]
- Producer rate limiting: [e.g., Max 1,000 messages/sec per producer, reject with 429 if exceeded]
- Consumer rate limiting: [e.g., Max 100 external API calls/sec, use token bucket]
- Auto-scaling rules: [e.g., Add 1 consumer per 5,000 queue depth, max 20 consumers]
- Circuit breaker: [e.g., If downstream service returns 5 consecutive 5xx, pause consumption for 30s]
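The token-bucket limiter mentioned above can be sketched in a few lines (a simplified single-threaded version; a production limiter would also need locking or a shared store like Redis):

```python
import time

class TokenBucket:
    """Token-bucket limiter: allows bursts up to `capacity`, with a
    sustained throughput of `rate` tokens per second."""
    def __init__(self, rate: float, capacity: int):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill tokens in proportion to elapsed time, capped at capacity
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # caller should back off, delay, or requeue
```

A consumer making external API calls would check `bucket.allow()` before each call and delay (without acking) when it returns False, letting the queue absorb the excess instead of the downstream service.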
Retry Strategy
| Retry Level | Delay | Max Attempts | On Failure |
|---|---|---|---|
| Immediate retry | 0 | 1 | Move to delayed retry |
| Delayed retry (level 1) | 5 seconds | 1 | Move to level 2 |
| Delayed retry (level 2) | 30 seconds | 1 | Move to level 3 |
| Delayed retry (level 3) | 5 minutes | 1 | Move to DLQ |
Retry classification:
- Retryable errors: [Timeouts, 5xx, connection refused, resource locked]
- Non-retryable errors: [Validation failure, 4xx, malformed message, business rule violation]
- Non-retryable errors skip directly to DLQ
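The retry ladder and classification above can be expressed as a single routing function (the error-kind strings are illustrative labels, not a standard taxonomy):

```python
RETRY_DELAYS = [0, 5, 30, 300]  # seconds: immediate, 5s, 30s, 5min
RETRYABLE = {"timeout", "5xx", "connection_refused", "resource_locked"}

def next_step(error_kind: str, attempts_so_far: int):
    """Decide what to do with a failed message.

    Returns the delay in seconds before the next retry, or "DLQ"
    when the error is non-retryable or all retry levels are used up.
    """
    if error_kind not in RETRYABLE:
        return "DLQ"                       # validation/4xx: skip straight to DLQ
    if attempts_so_far >= len(RETRY_DELAYS):
        return "DLQ"                       # all four retry levels exhausted
    return RETRY_DELAYS[attempts_so_far]
```

Keeping this decision in one function makes the retry policy testable and easy to change when the delay table is tuned.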
Dead Letter Queue (DLQ)
| Source Queue | DLQ Name | Retention | Alert | Investigation SLA |
|---|---|---|---|---|
| [Queue name] | [e.g., notifications.email.dlq] | [14 days] | [PagerDuty / Slack] | [1 hour / 4 hours] |
| [Queue name] | [DLQ name] | [retention] | [alert channel] | [SLA] |
DLQ processing workflow:
- Alert fires when a message enters any DLQ
- On-call engineer inspects the message payload and error reason
- If the root cause is fixed, replay the message from DLQ to the source queue
- If the message is permanently invalid, archive it with a reason and close the alert
DLQ replay tool: [e.g., CLI tool, admin dashboard, automated replay after deployment]
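The replay step of the workflow can be sketched as follows, using plain lists as queues for illustration; a real replay tool would receive/send/delete through the broker's API and cap the batch size to avoid re-flooding the source queue:

```python
def replay_dlq(dlq, source_queue, max_messages: int = 100) -> int:
    """Move up to max_messages from a DLQ back to its source queue.

    Returns the number of messages moved, so the operator can run
    the tool repeatedly in bounded batches.
    """
    moved = 0
    while dlq and moved < max_messages:
        source_queue.append(dlq.pop(0))  # preserve original order
        moved += 1
    return moved
```

Bounding the batch matters: replaying a large DLQ all at once can recreate the overload that sent the messages there in the first place.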
Monitoring and Alerting
| Metric | Description | Warning | Critical |
|---|---|---|---|
| Queue depth | Messages waiting to be consumed | > [X] | > [Y] |
| Consumer lag | Rate of production minus rate of consumption | Growing for > 5 min | Growing for > 15 min |
| Processing latency (P99) | Time from enqueue to consumer ack | > [X]ms | > [Y]ms |
| Error rate | Failed processing / total processed | > 1% | > 5% |
| DLQ count | Messages in dead letter queues | > 0 | > 10 |
| Consumer health | Heartbeat / liveness check | 1 consumer down | > 50% consumers down |
Dashboard: [Link to Grafana/Datadog/CloudWatch dashboard]
On-call runbook: [Link to runbook for queue-related incidents]
Capacity Planning
| Metric | Current | 6-Month Target | 12-Month Target |
|---|---|---|---|
| Messages/second (peak) | [number] | [number] | [number] |
| Average message size | [bytes] | [bytes] | [bytes] |
| Storage required | [GB] | [GB] | [GB] |
| Consumer instances | [count] | [count] | [count] |
| End-to-end latency (P99) | [ms] | [ms] | [ms] |
Security
- Encryption in transit: [e.g., TLS 1.3 for all connections]
- Encryption at rest: [e.g., AES-256 for stored messages]
- Access control: [e.g., IAM roles per service, no shared credentials]
- PII handling: [e.g., No PII in message payloads, use reference IDs]
- Audit logging: [e.g., Log all queue management operations (create, delete, purge)]
Filled Example: Async Notification Pipeline
System Overview
| Field | Details |
|---|---|
| System Name | Notification Pipeline |
| Author | Jordan Rivera, Backend Engineer |
| Date | March 2026 |
| Status | Approved |
| Queue Technology | AWS SQS (Standard Queues) + SNS (for fan-out) |
| Related Docs | ADR-044 (SQS over RabbitMQ), PRD-2026-031 (Multi-channel Notifications) |
Business Context
NotifyApp sends 2 million notifications per day across email, push, SMS, and in-app channels. The current system processes notifications synchronously during API request handling. When the email provider is slow (P99: 3 seconds), it blocks the API response, causing timeouts for end users. During marketing campaigns, email volume spikes 10x, overwhelming the email service and causing cascading failures in the core API.
The queue system decouples notification sending from the triggering event. The API enqueues a notification request in under 10ms and returns immediately. Dedicated consumers for each channel (email, push, SMS, in-app) process notifications at their own pace, independently scaling to match demand.
Queue Topology
| Queue Name | Type | Producers | Consumers | Messages/sec (peak) | Retention |
|---|---|---|---|---|---|
| notifications.email | Point-to-point | Notification Router | Email Worker | 500 | 7 days |
| notifications.push | Point-to-point | Notification Router | Push Worker | 300 | 3 days |
| notifications.sms | Point-to-point | Notification Router | SMS Worker | 50 | 7 days |
| notifications.in_app | Point-to-point | Notification Router | In-App Worker | 200 | 3 days |
A single SNS topic (notifications.dispatch) fans out to all four SQS queues. The Notification Router publishes once, and each channel queue receives a copy filtered by message attributes.
Consumer Design
| Consumer | Queue | Concurrency | Processing Time (P50 / P99) | Idempotent? | Scaling Strategy |
|---|---|---|---|---|---|
| Email Worker | notifications.email | 20 workers | 200ms / 2s | Yes (dedup by notificationId + channel) | Horizontal, auto-scale on queue depth |
| Push Worker | notifications.push | 10 workers | 50ms / 300ms | Yes (dedup by notificationId + channel) | Horizontal |
| SMS Worker | notifications.sms | 5 workers | 100ms / 1s | Yes (dedup by notificationId + channel) | Fixed (SMS rate-limited by provider) |
| In-App Worker | notifications.in_app | 10 workers | 20ms / 100ms | Yes (database upsert) | Horizontal |
Common Mistakes to Avoid
- Using a queue when a synchronous call would be simpler. If you need a response and the downstream service is fast and reliable, a direct HTTP call is easier to reason about. Queues add operational complexity. Only use them when you need decoupling, buffering, or async processing.
- No visibility timeout tuning. If your consumer takes 30 seconds to process a message but the visibility timeout is 15 seconds, the message becomes visible again and gets processed twice. Set the visibility timeout to at least 2x your P99 processing time.
- Ignoring poison messages. A malformed message that fails every retry will block the queue if you do not have a max retry count and DLQ. One bad message should not block thousands of good ones.
- Treating all messages as equal priority. A password reset email and a marketing campaign email should not compete for the same consumer capacity. Use separate queues or priority levels for different urgency tiers.
- No backpressure strategy. Without backpressure, a producer can flood the queue faster than consumers can drain it. Eventually the queue hits storage limits, and messages are rejected or dropped. Define queue depth alerts and auto-scaling rules before you need them.
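The visibility-timeout rule of thumb from the mistakes above can be made concrete with a small sizing helper (the 30-second floor is an illustrative default, not a standard):

```python
def visibility_timeout(p99_processing_seconds: float,
                       floor: float = 30.0) -> float:
    """Visibility timeout of at least 2x P99 processing time, with a
    floor, so a slow-but-healthy consumer never has its message
    redelivered mid-processing (causing duplicate work)."""
    return max(2 * p99_processing_seconds, floor)
```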
Key Takeaways
- Map every producer, consumer, queue, and message type before writing code
- Set delivery guarantees per queue based on business criticality, not uniformly
- Design consumers to be idempotent. Duplicate delivery will happen
- Implement dead letter queues and monitoring before going to production
- Plan for backpressure with queue depth alerts, auto-scaling rules, and circuit breakers
About This Template
Created by: Tim Adair
Last Updated: 3/5/2026
Version: 1.0.0
License: Free for personal and commercial use
