Template · Free · ⏱️ 45-60 minutes

Message Queue System Design Template

A structured template for designing message queue systems. Covers queue topology, message routing, consumer patterns, backpressure handling, retry...

By Tim Adair • Last updated 2026-03-05

What This Template Is For

A message queue sits between producers and consumers, buffering work so that the producer does not need to wait for the consumer to finish processing. This decoupling is the foundation of reliable async processing. Without a queue, a spike in traffic overwhelms downstream services directly. With a queue, the spike is absorbed, and consumers process work at their own pace.
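The decoupling described above can be sketched with Python's standard-library queue and a worker thread; names here are illustrative, and a real system would use a broker rather than an in-process queue:

```python
import queue
import threading

work_queue = queue.Queue(maxsize=1000)  # bounded: full queue blocks the producer
results = []

def producer(n):
    for i in range(n):
        work_queue.put({"job_id": i})  # returns immediately unless the queue is full

def consumer():
    while True:
        job = work_queue.get()
        if job is None:          # sentinel value: shut down cleanly
            break
        results.append(job["job_id"])  # stand-in for real processing
        work_queue.task_done()

t = threading.Thread(target=consumer)
t.start()
producer(5)          # producer finishes instantly...
work_queue.put(None)
t.join()             # ...while the consumer drains at its own pace
print(results)       # [0, 1, 2, 3, 4]
```

The bounded `maxsize` is the in-process analogue of broker-side backpressure: a producer that outruns the consumer eventually blocks instead of growing the queue without limit.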

This template helps engineering and product teams design a message queue system. It covers queue topology, message routing, consumer scaling, backpressure management, retry and dead letter strategies, and operational monitoring. Whether you are introducing queues for the first time or redesigning an existing system, this template ensures you address every critical design decision before writing code.

For the broader context of how asynchronous patterns fit into product architecture, the Technical PM Handbook covers engineering collaboration for infrastructure decisions. If your queue design involves evaluating technologies (RabbitMQ vs SQS vs Kafka), document that choice in an architecture decision record. Teams working with microservices will find queues essential for inter-service communication. For understanding the performance characteristics of your queue system, the technical spec template provides a complementary structure.


How to Use This Template

  1. Start by identifying every producer and consumer in your system. List what messages each produces and what processing each consumer performs.
  2. Choose your queue topology: point-to-point (one consumer per message), fan-out (multiple consumers per message), or topic-based routing. The right topology depends on your use case.
  3. Define message schemas before implementation. Include an envelope with metadata (messageId, timestamp, type, correlationId) and a typed payload.
  4. Set delivery guarantees per queue. Not every queue needs exactly-once semantics. Analytics queues can tolerate at-most-once. Payment queues need at-least-once with idempotent consumers.
  5. Design your error handling strategy. Every message that cannot be processed needs a destination: retry queue, dead letter queue, or alert.
  6. Plan capacity and scaling. How many messages per second at peak? How many consumers do you need to keep up? What happens when consumers fall behind?
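The capacity questions in step 6 reduce to quick arithmetic. A sketch with illustrative numbers (your peak rate, per-worker throughput, and SLA will differ):

```python
peak_msgs_per_sec = 500        # expected peak ingress (assumed)
per_consumer_throughput = 40   # msgs/sec one worker can process (assumed)
latency_sla_sec = 60           # messages must be handled within 60s (assumed)

# Consumers needed to keep up at peak (ceiling division)
consumers_needed = -(-peak_msgs_per_sec // per_consumer_throughput)

# Largest queue depth that can still be drained within the SLA
drain_rate = consumers_needed * per_consumer_throughput
max_safe_depth = drain_rate * latency_sla_sec

print(consumers_needed)  # 13
print(max_safe_depth)    # 31200
```

Depths beyond `max_safe_depth` mean the SLA is already blown even if no new messages arrive, which makes it a natural critical-alert threshold.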

The Template

System Overview

| Field | Details |
| --- | --- |
| System Name | [Name of the queue system or domain] |
| Author | [Name] |
| Date | [Date] |
| Status | Draft / In Review / Approved |
| Queue Technology | [e.g., RabbitMQ, AWS SQS, Redis Streams, Kafka, NATS JetStream] |
| Related Docs | [Links to ADRs, tech specs, PRDs] |

Business Context

[2-3 paragraphs describing:]

  • [What business problem the queue system solves]
  • [Current architecture and its limitations (e.g., synchronous bottlenecks, cascading failures)]
  • [Expected message volume: messages/second at steady state and peak]
  • [Latency requirements: how quickly must messages be processed?]

Queue Topology

| Queue Name | Type | Producers | Consumers | Messages/sec (peak) | Retention |
| --- | --- | --- | --- | --- | --- |
| [e.g., notifications.email] | Point-to-point | [Service name] | [Service name] | [number] | [e.g., 7 days] |
| [e.g., orders.processing] | Point-to-point | [Service name] | [Service name] | [number] | [e.g., 14 days] |
| [e.g., events.analytics] | Fan-out | [Multiple services] | [Analytics, Reporting] | [number] | [e.g., 30 days] |

Topology type: [Point-to-point / Fan-out / Topic-based routing / Combination]

Routing strategy: [Direct queue names / Topic exchange with routing keys / Content-based routing / Header-based routing]


Message Schema

Envelope (common to all messages):

```json
{
  "messageId": "uuid-v4 (unique per message)",
  "type": "[domain].[action] (e.g., notification.send_email)",
  "version": "1.0",
  "timestamp": "ISO-8601",
  "producer": "[service-name]",
  "correlationId": "uuid-v4 (traces business transaction)",
  "priority": "[0-9, where 9 is highest]",
  "payload": {
    // Message-specific data
  }
}
```
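One way to construct and validate that envelope in Python, sketched with a dataclass; the class and helper names are ours, not part of any library:

```python
import json
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any

@dataclass
class Envelope:
    # Required, message-specific fields
    type: str
    producer: str
    payload: dict[str, Any]
    # Defaulted metadata, generated per message
    version: str = "1.0"
    priority: int = 5
    messageId: str = field(default_factory=lambda: str(uuid.uuid4()))
    correlationId: str = field(default_factory=lambda: str(uuid.uuid4()))
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def to_json(self) -> str:
        if not 0 <= self.priority <= 9:
            raise ValueError("priority must be 0-9")
        return json.dumps(self.__dict__)

msg = Envelope(
    type="notification.send_email",
    producer="notification-router",
    payload={"to": "user@example.com", "template": "welcome"},
)
print(msg.to_json())
```

Generating `messageId` at construction time (rather than at publish time) means a producer that retries a failed publish sends the same ID, which is what downstream deduplication depends on.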

Message types and payloads:

| Message Type | Payload Fields | Size (avg) | Priority |
| --- | --- | --- | --- |
| [e.g., notification.send_email] | [to, subject, template, variables] | [~2 KB] | [5] |
| [e.g., order.process_payment] | [orderId, amount, paymentMethod] | [~1 KB] | [9] |
| [e.g., report.generate] | [reportType, dateRange, userId] | [~500 B] | [3] |

Consumer Design

| Consumer | Queue | Concurrency | Processing Time (P50 / P99) | Idempotent? | Scaling Strategy |
| --- | --- | --- | --- | --- | --- |
| [Service name] | [Queue name] | [e.g., 5 workers] | [e.g., 50ms / 500ms] | Yes / No | [Horizontal / Vertical] |
| [Service name] | [Queue name] | [e.g., 10 workers] | [e.g., 200ms / 2s] | Yes / No | [Horizontal / Vertical] |

Acknowledgment strategy:

  • [When does the consumer ack? After processing? After writing to database?]
  • [What happens on consumer crash before ack? (Message redelivered)]
  • [Visibility timeout / ack deadline: e.g., 30 seconds]
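The crash-before-ack behavior in the bullets above can be modeled in a few lines. This is an illustrative in-memory toy, not a broker client, but SQS and RabbitMQ behave analogously:

```python
import time

class InMemoryQueue:
    """Toy queue modeling ack + visibility-timeout semantics."""

    def __init__(self, visibility_timeout=0.05):
        self.visible = []
        self.in_flight = {}  # receipt -> (message, redelivery deadline)
        self.visibility_timeout = visibility_timeout
        self._receipt = 0

    def send(self, msg):
        self.visible.append(msg)

    def receive(self):
        # Messages whose visibility timeout has expired become visible again
        now = time.monotonic()
        for r, (m, deadline) in list(self.in_flight.items()):
            if now >= deadline:
                del self.in_flight[r]
                self.visible.append(m)
        if not self.visible:
            return None, None
        msg = self.visible.pop(0)
        self._receipt += 1
        self.in_flight[self._receipt] = (msg, now + self.visibility_timeout)
        return self._receipt, msg

    def ack(self, receipt):
        # Ack only after processing succeeds; this deletes the message for good
        self.in_flight.pop(receipt, None)

q = InMemoryQueue()
q.send("charge-order-42")

# Consumer A receives the message but crashes before acking...
receipt, msg = q.receive()
time.sleep(0.06)  # visibility timeout elapses

# ...so the message is redelivered to consumer B, which processes and acks it.
receipt2, msg2 = q.receive()
q.ack(receipt2)
print(msg2)  # charge-order-42
```

This is also why the visibility timeout must exceed worst-case processing time: if it expires while the first consumer is still working, the message is delivered twice even though nothing crashed.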

Consumer group management:

  • [How are consumers grouped?]
  • [How is work distributed across consumers? (Round-robin, consistent hashing, partition assignment)]
  • [How do you add/remove consumers without message loss?]

Delivery Guarantees

| Queue | Guarantee | Idempotency Strategy |
| --- | --- | --- |
| [Queue name] | At-least-once | [e.g., Dedup by messageId in Redis with 24h TTL] |
| [Queue name] | At-most-once | [N/A, acceptable for analytics] |
| [Queue name] | At-least-once | [e.g., Database upsert with messageId as unique constraint] |
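The "dedup by messageId with TTL" strategy in the table can be sketched as follows. The store here is an in-memory dict so the example is self-contained; a production version would use Redis `SET NX EX` or a database unique constraint:

```python
import time

class DedupStore:
    """Illustrative messageId dedup store with a TTL, in-memory."""

    def __init__(self, ttl_seconds=86400):
        self.ttl = ttl_seconds
        self.seen = {}  # messageId -> expiry time

    def first_time(self, message_id):
        now = time.monotonic()
        # Lazily evict expired entries
        self.seen = {k: v for k, v in self.seen.items() if v > now}
        if message_id in self.seen:
            return False
        self.seen[message_id] = now + self.ttl
        return True

processed = []

def handle(store, message):
    if not store.first_time(message["messageId"]):
        return  # duplicate delivery: already processed, skip silently
    processed.append(message["payload"])  # stand-in for real processing

store = DedupStore()
msg = {"messageId": "abc-123", "payload": "send-email"}
handle(store, msg)
handle(store, msg)  # redelivered duplicate is a no-op
print(processed)    # ['send-email'] — processed exactly once
```

Note the dedup check and the processing are not atomic here; for side effects that must never repeat (payments), prefer the upsert-with-unique-constraint variant so the database enforces exactly-once effects.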

Backpressure and Rate Limiting

  • Queue depth threshold: [e.g., Alert at 10,000 messages, scale consumers at 50,000]
  • Producer rate limiting: [e.g., Max 1,000 messages/sec per producer, reject with 429 if exceeded]
  • Consumer rate limiting: [e.g., Max 100 external API calls/sec, use token bucket]
  • Auto-scaling rules: [e.g., Add 1 consumer per 5,000 queue depth, max 20 consumers]
  • Circuit breaker: [e.g., If downstream service returns 5 consecutive 5xx, pause consumption for 30s]
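The token bucket mentioned under consumer rate limiting is a few lines of code. Rates and capacity below are illustrative:

```python
import time

class TokenBucket:
    """Allow short bursts up to `capacity`, sustained rate of `rate_per_sec`."""

    def __init__(self, rate_per_sec, capacity):
        self.rate = rate_per_sec
        self.capacity = capacity
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# e.g., cap calls to an external API: 100/sec sustained, bursts of 10
bucket = TokenBucket(rate_per_sec=100, capacity=10)
allowed = sum(bucket.allow() for _ in range(50))
print(allowed)  # burst capped near capacity; remaining calls must wait for refill
```

When `allow()` returns False, the consumer should leave the message unacked (or delay it) rather than drop it, so the broker redelivers once capacity frees up.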

Retry Strategy

| Retry Level | Delay | Max Attempts | On Failure |
| --- | --- | --- | --- |
| Immediate retry | 0 | 1 | Move to delayed retry |
| Delayed retry (level 1) | 5 seconds | 1 | Move to level 2 |
| Delayed retry (level 2) | 30 seconds | 1 | Move to level 3 |
| Delayed retry (level 3) | 5 minutes | 1 | Move to DLQ |

Retry classification:

  • Retryable errors: [Timeouts, 5xx, connection refused, resource locked]
  • Non-retryable errors: [Validation failure, 4xx, malformed message, business rule violation]
  • Non-retryable errors skip directly to DLQ
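The tiered policy and classification above can be sketched in one function. The delays mirror the retry table; the exception types and `sleep` injection are our illustrative choices:

```python
RETRY_DELAYS = [0, 5, 30, 300]  # immediate, 5s, 30s, 5min

class NonRetryableError(Exception):
    """Validation failures, 4xx responses, malformed messages."""

dlq = []

def process_with_retries(message, handler, sleep=lambda s: None):
    # `sleep` is injected so tests (and dry runs) don't actually wait
    for delay in RETRY_DELAYS:
        sleep(delay)
        try:
            return handler(message)
        except NonRetryableError:
            dlq.append(message)  # skip retries, go straight to the DLQ
            return None
        except Exception:
            continue  # retryable: timeouts, 5xx, connection refused
    dlq.append(message)  # all retry levels exhausted
    return None

# A handler that fails twice with a retryable error, then succeeds
calls = []
def flaky_handler(msg):
    calls.append(msg)
    if len(calls) < 3:
        raise TimeoutError("downstream slow")
    return "ok"

result = process_with_retries("order-7", flaky_handler)
print(result, len(calls))  # ok 3 — succeeded on the third attempt
```

In a broker-based system the delays would usually be implemented with delay queues or per-message redelivery delay rather than an in-process sleep, but the classification logic is the same.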

Dead Letter Queue (DLQ)

| Source Queue | DLQ Name | Retention | Alert | Investigation SLA |
| --- | --- | --- | --- | --- |
| [Queue name] | [e.g., notifications.email.dlq] | [14 days] | [PagerDuty / Slack] | [1 hour / 4 hours] |
| [Queue name] | [DLQ name] | [retention] | [alert channel] | [SLA] |

DLQ processing workflow:

  1. Alert fires when a message enters any DLQ
  2. On-call engineer inspects the message payload and error reason
  3. If the root cause is fixed, replay the message from DLQ to the source queue
  4. If the message is permanently invalid, archive it with a reason and close the alert

DLQ replay tool: [e.g., CLI tool, admin dashboard, automated replay after deployment]
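The replay step in the workflow above amounts to a filtered move. A hedged sketch where queues are plain lists (a real tool would call the broker's receive/send APIs):

```python
def replay_dlq(dlq, source_queue, archive, is_fixed):
    """Drain a DLQ: re-enqueue messages whose failure cause is fixed,
    archive the rest with their error reason preserved."""
    replayed = 0
    while dlq:
        msg = dlq.pop(0)
        if is_fixed(msg):
            source_queue.append(msg)  # back to the source queue for reprocessing
            replayed += 1
        else:
            archive.append(msg)       # permanently invalid: keep a record
    return replayed

dlq = [{"id": 1, "error": "timeout"}, {"id": 2, "error": "bad_schema"}]
source, archive = [], []
n = replay_dlq(dlq, source, archive,
               is_fixed=lambda m: m["error"] == "timeout")
print(n, len(source), len(archive))  # 1 1 1
```

Replaying only after the root cause is fixed matters: blindly replaying the whole DLQ re-fails the bad messages and re-fires the alert.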


Monitoring and Alerting

| Metric | Description | Warning | Critical |
| --- | --- | --- | --- |
| Queue depth | Messages waiting to be consumed | > [X] | > [Y] |
| Consumer lag | Rate of production minus rate of consumption | Growing for > 5 min | Growing for > 15 min |
| Processing latency (P99) | Time from enqueue to consumer ack | > [X]ms | > [Y]ms |
| Error rate | Failed processing / total processed | > 1% | > 5% |
| DLQ count | Messages in dead letter queues | > 0 | > 10 |
| Consumer health | Heartbeat / liveness check | 1 consumer down | > 50% consumers down |

Dashboard: [Link to Grafana/Datadog/CloudWatch dashboard]

On-call runbook: [Link to runbook for queue-related incidents]


Capacity Planning

| Metric | Current | 6-Month Target | 12-Month Target |
| --- | --- | --- | --- |
| Messages/second (peak) | [number] | [number] | [number] |
| Average message size | [bytes] | [bytes] | [bytes] |
| Storage required | [GB] | [GB] | [GB] |
| Consumer instances | [count] | [count] | [count] |
| End-to-end latency (P99) | [ms] | [ms] | [ms] |

Security

  • Encryption in transit: [e.g., TLS 1.3 for all connections]
  • Encryption at rest: [e.g., AES-256 for stored messages]
  • Access control: [e.g., IAM roles per service, no shared credentials]
  • PII handling: [e.g., No PII in message payloads, use reference IDs]
  • Audit logging: [e.g., Log all queue management operations (create, delete, purge)]

Filled Example: Async Notification Pipeline

System Overview

| Field | Details |
| --- | --- |
| System Name | Notification Pipeline |
| Author | Jordan Rivera, Backend Engineer |
| Date | March 2026 |
| Status | Approved |
| Queue Technology | AWS SQS (Standard Queues) + SNS (for fan-out) |
| Related Docs | ADR-044 (SQS over RabbitMQ), PRD-2026-031 (Multi-channel Notifications) |

Business Context

NotifyApp sends 2 million notifications per day across email, push, SMS, and in-app channels. The current system processes notifications synchronously during API request handling. When the email provider is slow (P99: 3 seconds), it blocks the API response, causing timeouts for end users. During marketing campaigns, email volume spikes 10x, overwhelming the email service and causing cascading failures in the core API.

The queue system decouples notification sending from the triggering event. The API enqueues a notification request in under 10ms and returns immediately. Dedicated consumers for each channel (email, push, SMS, in-app) process notifications at their own pace, independently scaling to match demand.

Queue Topology

| Queue Name | Type | Producers | Consumers | Messages/sec (peak) | Retention |
| --- | --- | --- | --- | --- | --- |
| notifications.email | Point-to-point | Notification Router | Email Worker | 500 | 7 days |
| notifications.push | Point-to-point | Notification Router | Push Worker | 300 | 3 days |
| notifications.sms | Point-to-point | Notification Router | SMS Worker | 50 | 7 days |
| notifications.in_app | Point-to-point | Notification Router | In-App Worker | 200 | 3 days |

A single SNS topic (notifications.dispatch) fans out to all four SQS queues. The Notification Router publishes once, and each channel queue receives a copy filtered by message attributes.

Consumer Design

| Consumer | Queue | Concurrency | Processing Time (P50 / P99) | Idempotent? | Scaling Strategy |
| --- | --- | --- | --- | --- | --- |
| Email Worker | notifications.email | 20 workers | 200ms / 2s | Yes (dedup by notificationId + channel) | Horizontal, auto-scale on queue depth |
| Push Worker | notifications.push | 10 workers | 50ms / 300ms | Yes (dedup by notificationId + channel) | Horizontal |
| SMS Worker | notifications.sms | 5 workers | 100ms / 1s | Yes (dedup by notificationId + channel) | Fixed (SMS rate-limited by provider) |
| In-App Worker | notifications.in_app | 10 workers | 20ms / 100ms | Yes (database upsert) | Horizontal |

Common Mistakes to Avoid

  • Using a queue when a synchronous call would be simpler. If you need a response and the downstream service is fast and reliable, a direct HTTP call is easier to reason about. Queues add operational complexity. Only use them when you need decoupling, buffering, or async processing.
  • No visibility timeout tuning. If your consumer takes 30 seconds to process a message but the visibility timeout is 15 seconds, the message becomes visible again and gets processed twice. Set the visibility timeout to at least 2x your P99 processing time.
  • Ignoring poison messages. A malformed message that fails every retry will block the queue if you do not have a max retry count and DLQ. One bad message should not block thousands of good ones.
  • Treating all messages as equal priority. A password reset email and a marketing campaign email should not compete for the same consumer capacity. Use separate queues or priority levels for different urgency tiers.
  • No backpressure strategy. Without backpressure, a producer can flood the queue faster than consumers can drain it. Eventually the queue hits storage limits, and messages are rejected or dropped. Define queue depth alerts and auto-scaling rules before you need them.

Key Takeaways

  • Map every producer, consumer, queue, and message type before writing code
  • Set delivery guarantees per queue based on business criticality, not uniformly
  • Design consumers to be idempotent. Duplicate delivery will happen
  • Implement dead letter queues and monitoring before going to production
  • Plan for backpressure with queue depth alerts, auto-scaling rules, and circuit breakers

About This Template

Created by: Tim Adair

Last Updated: 2026-03-05

Version: 1.0.0

License: Free for personal and commercial use

Frequently Asked Questions

How do I choose between RabbitMQ, SQS, and Kafka for message queuing?
Use SQS when you want a managed service with minimal operational overhead and do not need strict ordering or replay. Use RabbitMQ when you need flexible routing (topic exchanges, headers-based routing) and are comfortable managing infrastructure. Use Kafka when you need high throughput, message replay, and long retention. Kafka is an event log, not a traditional queue. It stores messages permanently and consumers track their own offsets. For most product teams, SQS is the right starting point. Record the technology decision in an [architecture decision record](/templates/architecture-decision-record-template).
How do I ensure messages are not processed twice?
Design consumers to be idempotent. The simplest approach: store each `messageId` in a deduplication table (or Redis set with TTL) and skip messages you have already seen. For database writes, use upsert operations with the `messageId` as a unique constraint. The goal is not to prevent duplicate delivery (most brokers cannot guarantee that) but to make duplicate processing harmless.
What queue depth should trigger an alert?
It depends on your consumer throughput and latency SLA. Calculate: if each consumer processes 100 messages/second and you have 5 consumers, your drain rate is 500 messages/second. If your [SLA](/glossary/service-level-agreement-sla) requires messages to be processed within 60 seconds, your alert threshold is 500 * 60 = 30,000 messages. Set a warning at 50% of that (15,000) and critical at 80% (24,000).
Should I use FIFO or standard queues?
Use FIFO queues only when message order matters for correctness (e.g., processing balance updates for the same account). FIFO queues have lower throughput limits (300-3,000 messages/sec on SQS vs effectively unlimited for standard queues). Most use cases do not require strict ordering. If only a subset of messages needs ordering, use partition keys (message group IDs) to order within a group while allowing parallelism across groups.
