What This Template Is For
An API orchestration layer sits between your clients and your backend services. Instead of forcing a mobile app to make six separate API calls to render a single screen, an orchestration layer combines those calls into one request, handles partial failures gracefully, and returns a unified response. The pattern goes by several names: Backend for Frontend (BFF), API gateway, or API composition layer.
Without a clear specification, orchestration layers become dumping grounds for business logic that belongs elsewhere. Timeouts pile up, error handling is inconsistent, and nobody knows which downstream service is responsible for which field in the response.
This template documents the orchestration design: which endpoints exist, which services they call, how responses are assembled, and what happens when a downstream service fails. Use it when building a new BFF, refactoring a monolith API into composed microservice calls, or adding a new aggregate endpoint to an existing gateway. For broader context on managing technical architecture decisions, see the Technical PM Handbook. If your team is evaluating API management tools, the PM Tool Picker can help compare options.
How to Use This Template
- List all client screens or workflows that require data from more than one backend service. These are your orchestration candidates.
- For each candidate, map the downstream service calls: which services, which endpoints, which fields from each response.
- Define the response contract. The orchestration layer should return a stable schema even when downstream services change their internal APIs.
- Document error handling for each downstream call. Decide which failures are fatal (block the whole response) and which are degradable (return partial data).
- Set timeout and retry policies per downstream call, not globally.
- Review with engineering to validate assumptions about service latency and availability, then implement endpoint by endpoint.
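Before any code, it can help to capture each orchestration candidate as a small machine-readable spec that records the per-call decisions the steps above ask for. This is a minimal sketch, assuming hypothetical service names and a Python dataclass representation; nothing here is prescribed by the template itself.

```python
from dataclasses import dataclass

@dataclass
class DownstreamCall:
    service: str            # downstream service name
    endpoint: str           # path template on that service
    required: bool          # failure is fatal to the whole response if True
    timeout_ms: int         # per-call timeout, never a global default
    fallback: object = None # value returned when an optional call fails

@dataclass
class OrchestratedEndpoint:
    path: str
    calls: list

# Hypothetical example: a dashboard screen needing two services.
dashboard = OrchestratedEndpoint(
    path="GET /bff/dashboard",
    calls=[
        DownstreamCall("user-service", "/users/{id}",
                       required=True, timeout_ms=500),
        DownstreamCall("preferences-service", "/preferences/{id}",
                       required=False, timeout_ms=300,
                       fallback={"theme": "default"}),
    ],
)
```

A spec like this doubles as review material for the engineering walkthrough: every call has an owner of its timeout and fallback decision.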
The Template
Orchestration Layer Overview
| Field | Details |
|---|---|
| Layer Name | [e.g., Checkout BFF, Mobile API Gateway] |
| Owner | [Team or individual] |
| Base URL | [e.g., https://api.example.com/v1/bff] |
| Transport | [REST / GraphQL / gRPC] |
| Authentication | [JWT / API key / OAuth2 / Session] |
| Date | [Date] |
Downstream Service Registry
| Service | Base URL | Auth Method | Avg Latency | SLA | Owner |
|---|---|---|---|---|---|
| [Service A] | [URL] | [Bearer token / mTLS] | [ms] | [99.9%] | [Team] |
| [Service B] | [URL] | [API key] | [ms] | [99.5%] | [Team] |
| [Service C] | [URL] | [mTLS] | [ms] | [99.9%] | [Team] |
Orchestrated Endpoints
Endpoint 1: [Name]
| Setting | Value |
|---|---|
| Path | [e.g., GET /bff/dashboard] |
| Purpose | [What client screen or workflow this serves] |
| Downstream Calls | [List of service calls in execution order] |
| Parallel vs Sequential | [Which calls can run in parallel, which depend on prior results] |
| Timeout (total) | [e.g., 3000ms] |
| Cache TTL | [e.g., 60s / No cache] |
Call sequence.
| Order | Service Call | Required? | Timeout | Fallback |
|---|---|---|---|---|
| 1 (parallel) | [Service A: GET /users/{id}] | Yes | [500ms] | [Fail entire request] |
| 1 (parallel) | [Service B: GET /preferences/{id}] | No | [300ms] | [Return defaults] |
| 2 (sequential) | [Service C: GET /recommendations?user={id}] | No | [1000ms] | [Return empty array] |
Response schema.
{
  "user": { "id": "", "name": "", "email": "" },
  "preferences": { "theme": "", "notifications": true },
  "recommendations": []
}
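The call-sequence table above can be sketched in code: phase 1 runs the required and optional calls in parallel, phase 2 runs after them, and optional calls degrade to their fallback on timeout or error. This is an illustrative asyncio sketch with stand-in fetch functions, not a prescribed implementation.

```python
import asyncio

async def fetch_user(uid):
    await asyncio.sleep(0)  # stand-in for Service A: GET /users/{id}
    return {"id": uid, "name": "Ada"}

async def fetch_preferences(uid):
    await asyncio.sleep(0)  # stand-in for Service B: GET /preferences/{id}
    return {"theme": "dark", "notifications": True}

async def fetch_recommendations(uid):
    await asyncio.sleep(0)  # stand-in for Service C: GET /recommendations
    return ["item-1", "item-2"]

async def with_fallback(coro, timeout_s, fallback):
    # Optional call: timeouts and errors degrade to the fallback value.
    try:
        return await asyncio.wait_for(coro, timeout_s)
    except Exception:
        return fallback

async def get_dashboard(uid):
    # Phase 1 (parallel): user is required, so its failure propagates
    # and fails the whole request; preferences falls back to defaults.
    user, prefs = await asyncio.gather(
        asyncio.wait_for(fetch_user(uid), 0.5),                 # 500ms, fatal
        with_fallback(fetch_preferences(uid), 0.3,              # 300ms
                      {"theme": "default", "notifications": True}),
    )
    # Phase 2 (sequential): optional, falls back to an empty array.
    recs = await with_fallback(fetch_recommendations(uid), 1.0, [])
    return {"user": user, "preferences": prefs, "recommendations": recs}

resp = asyncio.run(get_dashboard("u-1"))
```

Keeping the required/optional split explicit in code mirrors the table: changing a fallback is a one-line diff rather than a rewrite.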
Error Handling Strategy
| Scenario | Response Code | Behavior |
|---|---|---|
| All downstream calls succeed | 200 | Return assembled response |
| Required service fails | 502 | Return error with failed service identifier |
| Optional service fails | 200 | Return partial response with null/default for failed section |
| All downstream calls timeout | 504 | Return gateway timeout |
| Authentication failure on downstream | 401 | Propagate to client |
| Rate limited by downstream | 429 | Return 429 with Retry-After header |
Error response format.
{
  "error": {
    "code": "DOWNSTREAM_FAILURE",
    "message": "One or more services unavailable",
    "failedServices": ["service-b"],
    "partialData": true
  },
  "data": {}
}
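The status-code table and the error envelope above can be combined into one assembly step. A hedged sketch, assuming each downstream result is reported as an (ok, data, required) triple; the function and field names are illustrative:

```python
def assemble_response(results):
    """results maps section name -> (ok, data, required)."""
    failed = [name for name, (ok, _, _) in results.items() if not ok]
    fatal = any(not ok and req for (ok, _, req) in results.values())
    if fatal:
        # A required service failed: 502 with the failed-service list.
        return 502, {"error": {"code": "DOWNSTREAM_FAILURE",
                               "message": "One or more services unavailable",
                               "failedServices": failed,
                               "partialData": False},
                     "data": {}}
    # Only optional services failed (or none): 200, nulling failed sections.
    data = {name: (d if ok else None) for name, (ok, d, _) in results.items()}
    if failed:
        return 200, {"error": {"code": "DOWNSTREAM_FAILURE",
                               "message": "One or more services unavailable",
                               "failedServices": failed,
                               "partialData": True},
                     "data": data}
    return 200, {"data": data}
```

The key property is that the client always gets the same envelope shape: checking `partialData` is enough to decide whether to render degraded sections.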
Resilience Patterns
Circuit Breaker Configuration
| Service | Failure Threshold | Open Duration | Half-Open Probes |
|---|---|---|---|
| [Service A] | [5 failures in 60s] | [30s] | [3 requests] |
| [Service B] | [3 failures in 30s] | [60s] | [2 requests] |
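A minimal breaker with the closed, open, and half-open states from the table can be sketched as follows. This is a simplified count-based version (the table's "N failures in M seconds" implies a rolling window, which is omitted here for brevity); in production a library such as Resilience4j or pybreaker would typically be used.

```python
import time

class CircuitBreaker:
    """Minimal count-based breaker: closed -> open -> half-open."""

    def __init__(self, failure_threshold=5, open_duration_s=30.0):
        self.failure_threshold = failure_threshold
        self.open_duration_s = open_duration_s
        self.failures = 0
        self.opened_at = None   # None means the breaker is closed

    def allow(self):
        if self.opened_at is None:
            return True         # closed: let traffic through
        # Half-open: allow probes only after the open window elapses.
        return time.monotonic() - self.opened_at >= self.open_duration_s

    def record_success(self):
        # A successful probe closes the breaker and resets the count.
        self.failures = 0
        self.opened_at = None

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = time.monotonic()   # trip open
```

The orchestration layer checks `allow()` before each downstream call and returns the call's fallback immediately when the breaker is open, which is what stops a failing service from consuming the whole request budget.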
Retry Policy
| Service | Max Retries | Backoff | Retry On |
|---|---|---|---|
| [Service A] | [2] | [Exponential: 100ms, 200ms] | [5xx, timeout] |
| [Service B] | [1] | [Fixed: 100ms] | [5xx only] |
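The retry rows above translate into a small wrapper with capped attempts and exponential backoff. A sketch, assuming the hypothetical Service A policy (2 retries, 100ms then 200ms); the `sleep` parameter is injectable so the policy is testable without real delays:

```python
import time

def call_with_retry(call, max_retries=2, base_delay_s=0.1, sleep=time.sleep):
    """Retry on exception with exponential doubling: 100ms, 200ms, ..."""
    attempt = 0
    while True:
        try:
            return call()
        except Exception:
            if attempt >= max_retries:
                raise               # budget exhausted: surface the failure
            sleep(base_delay_s * (2 ** attempt))
            attempt += 1
```

In practice the `except` clause should match only retryable outcomes (5xx, timeouts), per the "Retry On" column; retrying 4xx responses just amplifies load.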
- ☐ Circuit breakers configured for each downstream service
- ☐ Retry policies defined per service (not global)
- ☐ Bulkhead isolation prevents one slow service from consuming all connection pool threads
- ☐ Fallback responses defined for non-critical services
- ☐ Health check endpoint exposes downstream service status
Rate Limiting
| Limit Type | Value | Scope | Response |
|---|---|---|---|
| Per-user | [100 req/min] | [User ID from JWT] | 429 + Retry-After |
| Per-IP | [500 req/min] | [Client IP] | 429 + Retry-After |
| Per-endpoint | [50 req/min] | [Path + User] | 429 + Retry-After |
| Global | [10,000 req/min] | [Entire gateway] | 503 |
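The per-user and per-IP rows can be enforced with a simple fixed-window counter keyed by scope. This is an in-memory sketch for illustration; a real gateway would typically back this with Redis or the gateway product's built-in limiter, and might prefer a sliding window or token bucket to avoid boundary bursts.

```python
import time

class FixedWindowLimiter:
    """Per-key fixed-window limiter; check() returns (allowed, retry_after_s)."""

    def __init__(self, limit, window_s=60.0, now=time.monotonic):
        self.limit = limit
        self.window_s = window_s
        self.now = now
        self.windows = {}   # key -> (window_start, count)

    def check(self, key):
        t = self.now()
        start, count = self.windows.get(key, (t, 0))
        if t - start >= self.window_s:
            start, count = t, 0          # window rolled over: reset
        if count >= self.limit:
            # Over limit: caller returns 429 with this Retry-After value.
            return False, self.window_s - (t - start)
        self.windows[key] = (start, count + 1)
        return True, 0.0
```

The key is whatever the Scope column names: a user ID from the JWT, a client IP, or a path-plus-user composite.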
Caching Strategy
| Endpoint | Cache Layer | TTL | Invalidation |
|---|---|---|---|
| [GET /bff/dashboard] | [Redis / CDN] | [60s] | [User update webhook] |
| [GET /bff/catalog] | [CDN] | [300s] | [Catalog publish event] |
| [POST endpoints] | No cache | N/A | N/A |
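The TTL and invalidation columns imply two expiry paths: entries age out on read, and events (webhooks, publish notifications) evict them early. A minimal in-memory sketch of that behavior, standing in for the Redis or CDN layer named in the table:

```python
import time

class TTLCache:
    def __init__(self, now=time.monotonic):
        self.now = now
        self.entries = {}   # key -> (expires_at, value)

    def get(self, key):
        hit = self.entries.get(key)
        if hit is None or self.now() >= hit[0]:
            return None     # miss, or entry aged past its TTL
        return hit[1]

    def put(self, key, value, ttl_s):
        self.entries[key] = (self.now() + ttl_s, value)

    def invalidate(self, key):
        # Event-driven eviction, e.g. triggered by a user-update webhook.
        self.entries.pop(key, None)
```

Cache keys should include everything that varies the response (user ID, locale), or the layer will happily serve one user's dashboard to another.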
Monitoring and Observability
- ☐ Request duration tracked per orchestrated endpoint (P50, P95, P99)
- ☐ Downstream call duration tracked per service (separate from total)
- ☐ Error rate dashboards per downstream service
- ☐ Circuit breaker state changes trigger alerts
- ☐ Distributed tracing (trace ID propagated to all downstream calls)
- ☐ Log correlation: single request ID in gateway logs and all downstream logs
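The tracing and log-correlation items come down to one rule: reuse the caller's trace ID if present, mint one at the edge otherwise, and attach it to every downstream request. A sketch assuming a hypothetical `X-Trace-Id` header name; real deployments would more likely use the W3C Trace Context `traceparent` header via an OpenTelemetry SDK.

```python
import uuid

def extract_trace_id(incoming_headers):
    # Honor an existing trace ID from the client or upstream proxy;
    # otherwise the gateway is the trace root and mints a new one.
    return incoming_headers.get("X-Trace-Id") or uuid.uuid4().hex

def downstream_headers(trace_id, auth_token):
    # Every downstream call carries the same trace ID, so one grep of
    # that ID correlates the gateway log with all downstream logs.
    return {"X-Trace-Id": trace_id,
            "Authorization": f"Bearer {auth_token}"}
```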
| Metric | Alert Threshold | Channel |
|---|---|---|
| Total endpoint latency P99 | > [target]ms | [Slack / PagerDuty] |
| Downstream error rate | > [5%] over 5 min | [PagerDuty] |
| Circuit breaker open | Any service | [Slack] |
Filled Example: E-Commerce Checkout BFF
Orchestration Layer Overview
| Field | Details |
|---|---|
| Layer Name | Checkout BFF |
| Owner | Commerce Platform Team |
| Base URL | https://api.acmestore.com/v1/checkout |
| Transport | REST (JSON) |
| Authentication | JWT (issued by Auth Service) |
Downstream Services
| Service | Base URL | Avg Latency | SLA |
|---|---|---|---|
| Cart Service | cart.internal:8080 | 45ms | 99.95% |
| Inventory Service | inventory.internal:8080 | 80ms | 99.9% |
| Pricing Service | pricing.internal:8080 | 35ms | 99.95% |
| Payment Service | payments.internal:8080 | 250ms | 99.99% |
| Shipping Service | shipping.internal:8080 | 120ms | 99.5% |
| Tax Service | tax.internal:8080 | 60ms | 99.9% |
Key Endpoint: POST /checkout/summary
Call sequence (total timeout: 2000ms).
| Order | Service Call | Required? | Timeout | Fallback |
|---|---|---|---|---|
| 1 | Cart: GET /carts/{id} | Yes | 200ms | Fail request |
| 2 (parallel) | Inventory: POST /check-availability | Yes | 400ms | Fail request |
| 2 (parallel) | Pricing: POST /calculate | Yes | 200ms | Fail request |
| 2 (parallel) | Shipping: POST /estimate | No | 500ms | "Shipping calculated at next step" |
| 3 | Tax: POST /calculate | Yes | 300ms | Fail request |
Response (assembled).
{
  "cart": { "items": [], "itemCount": 3 },
  "pricing": { "subtotal": 14997, "currency": "USD" },
  "shipping": { "options": [], "estimated": true },
  "tax": { "amount": 1237, "rate": 0.0825 },
  "total": 16234,
  "availabilityConfirmed": true
}
If Shipping Service is unavailable, the response still returns with shipping: null and a partialData: true flag. The client displays "Shipping cost calculated at payment" instead of blocking the entire checkout summary.
Key Takeaways
- Define which downstream failures are fatal and which are degradable before writing any code
- Set per-service timeouts and retries rather than global defaults
- Keep business logic in downstream services, not in the orchestration layer
- Propagate trace IDs through all downstream calls for debugging
- Cache aggressively for read-heavy endpoints and invalidate via events
About This Template
Created by: Tim Adair
Last Updated: 3/5/2026
Version: 1.0.0
License: Free for personal and commercial use
