
Rate Limiting

Definition

Rate limiting is a traffic control mechanism that restricts the number of requests a client (user, API key, IP address) can make to a service within a defined time window. When a client exceeds the limit, the server rejects additional requests with an HTTP 429 ("Too Many Requests") status code until the window resets. The primary purposes are protecting infrastructure from overload, preventing abuse (intentional or accidental), and ensuring fair resource allocation across clients.

Rate limiting algorithms vary in sophistication. Fixed window counting is the simplest: allow N requests per minute, reset the counter at the top of each minute. Sliding window is smoother: track requests over a rolling time period. Token bucket allows bursts: the client accumulates "tokens" at a fixed rate and spends them on requests, allowing short bursts above the average rate. Leaky bucket enforces a steady output rate regardless of input burstiness. Most API gateway products (Kong, AWS API Gateway, Cloudflare) implement multiple algorithms and let you choose.
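To make the token bucket idea concrete, here is a minimal sketch in Python. The class name and parameters are illustrative, not taken from any particular gateway product: tokens refill at a fixed rate up to a capacity, and a request is allowed only if it can spend a token, which is what permits short bursts above the average rate.

```python
import time

class TokenBucket:
    """Token bucket limiter: tokens accrue at `rate` per second up to
    `capacity`; each request spends one token, so short bursts up to
    `capacity` are allowed while the long-run average stays at `rate`."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self, cost: float = 1.0) -> bool:
        now = time.monotonic()
        # Refill tokens for the elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False  # caller would respond with HTTP 429 here

bucket = TokenBucket(rate=5, capacity=10)  # ~5 req/s average, bursts of 10
results = [bucket.allow() for _ in range(12)]
```

In a burst of 12 back-to-back calls, the first 10 succeed (the full burst capacity) and the rest are rejected until tokens refill. A fixed-window counter would be even simpler (one counter reset per minute) but allows up to 2N requests straddling a window boundary, which is why sliding-window and bucket variants exist.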

Rate limiting is distinct from request authentication (who is the caller?) and authorization (what are they allowed to do?). It sits alongside those controls as a third layer of API protection. Even authenticated, authorized clients can be rate-limited to prevent one customer from consuming disproportionate resources and degrading service for everyone else.

Why It Matters for Product Managers

Rate limits are a product decision, not just an engineering decision. They define what customers can do with your product and directly shape the pricing model. For API-first products (Stripe, Twilio, OpenAI), rate limits are a primary packaging lever. Free tier gets 60 requests per minute. Pro gets 600. Enterprise gets 6,000 or custom limits. The rate limit tells the customer: "this tier supports workloads of this scale."

Rate limits also protect the user experience for all customers. Without rate limiting, one misbehaving client (or a bot scraping your API) can consume enough resources to degrade performance for everyone. This is especially important for multi-tenant SaaS products where all customers share infrastructure. PMs should ensure rate limits are documented in the product, surfaced in the API response headers, and clearly communicated in pricing pages. Surprises around rate limiting cause customer churn.

How to Apply It

When designing rate limits, work backward from customer use cases. What does a legitimate free-tier integration look like? How many requests per minute does it need? What about a production integration on the paid tier? Set limits that accommodate legitimate usage patterns with headroom, and make the limits visible to developers through response headers and dashboard usage meters. Pair rate limiting with clear upgrade paths: when a customer hits the free-tier limit, show them exactly what the next tier offers. Review how rate limits interact with your pricing strategy and ensure the tiers align with the value different customer segments extract. Use the RICE framework to prioritize rate-limiting improvements against other API platform work.

Frequently Asked Questions

How do you choose the right rate limit for an API?
Start by analyzing actual usage patterns. Look at the p95 and p99 request rates from your current clients. Set the rate limit above the p99 of legitimate usage but well below the threshold where your infrastructure degrades. For a typical SaaS API, common starting points are 100 requests per minute for free tier, 1000 per minute for paid tier, and 10,000 per minute for enterprise. Stripe uses 100 reads per second and 100 writes per second as defaults. Always expose rate limit headers (X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset) so clients can self-throttle.
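A client that reads those headers can self-throttle instead of blindly hitting 429s. This is a minimal sketch assuming the common `X-RateLimit-*` convention named above, and assuming `X-RateLimit-Reset` carries a Unix timestamp (some APIs send seconds-until-reset instead, so check the provider's docs):

```python
import time

def respect_rate_limits(headers: dict) -> float:
    """Return how long the client should wait before its next request,
    based on X-RateLimit-Remaining and X-RateLimit-Reset headers.
    Assumes Reset is a Unix timestamp; defaults are illustrative."""
    remaining = int(headers.get("X-RateLimit-Remaining", 1))
    reset_at = float(headers.get("X-RateLimit-Reset", 0))
    if remaining > 0:
        return 0.0  # quota left: proceed immediately
    # Quota exhausted: sleep until the window resets.
    return max(0.0, reset_at - time.time())

# Quota remaining: no wait needed.
wait = respect_rate_limits({"X-RateLimit-Remaining": "42"})
```

Pairing this client-side backoff with the server's 429 responses (and a `Retry-After` header, where the API provides one) keeps integrations from amplifying load exactly when the service is saturated.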
What is the difference between rate limiting and throttling?
Rate limiting is a hard cap: once a client exceeds the limit, requests are rejected with a 429 status code until the window resets. Throttling is a soft degradation: requests are slowed down (queued or delayed) rather than rejected. Some systems use both. Rate limiting prevents abuse and protects infrastructure. Throttling provides a smoother experience by degrading gracefully under load. In practice, many engineers use the terms interchangeably, but the distinction matters when designing API behavior for your clients.
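The behavioral difference can be sketched in a few lines. These handler names are hypothetical, but they show the contrast: a hard limit rejects the over-limit request with a 429, while a throttle delays it and still serves a 200.

```python
import time

def hard_limit(within_limit: bool) -> tuple[int, str]:
    """Rate limiting: reject over-limit requests outright."""
    if not within_limit:
        return 429, "Too Many Requests"
    return 200, "OK"

def throttle(within_limit: bool, delay: float = 0.01) -> tuple[int, str]:
    """Throttling: slow over-limit requests down instead of rejecting them."""
    if not within_limit:
        time.sleep(delay)  # queue/delay rather than drop
    return 200, "OK"

status, _ = hard_limit(within_limit=False)   # 429
status2, _ = throttle(within_limit=False)    # 200, just slower
```

The client-visible contract differs: against a hard limit, clients must implement retry logic; against a throttle, they simply observe higher latency under load.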
How does rate limiting affect product pricing and packaging?
Rate limits are one of the primary levers for differentiating API pricing tiers. Twilio, Stripe, SendGrid, and most API-first companies use rate limits to segment free, starter, pro, and enterprise plans. Higher rate limits justify higher prices because they enable higher-volume use cases. PMs should design rate limits that align with the value each tier delivers. A free tier with 10 requests per minute lets developers test the integration. A paid tier with 1000 per minute supports a production workload. Enterprise tiers often have custom limits negotiated during the sales process.
