Rate Limiter
Introduction
Rate Limiter plays an indispensable role in today's software architecture. It acts as a protective barrier, preventing software services from being abused or attacked, ensuring system stability, and guaranteeing fair usage rights for all users.
Imagine a software service provider handling traffic in the hundreds of millions of requests, such as ChatGPT or x.com. Without a rate limiter, a single user or malicious program could abuse the system, monopolize hardware resources, or even crash it, degrading the experience for everyone else. A rate limiter ensures that each user gets a fair share of resources, keeping overall performance stable and protecting the system from both resource monopolization and malicious attacks.
The main benefits of using a rate limiter include:
- Prevent Abuse: Malicious users may attempt to disrupt the system's normal operation through a large number of requests (e.g., DoS attacks) or API abuse (e.g., artificial flooding, excessive data scraping).
- Resource Management: Ensures that backend services are not overloaded by a large number of requests, thereby maintaining system stability and availability, allowing all users to have a good experience.
- Fair Usage: Ensures that all users can fairly access resources, preventing a few high-traffic users from monopolizing resources and affecting the service quality of other users.
- Cost Control: In a cloud environment, excessive requests can lead to high operational costs; through rate limiting, these costs can be effectively controlled and managed.
Overview

After using a rate limiter, the overall user flow is as follows:
- Client Sends Request: the user (via browser or mobile app) sends an HTTP request.
- Load Balancer Receives Request: the load balancer performs health checks and distributes the request to available API Gateway nodes.
- API Gateway Forwards to Rate Limiter: before any backend processing, the API Gateway forwards the request to the rate limiter.
- Rate Limiter Checks Usage Data: the rate limiter queries the cache system (e.g., Redis or Memcached) to obtain the current request count for the user. If there is no data in the cache, it may query the database to get the rate limiting rules and update the cache.
- Decision: Allow or Deny
  - Allow: the rate limiter notifies the API Gateway to continue processing the request.
  - Deny: the rate limiter notifies the API Gateway to immediately return an HTTP 429 Too Many Requests error, optionally attaching a `Retry-After` header to inform the client when they can try again.
- Forward to Backend API (if allowed): the API Gateway forwards the request to the backend API server to execute business logic, query the database, or call other services.
- Return Response to Client: the API response is returned to the client via the API Gateway and load balancer.
This article focuses on the common algorithms behind rate limiters and walks through implementing a simple one.
Implementation
As mentioned in previous discussions about rate limiters, the core purpose of designing a rate limiter is to prevent a single user from abusing resources, cope with malicious attacks, and simultaneously manage costs. However, how can these concepts be practically implemented in code?
Before diving into the implementation, let's think about: What elements do we need, and what information should we return?
For example, readers who have used Claude or ChatGPT may have experienced rate limiting. When your usage exceeds the limit, you might see a message like "You have exceeded the usage limit, please try again after a certain time," or when you're close to exceeding, "You have X requests left, next update at Y time."
To achieve the above effect, the backend needs to use HTTP status codes (e.g., 429 Too Many Requests) along with headers such as `X-RateLimit-Limit`, `X-RateLimit-Remaining`, `X-RateLimit-Reset`, or `Retry-After` to inform the client whether the user is being rate-limited and how much quota remains. The client can then correctly display the corresponding message to the user by detecting the status code or reading the relevant headers.
An ideal rate limiting response should include the following fields:
```javascript
{
  success: boolean,   // Whether the request is allowed to proceed
  limit: number,      // The maximum number of requests allowed in the current time window
  remaining: number,  // The remaining number of available requests
  reset: number,      // The time point when the current time window resets (Unix timestamp)
  pending: Promise    // Background processing tasks (e.g., data synchronization or cache cleanup)
}
```
This way, the client can determine whether to display a prompt message or automatically schedule a retry based on this unified data structure.
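To make the mapping between this result object and the HTTP response concrete, here is a minimal sketch. The helper name `toHttpResponse` and the exact header mapping are my own illustration, not part of the article's package:

```javascript
// Hypothetical helper: translate the unified rate limit result object
// into an HTTP status code and the conventional rate-limit headers.
function toHttpResponse(result) {
  const headers = {
    "X-RateLimit-Limit": String(result.limit),
    "X-RateLimit-Remaining": String(result.remaining),
    "X-RateLimit-Reset": String(result.reset),
  };
  if (!result.success) {
    // Retry-After is expressed in seconds from now
    const seconds = Math.max(0, Math.ceil((result.reset - Date.now()) / 1000));
    headers["Retry-After"] = String(seconds);
    return { status: 429, headers };
  }
  return { status: 200, headers };
}
```

A denied result thus becomes a `429` with a `Retry-After` header, while an allowed one passes through with the quota headers attached.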
Basic Structure
The basic functionality of a rate limiter typically includes: setting a request limit within a specific period, choosing a rate limiting algorithm, and customizing storage space. We will use the Strategy Pattern to allow developers using the package to specify the algorithm they want to use.
We will introduce the following common rate limiting algorithms:
- Fixed Window
- Sliding Window Log
- Sliding Window Count
- Token Bucket
Implementation Framework
```javascript
/**
 * The main class for applying rate limiting.
 * Uses the Strategy Pattern, allowing different rate limiting algorithms to be passed in at runtime.
 */
class RateLimit {
  /**
   * @param {object} config - The configuration object for the rate limiter.
   * @param {function} config.limiter - A specific rate limiting algorithm function (generated by one of the static methods).
   * @param {RedisMap} [config.storage] - (Optional) An instance for storing state, defaults to a new RedisMap.
   * @param {number} [config.delay] - (Optional) The number of milliseconds to simulate network delay.
   */
  constructor(config) {
    this.limiter = config.limiter;
    this.storage = config.storage || new RedisMap();
    this.delay = config.delay || 10; // Default simulated delay
  }

  /**
   * Applies the configured rate limit to the given identifier.
   * @param {string} identifier - A unique identifier to distinguish the entity being limited (e.g., user ID, IP address).
   * @returns {Promise<object>} A Promise that resolves to an object containing success/failure and limit details.
   * @example
   * const result = await rateLimiter.limit("user-123");
   * // result: { success: boolean, limit: number, remaining: number, reset: number }
   */
  async limit(identifier) {
    return await this.limiter(this.storage, identifier, this.delay);
  }

  /**
   * Limits requests within fixed, non-overlapping time windows.
   * @param {number} tokens - The maximum number of requests allowed per window.
   * @param {string} window - The duration of the time window (e.g., "60s", "1m").
   * @returns {function} A rate limiter function that can be passed into the RateLimit constructor.
   */
  static fixedWindow(tokens, window) {
    const windowDuration = parseWindow(window);
    return async (storage, identifier, delay) => {
      // Algorithm implementation...
    };
  }

  /**
   * Approximates a sliding window by weighted calculation of the current and previous fixed windows.
   * @param {number} tokens - The maximum number of requests allowed per window.
   * @param {string} window - The duration of the time window (e.g., "60s", "1m").
   * @returns {function} A rate limiter function.
   */
  static slidingWindow(tokens, window) {
    const windowDuration = parseWindow(window);
    return async (storage, identifier, delay) => {
      // Algorithm implementation...
    };
  }

  /**
   * Allows burst requests up to the bucket's capacity and refills tokens at a fixed rate.
   * @param {number} refillRate - The number of tokens to refill per interval.
   * @param {string} interval - The interval for refilling tokens (e.g., "1s", "10s").
   * @param {number} maxTokens - The maximum number of tokens the bucket can hold.
   * @returns {function} A rate limiter function.
   */
  static tokenBucket(refillRate, interval, maxTokens) {
    return async (storage, identifier, delay) => {
      // Algorithm implementation...
    };
  }

  /**
   * Maintains a log of request timestamps within a sliding window.
   * @param {number} tokens - The maximum number of requests allowed per window.
   * @param {string} window - The duration of the time window (e.g., "60s", "1m").
   * @returns {function} A rate limiter function.
   */
  static slidingWindowLog(tokens, window) {
    return async (storage, identifier, delay) => {
      // Algorithm implementation...
    };
  }
}
```
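The framework above relies on two helpers that are not shown: `parseWindow`, which converts a duration string into milliseconds, and `RedisMap`, an in-memory stand-in for Redis. Their real implementations may differ; the following is a minimal sketch inferred from how the framework calls them:

```javascript
// parseWindow: converts a duration string like "500ms", "60s", "1m", or "1h"
// into a number of milliseconds.
function parseWindow(window) {
  const match = /^(\d+)(ms|s|m|h)$/.exec(window);
  if (!match) throw new Error(`Invalid window: ${window}`);
  const value = Number(match[1]);
  const unitMs = { ms: 1, s: 1000, m: 60 * 1000, h: 60 * 60 * 1000 }[match[2]];
  return value * unitMs;
}

// RedisMap: an in-memory Map mimicking the subset of Redis used here
// (get/set plus pexpire for millisecond TTLs).
class RedisMap {
  constructor() {
    this.map = new Map();
    this.timers = new Map();
  }
  get(key) {
    return this.map.get(key);
  }
  set(key, value) {
    this.map.set(key, value);
  }
  pexpire(key, ms) {
    // Reset any previous TTL, then schedule deletion after `ms` milliseconds
    clearTimeout(this.timers.get(key));
    const timer = setTimeout(() => this.map.delete(key), ms);
    if (timer.unref) timer.unref(); // don't keep the Node.js process alive
    this.timers.set(key, timer);
  }
}
```

In production the storage would be a real Redis client (ideally with atomic operations), but this sketch is enough to run the algorithms below locally.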
Fixed Window
The Fixed Window, as the name suggests, divides time into fixed-length "time windows" (e.g., every minute, every hour) and tracks and accumulates the number of requests received within each window. When a time window ends and a new one begins, the counter for the previous window is automatically reset to zero.
The overall process is as follows:
- We receive two parameters:
  - `tokens`: the maximum number of requests allowed in this time window.
  - `window`: the length of the window (in milliseconds), e.g., `1m` (representing one minute).
- Calculate the bucket to which the current request belongs.
- Combine the `identifier` (e.g., user ID, API key) with the bucket to form a unique key, and read the currently recorded request count from the storage (Redis/Memcached).
- Determine whether the quota is exceeded:
  - If the currently accumulated number of requests ≥ `tokens`, the quota is full and the request must be denied.
  - If the currently accumulated number of requests < `tokens`, increment the bucket's counter and return a success message.
- Automatic Reset: when entering a new bucket, the old bucket number is naturally phased out and its counter is no longer used; the new bucket's counter starts accumulating from zero.
Most of the above involves fetching values to determine whether to deny or accept the request, followed by updating the values. The most challenging part is: How to efficiently track "the number of requests within a specific time window" without storing the timestamp of each request?
```javascript
const bucket = Math.floor(Date.now() / windowDuration);
```
This way, we don't need to store the timestamp of each request but can determine the bucket to which the current request belongs by calculating the current time and the window length.
At the same time, the benefits of this approach are:
- Global Consistency: all requests within the same time interval fall into the same bucket.
- Automatic Expiration: when the bucket number advances, past buckets are naturally no longer used.
- `O(1)` Operation: there's no need to scan the entire timestamp array; simply calculate the bucket number directly.
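The bucket arithmetic can be verified in a couple of lines (a quick sketch with a 60-second window):

```javascript
// Bucket numbering for a fixed window: timestamps in the same
// fixed interval map to the same integer bucket number.
const windowDuration = 60 * 1000; // 60s in milliseconds
const bucketOf = (ms) => Math.floor(ms / windowDuration);

// 0:00:10 and 0:00:50 fall in the same minute-long bucket...
console.log(bucketOf(10_000) === bucketOf(50_000)); // true
// ...but 0:00:59 and 0:01:00 do not: a new bucket begins.
console.log(bucketOf(59_000) === bucketOf(60_000)); // false
```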
```javascript
class RateLimit {
  // ...
  static fixedWindow(tokens, window) {
    const windowDuration = parseWindow(window);
    return async (storage, identifier, delay) => {
      const now = Date.now();

      // 1. Calculate the bucket to which the current request belongs
      const currentBucket = Math.floor(now / windowDuration);
      const key = `${identifier}:${currentBucket}`;

      // 2. Read the tokens already used in this bucket (if not exists, treat as 0)
      const previousTokens = storage.get(key) || 0;
      const reset = (currentBucket + 1) * windowDuration;

      // 3. If already full, deny
      if (previousTokens >= tokens) {
        return { success: false, limit: tokens, remaining: 0, reset };
      }

      // 4. Not exceeded yet, increment and update storage
      const currentTokens = previousTokens + 1;
      storage.set(key, currentTokens);

      // 5. If this is the first write to this bucket, set an expiration time
      if (currentTokens === 1) {
        storage.pexpire(key, Math.floor(windowDuration * 1.5));
      }

      return { success: true, limit: tokens, remaining: tokens - currentTokens, reset };
    };
  }
  // ...
}
```
Advantages:
- Simple Implementation: Clear logic, easy to understand and implement.
- Memory Efficient: Only needs to maintain a counter for each limited entity (e.g., user ID or IP address) within the current time window.
- Low Computational Overhead: Counting and reset operations are usually fast.
Disadvantages:
- Window Boundary Effect / Critical Section Problem: this is the main flaw of this algorithm. At the boundary of the time window, the actual traffic rate may exceed the set average rate. For example, with a limit of 100 requests per minute, a user could send 100 requests at 0:00:59 (allowed), followed by another 100 requests at 0:01:00 (a new window begins, so they are also allowed). The system then processes 200 requests in just a few seconds, potentially impacting backend services.
- Uneven Traffic: since the counter resets at the start of each window, traffic may surge at the beginning of the window.
Sliding Window Log
The Sliding Window Log aims to address the shortcomings of the Fixed Window by allowing smooth transitions at window boundaries, avoiding sudden traffic surges at the start of a window. It records the timestamp of each request arrival. When a new request arrives, the system checks all request timestamps within the past "sliding time window" (e.g., the last 60 seconds). If the number of these timestamps exceeds the preset threshold, the new request is denied.
```javascript
class RateLimit {
  // ...
  static slidingWindowLog(tokens, window) {
    const windowDuration = parseWindow(window);
    return async (storage, identifier, delay) => {
      const now = Date.now();
      const key = `log:${identifier}`;

      // 1. Retrieve the stored timestamp array, or use an empty array if not exists
      let timestamps = storage.get(key) || [];

      // 2. Filter out timestamps that fall outside the window (<= now - windowDuration)
      const windowStartTime = now - windowDuration;
      timestamps = timestamps.filter((ts) => ts > windowStartTime);

      // 3. If there are already `tokens` requests in this window, deny
      if (timestamps.length >= tokens) {
        // The next available time is when the earliest timestamp leaves the window
        const resetTime = timestamps[0] + windowDuration;

        // Save the filtered array back and set a TTL. The extra 1000 ms ensures
        // the key does not expire prematurely under extreme conditions (e.g.,
        // network delay, program execution time), so all requests within the
        // window are correctly counted.
        storage.set(key, timestamps);
        storage.pexpire(key, windowDuration + 1000);

        return { success: false, limit: tokens, remaining: 0, reset: resetTime };
      }

      // 4. Not yet at the limit: push `now` into the array and save it back
      timestamps.push(now);
      storage.set(key, timestamps);
      storage.pexpire(key, windowDuration + 1000);

      // 5. Calculate the remaining quota and reset time
      const remaining = tokens - timestamps.length;
      const reset = timestamps[0] + windowDuration;

      // 6. FOR DEV PURPOSE, simulate background work with delay
      if (delay) {
        await new Promise((resolve) => setTimeout(resolve, delay));
      }

      // 7. Return success
      return { success: true, limit: tokens, remaining, reset };
    };
  }
  // ...
}
```
Advantages:
- Precise Traffic Control: Can very precisely limit the number of requests within any given time period, effectively avoiding the boundary effect of fixed windows.
- Smooth Traffic: Since the window is continuously sliding, traffic control is smoother.
Disadvantages:
- High Memory Consumption: Needs to store the timestamp of each request, which can be very memory-intensive when the number of requests is large.
- High Computational Cost: Each request requires cleaning and counting operations on the timestamp list, which can become a performance bottleneck under high concurrency.
Sliding Window Count
The Sliding Window Count aims to combine the low memory consumption of the Fixed Window Counter with the smooth traffic control characteristics of the Sliding Window Log. It maintains request counts for the current time window and the previous time window and calculates an approximate total number of requests within a sliding window based on the current time's position within the window.
The overall process is as follows:
- Time Slicing and Counters: similar to the Fixed Window, time is divided into fixed-length windows, and a counter is maintained for each window.
- Approximate Sliding Count: when a new request arrives, it considers not only the current window's request count (`currentWindowCount`) but also the previous window's request count (`previousWindowCount`):
  - Approximate Total = (`previousWindowCount` × overlap ratio) + `currentWindowCount`
  - Here, the "overlap ratio" refers to how much of the previous window still falls within the current sliding window. For example, if the window size is 60 seconds and the current time is 1:00:15 (i.e., 15 seconds into the current window), then 45 seconds (75%) of the previous window (0:59:00-1:00:00) still falls within the past 60-second sliding window.
- Limit Check: if this approximate total is less than the threshold, the request is allowed and the current window's counter is incremented; otherwise, it is denied.
Example Explanation (assuming a limit of 100 requests per 60 seconds, window length of 60 seconds):
- Window 1 (0:00:00 - 0:01:00): received 80 requests, `count_window1 = 80`
- Window 2 (0:01:00 - 0:02:00):
  - At 0:01:15 (15 seconds into the current window, 25%):
    - Previous window's contribution ratio = (60 - 15) / 60 = 45 / 60 = 75%
    - Assume Window 2 already has 10 requests (`count_window2 = 10`)
    - Approximate sliding total = (count_window1 × 75%) + count_window2 = (80 × 0.75) + 10 = 60 + 10 = 70
    - 70 < 100, request allowed
  - At 0:01:45 (45 seconds into the current window, 75%):
    - Previous window's contribution ratio = (60 - 45) / 60 = 15 / 60 = 25%
    - Assume Window 2 already has 50 requests (`count_window2 = 50`)
    - Approximate sliding total = (count_window1 × 25%) + count_window2 = (80 × 0.25) + 50 = 20 + 50 = 70
    - 70 < 100, request allowed
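The weighted calculation in the example above can be captured in a small helper (the function name is my own, for illustration):

```javascript
// Approximate sliding-window count, as in the worked example:
// weight the previous window's count by how much of it still overlaps
// the sliding window, then add the current window's count.
// elapsedMs is how far we are into the current window.
function approximateCount(previousCount, currentCount, elapsedMs, windowMs) {
  const overlapRatio = (windowMs - elapsedMs) / windowMs;
  return previousCount * overlapRatio + currentCount;
}

// At 0:01:15, i.e. 15s into a 60s window: (80 * 0.75) + 10 = 70
console.log(approximateCount(80, 10, 15_000, 60_000)); // 70
// At 0:01:45, i.e. 45s into the window: (80 * 0.25) + 50 = 70
console.log(approximateCount(80, 50, 45_000, 60_000)); // 70
```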
```javascript
class RateLimit {
  // ...
  static slidingWindow(tokens, window) {
    const windowDuration = parseWindow(window);
    return async (storage, identifier, delay) => {
      const now = Date.now();

      // 1. Calculate the "current" and "previous" buckets
      const currentBucket = Math.floor(now / windowDuration);
      const previousBucket = currentBucket - 1;
      const currentKey = `${identifier}:${currentBucket}`;
      const previousKey = `${identifier}:${previousBucket}`;

      // 2. Get the token count for each bucket (default to 0)
      const currentTokens = storage.get(currentKey) || 0;
      const previousTokens = storage.get(previousKey) || 0;

      // 3. Calculate the weighted token count for the previous bucket
      const elapsed = now % windowDuration;
      const overlapRatio = 1 - elapsed / windowDuration;
      const weightedPrevious = Math.floor(previousTokens * overlapRatio);

      // 4. Calculate the total token count within the sliding window
      const totalTokens = currentTokens + weightedPrevious;
      const reset = (currentBucket + 1) * windowDuration;

      // 5. Check if the quota is exceeded
      if (totalTokens >= tokens) {
        return { success: false, limit: tokens, remaining: 0, reset };
      }

      // 6. Not exceeded yet, increment the current bucket's token count and update storage
      const newCurrentTokens = currentTokens + 1;
      storage.set(currentKey, newCurrentTokens);

      // 7. If this is the first write, set a TTL
      if (newCurrentTokens === 1) {
        // TTL set to windowDuration * 2 + 1000 so the previous bucket has not expired
        storage.pexpire(currentKey, Math.floor(windowDuration * 2 + 1000));
      }

      // 8. Calculate the new remaining count
      const newTotalTokens = newCurrentTokens + weightedPrevious;
      const remaining = Math.max(0, tokens - newTotalTokens);

      // 9. FOR DEV PURPOSE, simulate background delay
      if (delay) {
        await new Promise((resolve) => setTimeout(resolve, delay));
      }

      // 10. Return success
      return { success: true, limit: tokens, remaining, reset };
    };
  }
}
```
Advantages:
- Smooth Traffic: Compared to the Fixed Window Counter, the Sliding Window Count effectively reduces sudden traffic surges at window boundaries, making rate limiting smoother.
- Memory Efficient: Only needs to store counts for the current and previous windows, requiring much less memory than the Sliding Window Log.
- Moderate Implementation Difficulty: Simpler than the Sliding Window Log, only slightly more complex than the Fixed Window Counter.
Disadvantages:
- Only an Approximation: Cannot be as precise as the Sliding Window Log; there may be errors under extreme traffic distributions.
- Requires Precise Overlap Ratio Calculation: Must correctly calculate the window overlap ratio and properly maintain counts for two windows.
Token Bucket
The Token Bucket concept imagines each request as taking a token from a "token bucket." The system adds tokens to this bucket at a constant rate. Each incoming request must first obtain a token from the bucket to be processed. If there are enough tokens in the bucket, the request is processed immediately; if there are insufficient tokens, the request is denied, queued, or subjected to other downgrade strategies.
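The heart of the algorithm is the refill arithmetic: top the bucket up in whole refill intervals since the last refill, capped at the bucket's capacity. A small sketch (function and field names are mine, extracted for illustration):

```javascript
// Token bucket refill arithmetic: given the time elapsed since the last
// refill, add whole refill intervals' worth of tokens, capped at maxTokens.
function refill(tokens, lastRefillAt, now, intervalMs, refillRate, maxTokens) {
  const refills = Math.floor((now - lastRefillAt) / intervalMs);
  if (refills <= 0) return { tokens, lastRefillAt };
  return {
    tokens: Math.min(maxTokens, tokens + refills * refillRate),
    // Advance the refill clock by whole intervals only, so a partial
    // interval still counts toward the next refill
    lastRefillAt: lastRefillAt + refills * intervalMs,
  };
}

// 2 tokens left, 2.5 intervals elapsed, 5 tokens per interval, cap of 10:
// two refills fire, so min(10, 2 + 2 * 5) = 10 tokens remain in the bucket.
console.log(refill(2, 0, 2500, 1000, 5, 10)); // { tokens: 10, lastRefillAt: 2000 }
```

Because refills are computed lazily from timestamps, no background timer is needed; the bucket is brought up to date whenever a request arrives.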
```javascript
class RateLimit {
  // ...
  static tokenBucket(refillRate, interval, maxTokens) {
    const intervalDuration = parseWindow(interval);
    return async (storage, identifier, delay) => {
      const now = Date.now();
      const key = `token:${identifier}`;

      // 1. Get the current bucket state, default to a full bucket
      let bucketData = storage.get(key) || {
        tokens: maxTokens,
        lastRefillAt: now,
      };

      // 2. Calculate the time passed since the last refill
      const timePassed = now - bucketData.lastRefillAt;

      // 3. Calculate how many refills should have occurred
      const refills = Math.floor(timePassed / intervalDuration);

      // 4. Refill tokens based on the number of refills
      let currentTokens = bucketData.tokens;
      let newLastRefillAt = bucketData.lastRefillAt;
      if (refills > 0) {
        currentTokens = Math.min(maxTokens, bucketData.tokens + refills * refillRate);
        newLastRefillAt = bucketData.lastRefillAt + refills * intervalDuration;
      }

      // 5. If tokens are insufficient, deny the request and calculate the next available time
      if (currentTokens < 1) {
        const timeSinceLastRefill = now - newLastRefillAt;
        const timeToNextRefill = intervalDuration - (timeSinceLastRefill % intervalDuration);
        const reset = now + timeToNextRefill;

        storage.set(key, { tokens: currentTokens, lastRefillAt: newLastRefillAt });
        storage.pexpire(key, Math.max(intervalDuration * 5, parseWindow("1h")));

        return { success: false, limit: maxTokens, remaining: 0, reset };
      }

      // 6. Sufficient tokens, consume 1
      const tokensLeft = currentTokens - 1;
      storage.set(key, { tokens: tokensLeft, lastRefillAt: newLastRefillAt });
      storage.pexpire(key, Math.max(intervalDuration * 5, parseWindow("1h")));

      // 7. FOR DEV PURPOSE, simulate delay
      if (delay) {
        await new Promise((resolve) => setTimeout(resolve, delay));
      }

      // 8. Calculate the next refill time
      let reset;
      if (tokensLeft < maxTokens) {
        const timeSinceLastRefill = now - newLastRefillAt;
        const timeToNextRefill = intervalDuration - (timeSinceLastRefill % intervalDuration);
        reset = now + timeToNextRefill;
      } else {
        reset = now;
      }

      return { success: true, limit: maxTokens, remaining: tokensLeft, reset };
    };
  }
}
```