Rate Limiter

Introduction

A rate limiter plays an indispensable role in modern software architecture. It acts as a protective barrier: it prevents services from being abused or attacked, keeps the system stable, and guarantees fair usage for all users.

Imagine a service handling hundreds of millions of requests, such as ChatGPT or x.com. Without a rate limiter, a single user or malicious program could monopolize hardware resources or even crash the system, degrading the experience for everyone else. A rate limiter ensures that each user gets a fair share of resources and keeps the system's overall performance stable, even under abuse.

The main benefits of using a rate limiter include:

  • Prevent Abuse: Malicious users may attempt to disrupt the system's normal operation through a large number of requests (e.g., DoS attacks) or API abuse (e.g., artificial flooding, excessive data scraping).
  • Resource Management: Ensures that backend services are not overloaded by a large number of requests, thereby maintaining system stability and availability, allowing all users to have a good experience.
  • Fair Usage: Ensures that all users can fairly access resources, preventing a few high-traffic users from monopolizing resources and affecting the service quality of other users.
  • Cost Control: In a cloud environment, excessive requests can lead to high operational costs; through rate limiting, these costs can be effectively controlled and managed.

Overview

After using a rate limiter, the overall user flow is as follows:

  1. Client Sends Request
    The user (via browser or mobile app) sends an HTTP request.
  2. Load Balancer Receives Request
    The load balancer performs health checks and distributes the request to available API Gateway nodes.
  3. API Gateway Forwards to Rate Limiter
    Before any backend processing, the API Gateway forwards the request to the rate limiter.
  4. Rate Limiter Checks Usage Data
    The rate limiter queries the cache system (e.g., Redis or Memcached) to obtain the current request count for the user. If there is no data in the cache, it may query the database to get rate limiting rules and update the cache.
  5. Decision: Allow or Deny
    • Allow – The rate limiter notifies the API Gateway to continue processing the request.
    • Deny – The rate limiter notifies the API Gateway to immediately return an HTTP 429 Too Many Requests error, optionally attaching a Retry-After header to inform the client when they can try again.
  6. Forward to Backend API (if allowed)
    The API Gateway forwards the request to the backend API server to execute business logic, query the database, or call other services.
  7. Return Response to Client
    The API response is returned to the client via the API Gateway and load balancer.
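
The allow-or-deny decision in step 5 can be sketched as a small gateway handler. Here `checkLimit` is a hypothetical stand-in for the real rate limiter; its result shape (`success`, `reset`) matches the response fields discussed later in this article.

```javascript
// Minimal sketch of step 5 (allow or deny). `checkLimit` is a hypothetical
// limiter returning { success, reset }, where `reset` is a Unix timestamp in ms.
function handleRequest(checkLimit, identifier, nowMs) {
  const result = checkLimit(identifier);
  if (!result.success) {
    // Deny: 429 plus a Retry-After header (in seconds) so the
    // client knows when it may try again.
    const retryAfterSec = Math.ceil((result.reset - nowMs) / 1000);
    return { status: 429, headers: { "Retry-After": String(retryAfterSec) } };
  }
  // Allow: the gateway would now forward the request to the backend API.
  return { status: 200, headers: {} };
}

// Usage with a stub limiter that always denies (reset 30 seconds from "now"):
const denied = handleRequest(
  () => ({ success: false, reset: 30_000 }),
  "user-123",
  0
);
console.log(denied.status); // 429
console.log(denied.headers["Retry-After"]); // "30"
```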

This article focuses on the common algorithms behind rate limiters and walks through implementing a simple one.

Implementation

As mentioned in previous discussions about rate limiters, the core purpose of designing a rate limiter is to prevent a single user from abusing resources, cope with malicious attacks, and simultaneously manage costs. However, how can these concepts be practically implemented in code?

Before diving into the implementation, let's consider two questions: what inputs does a rate limiter need, and what information should it return?

For example, readers who have used Claude or ChatGPT may have experienced rate limiting. When your usage exceeds the limit, you might see a message like "You have exceeded the usage limit, please try again after a certain time," or when you're close to exceeding, "You have X requests left, next update at Y time."

To achieve the above effect, the backend needs to use HTTP status codes (e.g., 429 Too Many Requests) along with Headers (such as X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset, or Retry-After) to inform the client whether the user is being rate-limited and the remaining quota. The client can then correctly display the corresponding message to the user by detecting the status code or reading the relevant Headers.
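
As a sketch of the server side of this contract, the headers can be derived directly from a limit result. Note that the `X-RateLimit-*` names are a widely used convention rather than a formal standard, and the result shape here assumes the fields described below.

```javascript
// Build rate-limit response headers from a result object of the shape
// { success, limit, remaining, reset } used throughout this article.
// `reset` is a Unix timestamp in milliseconds.
function rateLimitHeaders(result) {
  const headers = {
    "X-RateLimit-Limit": String(result.limit),
    "X-RateLimit-Remaining": String(result.remaining),
    "X-RateLimit-Reset": String(Math.ceil(result.reset / 1000)), // Unix seconds
  };
  if (!result.success) {
    // Only denied responses carry Retry-After.
    headers["Retry-After"] = String(
      Math.max(0, Math.ceil((result.reset - Date.now()) / 1000))
    );
  }
  return headers;
}

const h = rateLimitHeaders({
  success: true,
  limit: 100,
  remaining: 42,
  reset: 1_700_000_000_000,
});
console.log(h["X-RateLimit-Remaining"]); // "42"
```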

An ideal rate limiting response should include the following fields:

{
  success: boolean,   // Whether the request is allowed to proceed
  limit: number,      // The maximum number of requests allowed in the current time window
  remaining: number,  // The remaining number of available requests
  reset: number,      // The time when the current window resets (Unix timestamp)
  pending: Promise    // Background processing tasks (e.g., data synchronization or cache cleanup)
}

This way, the client can determine whether to display a prompt message or automatically schedule a retry based on this unified data structure.

Basic Structure

The basic functionality of a rate limiter typically includes: setting a request limit within a specific period, choosing a rate limiting algorithm, and customizing storage space. We will use the Strategy Pattern to allow developers using the package to specify the algorithm they want to use.

We will introduce the following common rate limiting algorithms:

  • Fixed Window
  • Sliding Window Log
  • Sliding Window Count
  • Token Bucket

Implementation Framework

/**
 * The main class for applying rate limiting.
 * Uses the Strategy Pattern, allowing different rate limiting algorithms
 * to be passed in at runtime.
 */
class RateLimit {
  /**
   * @param {object} config - The configuration object for the rate limiter.
   * @param {function} config.limiter - A specific rate limiting algorithm function (generated by one of the static methods).
   * @param {RedisMap} [config.storage] - (Optional) An instance for storing state, defaults to a new RedisMap.
   * @param {number} [config.delay] - (Optional) The number of milliseconds to simulate network delay.
   */
  constructor(config) {
    this.limiter = config.limiter;
    this.storage = config.storage || new RedisMap();
    this.delay = config.delay || 10; // Default simulated delay
  }

  /**
   * Applies the configured rate limit to the given identifier.
   * @param {string} identifier - A unique identifier for the entity being limited (e.g., user ID, IP address).
   * @returns {Promise<object>} A Promise that resolves to an object containing success/failure and limit details.
   * @example
   * const result = await rateLimiter.limit("user-123");
   * // result: { success: boolean, limit: number, remaining: number, reset: number }
   */
  async limit(identifier) {
    return await this.limiter(this.storage, identifier, this.delay);
  }

  /**
   * Limits requests within fixed, non-overlapping time windows.
   * @param {number} tokens - The maximum number of requests allowed per window.
   * @param {string} window - The duration of the time window (e.g., "60s", "1m").
   * @returns {function} A rate limiter function that can be passed into the RateLimit constructor.
   */
  static fixedWindow(tokens, window) {
    const windowDuration = parseWindow(window);
    return async (storage, identifier, delay) => {
      // Algorithm implementation...
    };
  }

  /**
   * Approximates a sliding window by weighting the counts of the current and previous fixed windows.
   * @param {number} tokens - The maximum number of requests allowed per window.
   * @param {string} window - The duration of the time window (e.g., "60s", "1m").
   * @returns {function} A rate limiter function.
   */
  static slidingWindow(tokens, window) {
    const windowDuration = parseWindow(window);
    return async (storage, identifier, delay) => {
      // Algorithm implementation...
    };
  }

  /**
   * Allows burst requests up to the bucket's capacity and refills tokens at a fixed rate.
   * @param {number} refillRate - The number of tokens to refill per interval.
   * @param {string} interval - The interval for refilling tokens (e.g., "1s", "10s").
   * @param {number} maxTokens - The maximum number of tokens the bucket can hold.
   * @returns {function} A rate limiter function.
   */
  static tokenBucket(refillRate, interval, maxTokens) {
    return async (storage, identifier, delay) => {
      // Algorithm implementation...
    };
  }

  /**
   * Maintains a log of request timestamps within a sliding window.
   * @param {number} tokens - The maximum number of requests allowed per window.
   * @param {string} window - The duration of the time window (e.g., "60s", "1m").
   * @returns {function} A rate limiter function.
   */
  static slidingWindowLog(tokens, window) {
    return async (storage, identifier, delay) => {
      // Algorithm implementation...
    };
  }
}
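
The framework above leans on two helpers that are not shown: `parseWindow`, which turns a string like `"1m"` into milliseconds, and `RedisMap`, an in-memory stand-in for Redis used as the default storage. A minimal sketch of both follows; the exact set of supported units is an assumption for illustration.

```javascript
// Parse a window string such as "500ms", "30s", "5m", or "1h" into milliseconds.
function parseWindow(window) {
  const match = /^(\d+)(ms|s|m|h)$/.exec(window);
  if (!match) throw new Error(`Invalid window: ${window}`);
  const value = Number(match[1]);
  const unitMs = { ms: 1, s: 1000, m: 60_000, h: 3_600_000 }[match[2]];
  return value * unitMs;
}

// Minimal in-memory stand-in for Redis with get/set/pexpire semantics.
// Expiry is checked lazily on read rather than with timers.
class RedisMap {
  constructor() {
    this.data = new Map();   // key -> value
    this.expiry = new Map(); // key -> expiration timestamp (ms)
  }
  get(key) {
    const expiresAt = this.expiry.get(key);
    if (expiresAt !== undefined && Date.now() >= expiresAt) {
      this.data.delete(key);
      this.expiry.delete(key);
    }
    return this.data.get(key);
  }
  set(key, value) {
    this.data.set(key, value);
  }
  pexpire(key, ttlMs) {
    this.expiry.set(key, Date.now() + ttlMs);
  }
}

console.log(parseWindow("1m")); // 60000
```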

Fixed Window

The Fixed Window, as the name suggests, divides time into fixed-length "time windows" (e.g., every minute, every hour) and tracks and accumulates the number of requests received within each window. When a time window ends and a new one begins, the counter for the previous window is automatically reset to zero.

The overall process is as follows:

  1. We receive two parameters:
    • tokens: The maximum number of requests allowed in this time window.
    • window: The length of the window as a string (e.g., "1m" for one minute), parsed into milliseconds.
  2. Calculate the bucket to which the current request belongs.
  3. Combine the identifier (e.g., user ID, API key) with the bucket to form a unique key, and read the currently recorded request count from the storage (Redis/Memcached).
  4. Determine if the quota is exceeded:
    • If the currently accumulated number of requests ≥ tokens, the quota is full, and the request must be denied.
    • If the currently accumulated number of requests < tokens, more requests can be allowed, so increment the bucket's counter and return a success message.
  5. Automatic Reset
    When entering a new bucket, the old bucket number is naturally phased out, and its corresponding counter is no longer used; the new bucket's counter starts accumulating from zero.

Most of the above involves fetching values to determine whether to deny or accept the request, followed by updating the values. The most challenging part is: How to efficiently track "the number of requests within a specific time window" without storing the timestamp of each request?

const bucket = Math.floor(Date.now() / windowDuration);

This way, we don't need to store the timestamp of each request but can determine the bucket to which the current request belongs by calculating the current time and the window length.
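
For example, with a 60-second window, two requests 30 seconds apart land in the same bucket, while a request in the next minute gets a new bucket and a fresh counter:

```javascript
const windowDuration = 60_000; // 60-second window, in milliseconds

const bucketOf = (ms) => Math.floor(ms / windowDuration);

console.log(bucketOf(10_000)); // 0  (0:00:10)
console.log(bucketOf(40_000)); // 0  (0:00:40 — same bucket, same window)
console.log(bucketOf(70_000)); // 1  (0:01:10 — new window, counter starts over)
```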

At the same time, the benefits of this approach are:

  • Global Consistency: All requests within the same time interval fall into the same bucket.
  • Automatic Expiration: When the bucket number advances, past buckets are naturally no longer used.
  • O(1) Operation: There's no need to scan the entire timestamp array; simply calculate the bucket number directly.

class RateLimit {
  ...
  static fixedWindow(tokens, window) {
    const windowDuration = parseWindow(window);
    return async (storage, identifier, delay) => {
      const now = Date.now();
      // 1. Calculate the bucket to which the current request belongs
      const currentBucket = Math.floor(now / windowDuration);
      const key = `${identifier}:${currentBucket}`;
      // 2. Read the tokens already used in this bucket (if not set, treat as 0)
      const previousTokens = storage.get(key) || 0;
      const reset = (currentBucket + 1) * windowDuration;
      // 3. If the quota is already used up, deny
      if (previousTokens >= tokens) {
        return { success: false, limit: tokens, remaining: 0, reset };
      }
      // 4. Not exceeded yet: increment and update storage
      const currentTokens = previousTokens + 1;
      storage.set(key, currentTokens);
      // 5. If this is the first write to this bucket, set an expiration time
      if (currentTokens === 1) {
        storage.pexpire(key, Math.floor(windowDuration * 1.5));
      }
      return { success: true, limit: tokens, remaining: tokens - currentTokens, reset };
    };
  }
  ...
}

Advantages:

  • Simple Implementation: Clear logic, easy to understand and implement.
  • Memory Efficient: Only needs to maintain a counter for each limited entity (e.g., user ID or IP address) within the current time window.
  • Low Computational Overhead: Counting and reset operations are usually fast.

Disadvantages:

  • Window Boundary Effect / Critical Section Problem:
    This is the main flaw of this algorithm. At the boundary of the time window, the actual rate of traffic may exceed the set average rate. For example, if the limit is 100 requests per minute, a user could send 100 requests at 0:00:59 (allowed), followed by another 100 requests at 0:01:00 (new window begins, also allowed). This results in the system processing 200 requests in just a few seconds, potentially impacting backend services.
  • Uneven Traffic: Since the counter resets at the start of each window, traffic may surge at the beginning of the window.
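
The boundary effect is easy to reproduce with the bucket formula alone. In this self-contained simulation (separate from the class above), the limit is 100 requests per minute, yet 200 requests sent one second apart around the window boundary are all accepted:

```javascript
// Self-contained simulation of the fixed-window flaw: limit 100/min,
// 100 requests at 0:00:59 and 100 more at 0:01:00 are all allowed.
const limit = 100;
const windowMs = 60_000;
const counters = new Map(); // bucket -> request count

function allow(nowMs) {
  const bucket = Math.floor(nowMs / windowMs);
  const used = counters.get(bucket) || 0;
  if (used >= limit) return false;
  counters.set(bucket, used + 1);
  return true;
}

let allowed = 0;
for (let i = 0; i < 100; i++) if (allow(59_000)) allowed++; // end of window 0
for (let i = 0; i < 100; i++) if (allow(60_000)) allowed++; // start of window 1
console.log(allowed); // 200 requests accepted within one second
```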

Sliding Window Log

The Sliding Window Log aims to address the shortcomings of the Fixed Window by allowing smooth transitions at window boundaries, avoiding sudden traffic surges at the start of a window. It records the timestamp of each request arrival. When a new request arrives, the system checks all request timestamps within the past "sliding time window" (e.g., the last 60 seconds). If the number of these timestamps exceeds the preset threshold, the new request is denied.

class RateLimit {
  ...
  static slidingWindowLog(tokens, window) {
    const windowDuration = parseWindow(window);
    return async (storage, identifier, delay) => {
      const now = Date.now();
      const key = `log:${identifier}`;
      // 1. Retrieve the stored timestamp array, or use an empty array if none exists
      let timestamps = storage.get(key) || [];
      // 2. Drop timestamps that fall outside the window (<= now - windowDuration)
      const windowStartTime = now - windowDuration;
      timestamps = timestamps.filter((ts) => ts > windowStartTime);
      // 3. If there are already `tokens` requests in this window, deny
      if (timestamps.length >= tokens) {
        // Next available time: earliest timestamp (timestamps[0]) + windowDuration
        const resetTime = timestamps[0] + windowDuration;
        // Save the filtered array back and set a TTL.
        storage.set(key, timestamps);
        // Add an extra 1000 ms so the key does not expire prematurely under
        // edge conditions (e.g., network delay, program execution time),
        // ensuring all requests within the window are correctly counted.
        storage.pexpire(key, windowDuration + 1000);
        return {
          success: false,
          limit: tokens,
          remaining: 0,
          reset: resetTime,
        };
      }
      // 4. Not yet at the limit: push `now` into the array and save it back
      timestamps.push(now);
      storage.set(key, timestamps);
      storage.pexpire(key, windowDuration + 1000);
      // 5. Calculate the remaining quota and reset time
      const remaining = tokens - timestamps.length;
      const reset = timestamps[0] + windowDuration;
      // 6. FOR DEV PURPOSES, simulate background work with a delay
      if (delay) {
        await new Promise((resolve) => setTimeout(resolve, delay));
      }
      // 7. Return a success result
      return {
        success: true,
        limit: tokens,
        remaining,
        reset,
      };
    };
  }
  ...
}

Advantages:

  • Precise Traffic Control: Can very precisely limit the number of requests within any given time period, effectively avoiding the boundary effect of fixed windows.
  • Smooth Traffic: Since the window is continuously sliding, traffic control is smoother.

Disadvantages:

  • High Memory Consumption: Needs to store the timestamp of each request, which can be very memory-intensive when the number of requests is large.
  • High Computational Cost: Each request requires cleaning and counting operations on the timestamp list, which can become a performance bottleneck under high concurrency.

Sliding Window Count

The Sliding Window Count aims to combine the low memory consumption of the Fixed Window Counter with the smooth traffic control characteristics of the Sliding Window Log. It maintains request counts for the current time window and the previous time window and calculates an approximate total number of requests within a sliding window based on the current time's position within the window.

The overall process is as follows:

  • Time Slicing and Counters: Similar to the Fixed Window, time is divided into fixed-length windows, and a counter is maintained for each window.
  • Approximate Sliding Count: When a new request arrives, it considers not only the current window's request count (currentWindowCount) but also the previous window's request count (previousWindowCount).
    • Approximate Total = (previousWindowCount * overlap ratio) + currentWindowCount
    • Here, the "overlap ratio" refers to how much of the previous window still falls within the current "sliding window." For example, if the window size is 60 seconds and the current time is 1:00:15 (i.e., 15 seconds into the current window), then 45 seconds (75%) of the previous window (0:59:00-1:00:00) still falls within the past 60-second sliding window.
  • Limit Check: If this approximate total is less than the threshold, the request is allowed, and the current window's counter is incremented; otherwise, it is denied.

Example Explanation (assuming a limit of 100 requests per 60 seconds, window length of 60 seconds):

  • Window 1 (0:00:00 - 0:01:00): Received 80 requests, count_window1 = 80
  • Window 2 (0:01:00 - 0:02:00):
    • At 0:01:15 (15 seconds into the current window, 25%):
      • Previous window's contribution ratio = (60 - 15) / 60 = 45 / 60 = 75%
      • Assume Window 2 already has 10 requests (count_window2 = 10)
      • Approximate sliding total = (count_window1 × 75%) + count_window2 = (80 × 0.75) + 10 = 60 + 10 = 70
      • 70 < 100, Request Allowed
    • At 0:01:45 (45 seconds into the current window, 75%):
      • Previous window's contribution ratio = (60 - 45) / 60 = 15 / 60 = 25%
      • Assume Window 2 already has 50 requests (count_window2 = 50)
      • Approximate sliding total = (count_window1 × 25%) + count_window2 = (80 × 0.25) + 50 = 20 + 50 = 70
      • 70 < 100, Request Allowed

class RateLimit {
  ...
  static slidingWindow(tokens, window) {
    const windowDuration = parseWindow(window);
    return async (storage, identifier, delay) => {
      const now = Date.now();
      // 1. Calculate the "current" and "previous" buckets
      const currentBucket = Math.floor(now / windowDuration);
      const previousBucket = currentBucket - 1;
      const currentKey = `${identifier}:${currentBucket}`;
      const previousKey = `${identifier}:${previousBucket}`;
      // 2. Get the token count for each bucket (default to 0)
      const currentTokens = storage.get(currentKey) || 0;
      const previousTokens = storage.get(previousKey) || 0;
      // 3. Calculate the weighted token count for the previous bucket
      const elapsed = now % windowDuration;
      const overlapRatio = 1 - elapsed / windowDuration;
      const weightedPrevious = Math.floor(previousTokens * overlapRatio);
      // 4. Calculate the total token count within the sliding window
      const totalTokens = currentTokens + weightedPrevious;
      const reset = (currentBucket + 1) * windowDuration;
      // 5. Check if the quota is exceeded
      if (totalTokens >= tokens) {
        return {
          success: false,
          limit: tokens,
          remaining: 0,
          reset,
        };
      }
      // 6. Not exceeded yet: increment the current bucket's count and update storage
      const newCurrentTokens = currentTokens + 1;
      storage.set(currentKey, newCurrentTokens);
      // 7. If this is the first write, set a TTL
      if (newCurrentTokens === 1) {
        // TTL of windowDuration * 2 + 1000 ensures the previous bucket has not expired
        storage.pexpire(currentKey, Math.floor(windowDuration * 2 + 1000));
      }
      // 8. Calculate the new remaining count
      const newTotalTokens = newCurrentTokens + weightedPrevious;
      const remaining = Math.max(0, tokens - newTotalTokens);
      // 9. FOR DEV PURPOSES, simulate a background delay
      if (delay) {
        await new Promise((resolve) => setTimeout(resolve, delay));
      }
      // 10. Return a success result
      return {
        success: true,
        limit: tokens,
        remaining,
        reset,
      };
    };
  }
  ...
}

Advantages:

  • Smooth Traffic: Compared to the Fixed Window Counter, the Sliding Window Count effectively reduces sudden traffic surges at window boundaries, making rate limiting smoother.
  • Memory Efficient: Only needs to store counts for the current and previous windows, requiring much less memory than the Sliding Window Log.
  • Moderate Implementation Difficulty: Simpler than the Sliding Window Log, only slightly more complex than the Fixed Window Counter.

Disadvantages:

  • Only an Approximation: Cannot be as precise as the Sliding Window Log; there may be errors under extreme traffic distributions.
  • Requires Precise Overlap Ratio Calculation: Must correctly calculate the window overlap ratio and properly maintain counts for two windows.

Token Bucket

The Token Bucket concept imagines each request as taking a token from a "token bucket." The system adds tokens to this bucket at a constant rate. Each incoming request must first obtain a token from the bucket to be processed. If there are enough tokens in the bucket, the request is processed immediately; if there are insufficient tokens, the request is denied, queued, or subjected to other downgrade strategies.

class RateLimit {
  ...
  static tokenBucket(refillRate, interval, maxTokens) {
    const intervalDuration = parseWindow(interval);
    return async (storage, identifier, delay) => {
      const now = Date.now();
      const key = `token:${identifier}`;
      // 1. Get the current bucket state, defaulting to a full bucket
      const bucketData = storage.get(key) || {
        tokens: maxTokens,
        lastRefillAt: now,
      };
      // 2. Calculate the time passed since the last refill
      const timePassed = now - bucketData.lastRefillAt;
      // 3. Calculate how many refills should have occurred
      const refills = Math.floor(timePassed / intervalDuration);
      // 4. Refill tokens based on the number of refills
      let currentTokens = bucketData.tokens;
      let newLastRefillAt = bucketData.lastRefillAt;
      if (refills > 0) {
        currentTokens = Math.min(
          maxTokens,
          bucketData.tokens + refills * refillRate
        );
        newLastRefillAt = bucketData.lastRefillAt + refills * intervalDuration;
      }
      // 5. If tokens are insufficient, deny the request and calculate the next available time
      if (currentTokens < 1) {
        const timeSinceLastRefill = now - newLastRefillAt;
        const timeToNextRefill =
          intervalDuration - (timeSinceLastRefill % intervalDuration);
        const reset = now + timeToNextRefill;
        storage.set(key, {
          tokens: currentTokens,
          lastRefillAt: newLastRefillAt,
        });
        storage.pexpire(key, Math.max(intervalDuration * 5, parseWindow("1h")));
        return {
          success: false,
          limit: maxTokens,
          remaining: 0,
          reset,
        };
      }
      // 6. Sufficient tokens: consume 1
      const tokensLeft = currentTokens - 1;
      storage.set(key, {
        tokens: tokensLeft,
        lastRefillAt: newLastRefillAt,
      });
      storage.pexpire(key, Math.max(intervalDuration * 5, parseWindow("1h")));
      // 7. FOR DEV PURPOSES, simulate a background delay
      if (delay) {
        await new Promise((resolve) => setTimeout(resolve, delay));
      }
      // 8. Calculate the next refill time
      let reset;
      if (tokensLeft < maxTokens) {
        const timeSinceLastRefill = now - newLastRefillAt;
        const timeToNextRefill =
          intervalDuration - (timeSinceLastRefill % intervalDuration);
        reset = now + timeToNextRefill;
      } else {
        reset = now;
      }
      return {
        success: true,
        limit: maxTokens,
        remaining: tokensLeft,
        reset,
      };
    };
  }
}
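
The refill arithmetic in steps 2–4 can be checked in isolation. With a hypothetical configuration of refillRate = 10 tokens per 1-second interval and maxTokens = 100, a bucket that held 5 tokens 2.5 seconds ago has received exactly two full refills, and the half-interval remainder carries over:

```javascript
// Isolated refill math from the token bucket above.
function refill(tokens, lastRefillAt, now, refillRate, intervalMs, maxTokens) {
  const refills = Math.floor((now - lastRefillAt) / intervalMs);
  if (refills === 0) return { tokens, lastRefillAt };
  return {
    tokens: Math.min(maxTokens, tokens + refills * refillRate),
    // Advance by whole intervals only, so partial progress is not lost.
    lastRefillAt: lastRefillAt + refills * intervalMs,
  };
}

const state = refill(5, 0, 2_500, 10, 1_000, 100);
console.log(state.tokens);       // 25  (5 + 2 refills of 10)
console.log(state.lastRefillAt); // 2000 (the 500 ms remainder carries over)
```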

Algorithm Comparison

| Algorithm | Advantages | Disadvantages | Best Use Cases |
| --- | --- | --- | --- |
| Fixed Window Counter | Simple to implement, memory efficient, fast computation | Boundary effect, traffic may double the limit | Rough limits, e.g., "5 emails per hour" |
| Sliding Window Log | Most precise control, completely avoids boundary issues | High memory consumption, high computational cost | Financial transactions, critical APIs, strict contract scenarios |
| Sliding Window Counter | Smooth traffic, good balance of performance and precision | Approximation, slightly complex implementation | General solution for most web applications |
| Token Bucket | Handles burst traffic, flexible rate control | Complex parameter tuning and implementation | Bursty workloads, API gateways |