Designing Scalable Rate Limiting Systems: Algorithms, Architecture, and Distributed Solutions
Bo Guan
TL;DR
The paper addresses the design of scalable, accurate rate limiting in distributed systems under CAP constraints. It advocates a Rolling Window algorithm implemented with Redis Sorted Sets and server-side Lua scripting, deployed on Redis Cluster in AP mode to favor availability and partition tolerance. It quantifies the memory-accuracy trade-off: Rolling Window achieves precise enforcement at a memory cost of 8 bytes per request and $O(\log N)$ operation complexity, and is compared against the Token Bucket and Fixed Window algorithms. A three-layer rule-management architecture enables dynamic rule updates without script redeployment, yielding a production-ready blueprint for high-volume API throttling.
Abstract
Designing a rate limiter that is simultaneously accurate, available, and scalable is a fundamental challenge in distributed systems, primarily because of the trade-offs among algorithmic precision, availability, consistency, and partition tolerance. This article presents a concrete architecture for a distributed rate limiting system in a production-grade environment. Our design uses the in-memory database Redis and its Sorted Set data structure, which provides $O(\log N)$ operations with low latency while maintaining precision. The core contribution is quantifying the accuracy and memory cost trade-off of the Rolling Window algorithm, which we implement, against the Token Bucket and Fixed Window algorithms. In addition, we explain why server-side Lua scripting is critical: it bundles cleanup, counting, and insertion into a single atomic operation, thereby eliminating race conditions in concurrent environments. For rule management, we propose a three-layer architecture that handles the storage and updating of limit rules. By loading the script into Redis and hashing the rule parameters, rules can be changed without modifying the cached scripts. Furthermore, we analyze the deployment of this architecture on a Redis Cluster, which provides availability and scalability through data sharding and replication. Finally, we explain why accepting AP (Availability and Partition Tolerance) under the CAP theorem is the pragmatic engineering trade-off for this use case.
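To make the cleanup, counting, and insertion sequence concrete, the following is a minimal in-process sketch of a Rolling Window limiter in Python. It is not the Redis and Lua implementation described in the article: a sorted list of timestamps per key stands in for a Sorted Set, and the three steps would presumably map to Redis commands such as ZREMRANGEBYSCORE, ZCARD, and ZADD. The class name `RollingWindowLimiter`, its parameters, and the choice to pass the clock value `now` explicitly are all assumptions made for illustration.

```python
import bisect
from collections import defaultdict


class RollingWindowLimiter:
    """Hypothetical in-memory stand-in for a sorted-set rolling window."""

    def __init__(self, limit, window_seconds):
        self.limit = limit              # max requests allowed per window
        self.window = window_seconds    # window length in seconds
        # One sorted list of accepted-request timestamps per client key,
        # playing the role of one Sorted Set per key.
        self._log = defaultdict(list)

    def allow(self, key, now):
        timestamps = self._log[key]
        # 1. Cleanup: drop timestamps at or before the window start.
        cutoff = bisect.bisect_right(timestamps, now - self.window)
        del timestamps[:cutoff]
        # 2. Count: reject if the window is already full.
        if len(timestamps) >= self.limit:
            return False
        # 3. Insert: record the accepted request, keeping sorted order.
        bisect.insort(timestamps, now)
        return True


limiter = RollingWindowLimiter(limit=3, window_seconds=10.0)
decisions = [limiter.allow("client-a", t) for t in (0.0, 1.0, 2.0, 3.0, 10.5)]
print(decisions)  # [True, True, True, False, True]
```

In this sketch the three steps are safe only because a single Python thread executes them in order. With a shared store and concurrent clients, two requests could both pass the count check before either inserts, which is the race condition that running the steps as one atomic server-side script is meant to prevent.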
