Towards Fine-Grained Scalability for Stateful Stream Processing Systems
Yunfan Qing, Wenli Zheng
TL;DR
This work tackles the latency and instability challenges of on-the-fly scaling in stateful stream processing by introducing DRRS, a plug-in framework for Apache Flink that combines three mechanisms—Decoupling and Re-routing, Record Scheduling, and Subscale Division—to achieve fine-grained, schedulable scaling. By separating propagation from migration, proactively adjusting record execution order, and partitioning state migrations into independent subscales, DRRS minimizes propagation, suspension, and dependency overhead. Empirical results show substantial reductions in peak and average latency (up to 81.1% and 95.5%) and significantly faster scaling (up to 86% faster) across synthetic and real workloads, with robust performance under varying state sizes and skew. The approach enables scalable, low-latency stateful streaming in production, while maintaining compatibility with fault-tolerance mechanisms and without requiring user code changes.
Abstract
Dynamic scaling is critical to stream processing engines, as their long-running nature demands adaptive resource management. Existing scaling approaches easily cause performance degradation due to coarse-grained synchronization and inefficient state migration, resulting in system halt or high processing latency. In this paper, we propose DRRS, an on-the-fly scaling method that reduces performance overhead at the system level with three key innovations: (i) fine-grained scaling signals coupled with a re-routing mechanism that significantly mitigates propagation delay, (ii) a sophisticated record-scheduling mechanism that substantially reduces processing suspension, and (iii) subscale division, a mechanism that partitions migrating states into independent subsets, thereby reducing dependency-related overhead to enable finer-grained control and better runtime adaptability during scaling. DRRS is implemented on Apache Flink and, when compared to state-of-the-art approaches, reduces peak and average latencies by up to 81.1% and 95.5% respectively, while achieving a 72.8%-86% reduction in scaling duration, without disruption in non-scaling periods.
