Table of Contents
Fetching ...

Towards Fine-Grained Scalability for Stateful Stream Processing Systems

Yunfan Qing, Wenli Zheng

TL;DR

This work tackles the latency and instability challenges of on-the-fly scaling in stateful stream processing by introducing DRRS, a plug-in framework for Apache Flink that combines three mechanisms—Decoupling and Re-routing, Record Scheduling, and Subscale Division—to achieve fine-grained, schedulable scaling. By separating propagation from migration, proactively adjusting record execution order, and partitioning state migrations into independent subscales, DRRS minimizes propagation, suspension, and dependency overhead. Empirical results show substantial reductions in peak and average latency (up to 81.1% and 95.5%) and significantly faster scaling (up to 86% faster) across synthetic and real workloads, with robust performance under varying state sizes and skew. The approach enables scalable, low-latency stateful streaming in production, while maintaining compatibility with fault-tolerance mechanisms and without requiring user code changes.

Abstract

Dynamic scaling is critical to stream processing engines, as their long-running nature demands adaptive resource management. Existing scaling approaches easily cause performance degradation due to coarse-grained synchronization and inefficient state migration, resulting in system halt or high processing latency. In this paper, we propose DRRS, an on-the-fly scaling method that reduces performance overhead at the system level with three key innovations: (i) fine-grained scaling signals coupled with a re-routing mechanism that significantly mitigates propagation delay, (ii) a sophisticated record-scheduling mechanism that substantially reduces processing suspension, and (iii) subscale division, a mechanism that partitions migrating states into independent subsets, thereby reducing dependency-related overhead to enable finer-grained control and better runtime adaptability during scaling. DRRS is implemented on Apache Flink and, when compared to state-of-the-art approaches, reduces peak and average latencies by up to 81.1% and 95.5% respectively, while achieving a 72.8%-86% reduction in scaling duration, without disruption in non-scaling periods.

Towards Fine-Grained Scalability for Stateful Stream Processing Systems

TL;DR

This work tackles the latency and instability challenges of on-the-fly scaling in stateful stream processing by introducing DRRS, a plug-in framework for Apache Flink that combines three mechanisms—Decoupling and Re-routing, Record Scheduling, and Subscale Division—to achieve fine-grained, schedulable scaling. By separating propagation from migration, proactively adjusting record execution order, and partitioning state migrations into independent subscales, DRRS minimizes propagation, suspension, and dependency overhead. Empirical results show substantial reductions in peak and average latency (up to 81.1% and 95.5%) and significantly faster scaling (up to 86% faster) across synthetic and real workloads, with robust performance under varying state sizes and skew. The approach enables scalable, low-latency stateful streaming in production, while maintaining compatibility with fault-tolerance mechanisms and without requiring user code changes.

Abstract

Dynamic scaling is critical to stream processing engines, as their long-running nature demands adaptive resource management. Existing scaling approaches easily cause performance degradation due to coarse-grained synchronization and inefficient state migration, resulting in system halt or high processing latency. In this paper, we propose DRRS, an on-the-fly scaling method that reduces performance overhead at the system level with three key innovations: (i) fine-grained scaling signals coupled with a re-routing mechanism that significantly mitigates propagation delay, (ii) a sophisticated record-scheduling mechanism that substantially reduces processing suspension, and (iii) subscale division, a mechanism that partitions migrating states into independent subsets, thereby reducing dependency-related overhead to enable finer-grained control and better runtime adaptability during scaling. DRRS is implemented on Apache Flink and, when compared to state-of-the-art approaches, reduces peak and average latencies by up to 81.1% and 95.5% respectively, while achieving a 72.8%-86% reduction in scaling duration, without disruption in non-scaling periods.

Paper Structure

This paper contains 23 sections, 15 figures.

Figures (15)

  • Figure 1: Illustration of on-the-fly scaling (OTFS).
  • Figure 2: The latency over time for: Unbound, OTFS (generalized on-the-fly scaling with fluid migration), and No Scale, measured using the experimental settings in Section V-A with the Twitch workload under a fixed input rate.
  • Figure 3: Synergy between DRRS Mechanisms.
  • Figure 4: Illustration of the Decoupling and Re-routing mechanism.
  • Figure 5: Illustration of streams under Coupled vs. Decoupled Signal Synchronization. The top half depicts synchronization with a conventional coupled signal, while the bottom demonstrates decoupling and re-routing, organizing affected records into two epochs ($E_f$ and $E_p$).
  • ...and 10 more figures