Table of Contents
Fetching ...

Holon Streaming: Global Aggregations with Windowed CRDTs

Jonas Spenger, Kolya Krafeld, Ruben van Gemeren, Philipp Haller, Paris Carbone

TL;DR

Holon Streaming tackles the bottleneck of global aggregations in exactly-once streaming by replacing centralized coordination with decentralized state synchronization using Windowed CRDTs. The approach delivers a deterministic programming model with convergence guarantees, enabling scalable and low-latency global computations. Empirical evaluation on Nexmark workloads demonstrates substantial gains over Apache Flink, including higher throughput and lower latency, and markedly improved resilience under failure. Overall, the work shows that combining Windowed CRDTs with decentralized coordination is a practical path to scalable global state in streaming systems.

Abstract

Scaling global aggregations is a challenge for exactly-once stream processing systems. Current systems implement these either by computing the aggregation in a single task instance, or by static aggregation trees, which limits scalability and may become a bottleneck. Moreover, the end-to-end latency is determined by the slowest path in the tree, and failures and reconfiguration cause large latency spikes due to the centralized coordination. Towards these issues, we present Holon Streaming, an exactly-once stream processing system for global aggregations. Its deterministic programming model uses windowed conflict-free replicated data types (Windowed CRDTs), a novel abstraction for shared replicated state. Windowed CRDTs make computing global aggregations scalable. Furthermore, their guarantees such as determinism and convergence enable the design of efficient failure recovery algorithms by decentralized coordination. Our evaluation shows a 5x lower latency and 2x higher throughput than an existing stream processing system on global aggregation workloads, with an 11x latency reduction under failure scenarios. The paper demonstrates the effectiveness of decentralized coordination with determinism, and the utility of Windowed CRDTs for global aggregations.

Holon Streaming: Global Aggregations with Windowed CRDTs

TL;DR

Holon Streaming tackles the bottleneck of global aggregations in exactly-once streaming by replacing centralized coordination with decentralized state synchronization using Windowed CRDTs. The approach delivers a deterministic programming model with convergence guarantees, enabling scalable and low-latency global computations. Empirical evaluation on Nexmark workloads demonstrates substantial gains over Apache Flink, including higher throughput and lower latency, and markedly improved resilience under failure. Overall, the work shows that combining Windowed CRDTs with decentralized coordination is a practical path to scalable global state in streaming systems.

Abstract

Scaling global aggregations is a challenge for exactly-once stream processing systems. Current systems implement these either by computing the aggregation in a single task instance, or by static aggregation trees, which limits scalability and may become a bottleneck. Moreover, the end-to-end latency is determined by the slowest path in the tree, and failures and reconfiguration cause large latency spikes due to the centralized coordination. Towards these issues, we present Holon Streaming, an exactly-once stream processing system for global aggregations. Its deterministic programming model uses windowed conflict-free replicated data types (Windowed CRDTs), a novel abstraction for shared replicated state. Windowed CRDTs make computing global aggregations scalable. Furthermore, their guarantees such as determinism and convergence enable the design of efficient failure recovery algorithms by decentralized coordination. Our evaluation shows a 5x lower latency and 2x higher throughput than an existing stream processing system on global aggregation workloads, with an 11x latency reduction under failure scenarios. The paper demonstrates the effectiveness of decentralized coordination with determinism, and the utility of Windowed CRDTs for global aggregations.

Paper Structure

This paper contains 22 sections, 9 figures, 2 tables, 2 algorithms.

Figures (9)

  • Figure 1: Query 1 physical dataflow execution plan.
  • Figure 2: Query 1 in the dataflow API.
  • Figure 3: Windowing updates for the GCounter.
  • Figure 4: Holon Streaming deployment.
  • Figure 5: Holon Streaming node.
  • ...and 4 more figures