Counter Pools: Counter Representation for Efficient Stream Processing

Ran Ben Basat, Gil Einziger, Bilal Tyah, Shay Vargaftik

TL;DR

Counter Pools address the memory bottleneck of counter arrays in stream processing by introducing fixed-size pools that hold multiple variable-sized counters. The approach hinges on a stars-and-bars encoding to map per-pool counter sizes to a compact configuration number, enabling efficient dynamic resizing and, when needed, graceful pool-failure handling. The paper demonstrates strong improvements in space-accuracy tradeoffs for sketches and enables faster exact histogram counting by reducing load factors, supported by a thorough evaluation against state-of-the-art methods. This technique is particularly impactful for heavy-tailed network workloads, offering substantial memory savings and accuracy improvements with practical deployment considerations.
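To make the stars-and-bars idea concrete, here is a minimal Python sketch of one way to map a tuple of per-counter bit sizes to a single configuration number and back. It ranks the size tuples in lexicographic order; this ordering, the function names (`count_configs`, `encode_sizes`, `decode_sizes`), and the loop-based implementation are assumptions made for illustration and are not taken from the paper, which may use a different and faster encoding.

```python
from math import comb


def count_configs(n_bits, k):
    """Number of ways to split n_bits among k counters (sizes may be 0).

    This is the classic stars-and-bars count C(n_bits + k - 1, k - 1).
    """
    return comb(n_bits + k - 1, k - 1)


def encode_sizes(sizes, n_bits):
    """Map a size tuple (summing to n_bits) to its lexicographic rank.

    Hypothetical helper: the ordering is an illustrative choice.
    """
    if sum(sizes) != n_bits:
        raise ValueError("sizes must sum to n_bits")
    k = len(sizes)
    rank = 0
    remaining = n_bits
    for i, size in enumerate(sizes[:-1]):
        counters_after = k - i - 1
        # Skip every configuration in which counter i is smaller than `size`.
        for smaller in range(size):
            rank += count_configs(remaining - smaller, counters_after)
        remaining -= size
    return rank


def decode_sizes(rank, n_bits, k):
    """Inverse of encode_sizes: recover the size tuple from its rank."""
    if not 0 <= rank < count_configs(n_bits, k):
        raise ValueError("rank out of range")
    sizes = []
    remaining = n_bits
    for i in range(k - 1):
        counters_after = k - i - 1
        size = 0
        while True:
            block = count_configs(remaining - size, counters_after)
            if rank < block:
                break
            rank -= block
            size += 1
        sizes.append(size)
        remaining -= size
    sizes.append(remaining)  # the last counter takes whatever is left
    return sizes
```

The configuration number needs only enough bits to distinguish `count_configs(n_bits, k)` values, which is far fewer than storing each counter's length separately. For instance, `decode_sizes(encode_sizes([3, 0, 2, 1], 6), 6, 4)` returns the original sizes.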

Abstract

Due to the large data volume and number of distinct elements, space is often the bottleneck of many stream processing systems. The data structures used by these systems often consist of counters whose optimization yields significant memory savings. The challenge lies in balancing the size of the counters: too small, and they overflow; too large, and memory capacity limits their number. In this work, we suggest an efficient encoding scheme that sizes each counter according to its needs. Our approach uses fixed-sized pools of memory (e.g., a single memory word or 64 bits), where each pool manages a small number of counters. We pay special attention to performance and demonstrate considerable improvements for various streaming algorithms and workload characteristics.
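The balancing act described in the abstract can be illustrated with a toy model in Python. The sketch below tracks only the bit budget of a pool: each counter occupies as many bits as its current value needs, and an increment fails when the pool's fixed budget would be exceeded. The class name `CounterPool`, the default parameters, and the use of plain Python integers instead of a packed 64-bit word are all assumptions made for illustration; the paper's actual layout and failure handling are not reproduced here.

```python
class CounterPool:
    """Toy pool: k counters share a fixed budget of n_bits (illustrative only)."""

    def __init__(self, k=4, n_bits=48):
        self.n_bits = n_bits
        self.values = [0] * k
        self.failed = False

    @staticmethod
    def bits_needed(value):
        # int.bit_length() is 0 for 0, so a zero counter occupies no bits.
        return value.bit_length()

    def used_bits(self):
        return sum(self.bits_needed(v) for v in self.values)

    def increment(self, idx, amount=1):
        """Add `amount` to counter idx; return False if the pool overflows."""
        new_value = self.values[idx] + amount
        growth = self.bits_needed(new_value) - self.bits_needed(self.values[idx])
        if self.used_bits() + growth > self.n_bits:
            # Pool failure: the caller would need a fallback (assumption).
            self.failed = True
            return False
        self.values[idx] = new_value
        return True
```

In this model a single large counter can coexist with several small ones as long as their combined widths fit the budget, which is the situation a fixed per-counter width handles poorly.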

Paper Structure

This paper contains 24 sections, 10 figures, 3 tables, 6 algorithms.

Figures (10)

  • Figure 1: Distribution of required counter sizes for the flows in the NYC 2018 (CAIDA2018) workload, for one exact counter per flow and for a 2MB Count-Min Sketch. The sketch counters are slightly larger due to hash collisions, but the overall trend is very similar. In both cases, a counter-resizing scheme can save memory.
  • Figure 2: Basic layout of the counters and an increment operation.
  • Figure 3: Optimization: placing unallocated bits at the leftmost counter.
  • Figure 4: Comparing the On Arrival Error for different Counter Pools configurations.
  • Figure 5: Comparing the ARE for heavy hitters across the different configurations with 200KB and 2MB of memory.
  • ...and 5 more figures