SProBench: Stream Processing Benchmark for High Performance Computing Infrastructure
Apurv Deepak Kulkarni, Siavash Ghiasvand
TL;DR
SProBench addresses the challenge of benchmarking data stream processing (DSP) frameworks on HPC infrastructure, where existing tools struggle with scalability and SLURM integration. It delivers a modular benchmark with a workload generator, a message broker, and framework-agnostic processing pipelines, plus an automated experiment workflow and comprehensive metric collection, and it interoperates with Flink, Spark Streaming, and Kafka Streams. Experimental results on a large HPC cluster demonstrate superior scalability: the workload generator achieves over 20 million events per second, and parallel throughput surpasses prior benchmarks by more than tenfold. The results also reveal tradeoffs between throughput and latency at high parallelism. The work provides an open-source, end-to-end solution for reproducible performance studies of DSP frameworks on HPC systems and lays the groundwork for supporting additional frameworks and workloads.
Abstract
Recent advancements in data stream processing frameworks have improved real-time data handling; however, scalability remains a significant challenge affecting throughput and latency. While studies have explored this issue on local machines and cloud clusters, research on modern high performance computing (HPC) infrastructures is still limited due to the lack of scalable measurement tools. This work presents SProBench, a novel benchmark suite designed to evaluate the performance of data stream processing frameworks in large-scale computing systems. Building on best practices, SProBench incorporates a modular architecture, offers native support for SLURM-based clusters, and seamlessly integrates with popular stream processing frameworks such as Apache Flink, Apache Spark Streaming, and Apache Kafka Streams. Experiments conducted on HPC clusters demonstrate its exceptional scalability, delivering throughput that surpasses existing benchmarks by more than tenfold. The distinctive features of SProBench, including complete customization options, built-in automated experiment management tools, seamless interoperability, and an open-source license, distinguish it as an innovative benchmark suite tailored to meet the needs of modern data stream processing frameworks.
