
SProBench: Stream Processing Benchmark for High Performance Computing Infrastructure

Apurv Deepak Kulkarni, Siavash Ghiasvand

TL;DR

SProBench addresses the challenge of benchmarking data stream processing (DSP) frameworks on HPC infrastructure, where existing tools struggle with scalability and SLURM integration. It delivers a modular benchmark consisting of a workload generator, a message broker, and framework-agnostic processing pipelines. It also provides an automated experiment workflow, comprehensive metric collection, and interoperability with Flink, Spark Streaming, and Kafka Streams. Experimental results on a large HPC cluster demonstrate superior scalability: the workload generator achieves over 20 million events per second, and parallel throughput surpasses prior benchmarks by more than tenfold. The experiments also reveal tradeoffs between throughput and latency at high parallelism. The work provides an open-source, end-to-end solution for reproducible performance studies of DSP frameworks on HPC systems and lays the groundwork for supporting more frameworks and workloads.

Abstract

Recent advancements in data stream processing frameworks have improved real-time data handling; however, scalability remains a significant challenge affecting throughput and latency. While studies have explored this issue on local machines and cloud clusters, research on modern high performance computing (HPC) infrastructures remains limited due to the lack of scalable measurement tools. This work presents SProBench, a novel benchmark suite designed to evaluate the performance of data stream processing frameworks on large-scale computing systems. Building on best practices, SProBench has a modular architecture, offers native support for SLURM-based clusters, and integrates seamlessly with popular stream processing frameworks such as Apache Flink, Apache Spark Streaming, and Apache Kafka Streams. Experiments conducted on HPC clusters demonstrate its exceptional scalability, with throughput that surpasses existing benchmarks by more than tenfold. Its distinctive features, including complete customization options, built-in automated experiment management tools, seamless interoperability, and an open-source license, distinguish SProBench as an innovative benchmark suite tailored to the needs of modern data stream processing frameworks.
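Throughput and latency are the two metrics whose tradeoff the abstract highlights. As a rough illustration only, the sketch below computes both from per-event timestamps. It assumes a hypothetical record holding the time the generator emitted each event (`emit_ts`) and the time the pipeline's sink observed its result (`sink_ts`); the names `Event` and `summarize` are invented here, and SProBench's own metric definitions and collection mechanism may differ.

```python
from dataclasses import dataclass
from statistics import quantiles


@dataclass
class Event:
    # Hypothetical record (not SProBench's format): timestamps in seconds.
    emit_ts: float  # when the workload generator emitted the event
    sink_ts: float  # when the processing pipeline's sink observed the result


def summarize(events):
    """Return (throughput in events/s, median latency s, p99 latency s).

    Throughput is taken here as events divided by the wall-clock span from
    the first emission to the last sink observation; other definitions exist.
    """
    if len(events) < 2:
        raise ValueError("need at least two events to compute percentiles")
    start = min(e.emit_ts for e in events)
    end = max(e.sink_ts for e in events)
    throughput = len(events) / (end - start)
    # End-to-end latency per event.
    latencies = [e.sink_ts - e.emit_ts for e in events]
    cuts = quantiles(latencies, n=100)  # 99 cut points: cuts[49] = p50, cuts[98] = p99
    return throughput, cuts[49], cuts[98]
```

Reporting a tail percentile such as p99 alongside the median is a common choice because added parallelism can raise throughput while stretching the latency tail, which a mean alone tends to hide.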

Paper Structure

This paper contains 10 sections, 8 figures, 1 table.

Figures (8)

  • Figure 1: Benchmark architecture
  • Figure 2: Benchmark process setup for scale-up and scale-out experimentation
  • Figure 3: Benchmark Workflow
  • Figure 4: Processing Pipeline
  • Figure 5: Metrics Monitoring and Collection
  • ...and 3 more figures