Hardware-Conscious Stream Processing: A Survey
Shuhao Zhang, Feng Zhang, Yingjun Wu, Bingsheng He, Paul Johns
TL;DR
Hardware-conscious stream processing exploits modern hardware to reduce latency and increase throughput in data stream processing systems (DSPSs). The paper surveys computation optimization, stream I/O optimization, and query deployment across multicore CPUs, GPUs, and FPGAs, detailing techniques such as incremental window aggregation, out-of-order handling, cross-operator data grouping, and accelerator-backed windowing. Notable systems and results are discussed (e.g., SABER achieving 79 million tuples per second on the Yahoo Streaming Benchmark with eight CPU cores), alongside trade-offs between the continuous operator (CO) and bulk synchronous parallel (BSP) execution models and between memory technologies such as high-bandwidth memory (HBM) and non-volatile memory (NVM). The survey finds significant progress but identifies open questions in SQL-on-streams, data safety, and scalable, architecture-aware query deployment, and it outlines directions for next-generation DSPS design.
Abstract
Data stream processing systems (DSPSs) enable users to express and run stream applications that continuously process data streams. To achieve real-time data analytics, recent research has focused on optimizing system latency and throughput. Following recent advances in the computer architecture community, researchers and practitioners have investigated the potential of hardware-conscious stream processing, which makes better use of modern hardware capacity in DSPSs. In this paper, we conduct a systematic survey of recent work in the field, particularly along the following three directions: 1) computation optimization, 2) stream I/O optimization, and 3) query deployment. Finally, we advise on potential future research directions.
