Diba: A Re-configurable Stream Processor
Mohammadreza Najafi, Thamir M. Qadah, Mohammad Sadoghi, Hans-Arno Jacobsen
TL;DR
Diba addresses the challenge of real-time stream processing on hardware by introducing a reconfigurable, unidirectional dataflow processor that separates the data-distribution network from the processing blocks. Its NoC-based modular architecture, Topology Bricks, and a versatile library of PUnits and specialized blocks enable online reconfiguration and concurrent execution of multiple stream queries. The paper contributes a detailed component design (GSwitch, LSwitch, OP-Block, HB-SJ, Circular-MJ, Aggregation-GroupBy) and demonstrates an FPGA prototype with practical processing times (e.g., 300–3520 ms for TPC-H Q3-like workloads at scale factors of 1–10 GB) and favorable power characteristics. This work shows that a carefully crafted hardware-software co-design can deliver scalable, reusable building blocks for diverse streaming workloads, potentially shortening time-to-market for hardware-accelerated data processing.
Abstract
Stream processing acceleration is driven by the continuously increasing volume and velocity of data generated on the Web and the limitations of storage, computation, and power consumption. Hardware solutions provide better performance and power consumption, but they are hindered by the high research and development costs and the long time to market. In this work, we propose our re-configurable stream processor (Diba), a complete rethinking of a previously proposed customized and flexible query processor that targets real-time stream processing. Diba uses a unidirectional dataflow not dedicated to any specific type of query (operator) on streams, allowing a straightforward placement of processing components on a general data path that facilitates query mapping. In Diba, the concepts of the distribution network and processing components are implemented as two separate entities connected using generic interfaces. This approach allows the adoption of a versatile architecture for a family of queries rather than forcing a rigid chain of processing components to implement such queries. Our experimental evaluations of representative queries from TPC-H yielded processing times of 300, 1220, and 3520 milliseconds for data streams with scale factor sizes of one, four, and ten gigabytes, respectively.
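The separation described above, a distribution network and processing components joined only through a generic interface, can be illustrated with a small software analogy. The following is a minimal sketch and not Diba's hardware design: the names `ProcessingUnit`, `Filter`, `Project`, and `DistributionNetwork` are hypothetical, and a Python list stands in for the unidirectional data path.

```python
from typing import Callable, Iterable, List, Optional, Protocol


class ProcessingUnit(Protocol):
    """Hypothetical generic interface: one tuple in, one tuple (or None) out."""

    def process(self, item: dict) -> Optional[dict]: ...


class Filter:
    """Drops tuples that do not satisfy the predicate."""

    def __init__(self, predicate: Callable[[dict], bool]) -> None:
        self.predicate = predicate

    def process(self, item: dict) -> Optional[dict]:
        return item if self.predicate(item) else None


class Project:
    """Keeps only the named fields of each tuple."""

    def __init__(self, fields: List[str]) -> None:
        self.fields = fields

    def process(self, item: dict) -> Optional[dict]:
        return {f: item[f] for f in self.fields}


class DistributionNetwork:
    """Moves tuples in one direction through whichever units are plugged in.

    The network knows nothing about what a unit computes; it relies only on
    the ProcessingUnit interface (an assumption made for this sketch).
    """

    def __init__(self) -> None:
        self.slots: List[ProcessingUnit] = []

    def configure(self, units: Iterable[ProcessingUnit]) -> None:
        # Replacing the units changes the query; the data path stays the same.
        self.slots = list(units)

    def run(self, stream: Iterable[dict]) -> List[dict]:
        results = []
        for item in stream:
            current: Optional[dict] = item
            for unit in self.slots:
                current = unit.process(current)
                if current is None:  # tuple was dropped by this unit
                    break
            if current is not None:
                results.append(current)
        return results


network = DistributionNetwork()
network.configure([Filter(lambda t: t["price"] > 10), Project(["id"])])
first_query = network.run([{"id": 1, "price": 5}, {"id": 2, "price": 20}])

# Map a different query onto the same data path.
network.configure([Project(["price"])])
second_query = network.run([{"id": 1, "price": 5}])
```

In this sketch, mapping a new query means calling `configure` with a different set of units, while the forwarding logic in `run` is untouched. That mirrors, in software terms only, the idea of serving a family of queries from one general data path rather than a rigid chain of components.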
