Scalable data concentrator with baseline interconnection network for triggerless data acquisition systems
Wojciech M. Zabołotny
TL;DR
The paper addresses data concentration in triggerless DAQ systems, where many short input words must be packed into the wide words used by PCIe while non-DAQ data is filtered out and the temporal order is preserved. It introduces an interconnection network called BNRO (Baseline Network with Reversed Outputs), proves its correctness by mathematical induction and collision analysis, and presents a scalable, parameterized VHDL implementation with an efficient controller. The design was simulated with 4 and 5 layers (16 and 32 inputs) and tested on FPGA boards with 4 layers (16 inputs), where it reached 128 Gb/s at 250 MHz. The implementation is open source and designed to scale, so the approach may extend to wider words and larger input counts in future PCIe generations.
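As a quick consistency check on the reported figures, the arithmetic below assumes that each input delivers one word per clock cycle and that the network is a classic baseline network built from 2×2 switching cells with n layers for N = 2^n inputs. Both are assumptions for illustration; the summary does not state them explicitly.

```latex
\[
\frac{128\ \text{Gb/s}}{250\ \text{MHz}} = 512\ \text{bits per clock},
\qquad
\frac{512\ \text{bits}}{16\ \text{inputs}} = 32\ \text{bits per input word}.
\]
\[
N = 2^{n},\qquad
\#\text{switches} = n\cdot\frac{N}{2}:\qquad
n = 4 \Rightarrow 32,\qquad n = 5 \Rightarrow 80.
\]
```

Under these assumptions, each added layer doubles the input count and adds one column of N/2 switches, which is consistent with scaling "by adding layers".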
Abstract
Triggerless Data Acquisition Systems (DAQs) require transmitting the data stream from multiple links to the processing node. The short input data words must be concentrated and packed into the longer bit vectors used by the output interface (e.g., PCI Express). In that process, the unneeded data must be eliminated, and a dense stream of useful DAQ data must be created. Additionally, the time order of the data should be preserved. This paper presents a new solution that uses the Baseline Network with Reversed Outputs (BNRO) for high-speed data routing. A thorough analysis of the network's operation made the design more scalable than the previously published concentrator based on an 8×8 network. The solution may be scaled by adding layers to the BNRO network while minimizing resource consumption. Simulations were performed for 4 and 5 layers (16 and 32 inputs). The FPGA implementation has been successfully tested in actual hardware for 16 inputs. Pipeline registers may be added in each layer independently, shortening the critical path and increasing the maximum clock frequency.
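To make the routing idea concrete, here is a minimal Python behavioral sketch of our own (not the paper's VHDL). It assumes that "reversed outputs" means the baseline network's output lines are numbered in bit-reversed order, so stage s steers each word by bit s of its destination, least significant bit first. Each valid word gets destination (offset + rank) mod N, where rank counts valid words on lower-numbered inputs in the same clock. Under that assumption, two words meeting at one switch in stage s already agree on destination bits 0..s-1 and come from one aligned block of 2^(s+1) inputs, so their destinations differ by a nonzero amount below 2^(s+1) and must differ in bit s: no collision. The names `route_bnro`, `Concentrator`, and `Collision` are hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple


class Collision(Exception):
    """Raised if two words request the same switch output in one stage."""


def bit_reverse(x: int, n: int) -> int:
    """Reverse the lowest n bits of x (e.g. 0b001 -> 0b100 for n = 3)."""
    r = 0
    for _ in range(n):
        r = (r << 1) | (x & 1)
        x >>= 1
    return r


def route_bnro(packets: Dict[int, Tuple[int, object]], n: int) -> Dict[int, object]:
    """Route {input_line: (dest, payload)} through an n-stage baseline network
    of 2x2 switches whose output lines are numbered in bit-reversed order
    (assumed meaning of "reversed outputs").

    Stage s splits each sub-network of size 2**(n-s) into upper and lower
    halves and steers every word by bit s of its destination (LSB first).
    Returns {output_line: payload}.
    """
    size = 1 << n
    lines = dict(packets)
    for s in range(n):
        block = size >> s                  # sub-network size at this stage
        half = block // 2
        nxt: Dict[int, Tuple[int, object]] = {}
        for line, (dest, payload) in lines.items():
            base = line - line % block     # first line of this sub-network
            switch = (line - base) // 2    # 2x2 switch index inside it
            out = base + switch + (half if (dest >> s) & 1 else 0)
            if out in nxt:
                raise Collision(f"stage {s}, line {out}")
            nxt[out] = (dest, payload)
        lines = nxt
    # Reversed outputs: physical output = bit-reversed internal line number.
    return {bit_reverse(line, n): payload for line, (_dest, payload) in lines.items()}


class Concentrator:
    """Per clock: drop invalid inputs, pack valid words in input order into
    consecutive slots of a wide output word, and emit the word when full."""

    def __init__(self, n: int) -> None:
        self.n = n
        self.size = 1 << n
        self.offset = 0                                   # next free slot
        self.cur: List[Optional[object]] = [None] * self.size
        self.emitted: List[List[Optional[object]]] = []

    def clock(self, words: Sequence[Optional[object]]) -> None:
        """words[i] is the word on input i, or None for non-DAQ data."""
        assert len(words) == self.size
        packets: Dict[int, Tuple[int, object]] = {}
        rank = 0                                          # running count of valid words
        for i, w in enumerate(words):
            if w is not None:
                packets[i] = ((self.offset + rank) % self.size, w)
                rank += 1
        nxt: List[Optional[object]] = [None] * self.size
        for slot, w in route_bnro(packets, self.n).items():
            # Slots below the offset wrapped around: they start the next word.
            if slot >= self.offset:
                self.cur[slot] = w
            else:
                nxt[slot] = w
        if self.offset + rank >= self.size:               # current word complete
            self.emitted.append(self.cur)
            self.cur = nxt
        self.offset = (self.offset + rank) % self.size
```

For example, with `Concentrator(4)`, feeding 13 valid words and then 16 valid words emits one full 16-slot word (the 13 words followed by the first 3 of the second batch) and leaves the remaining 13 in the next word. The model takes one call per clock and ignores pipeline registers; in hardware, the rank computation and each network layer could be registered separately.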
