Scalable data concentrator with baseline interconnection network for triggerless data acquisition systems

Wojciech M. Zabołotny

TL;DR

The paper addresses data concentration in triggerless DAQ systems: short input words from multiple links must be packed into the wide words used by the output interface (e.g., PCI Express), with non-DAQ data filtered out and temporal order preserved. It introduces an interconnection-network approach based on the BNRO (Baseline Network with Reversed Outputs), proves its correctness by mathematical induction and collision analysis, and presents a scalable, parameterized VHDL implementation with an efficient controller. The design has been validated in hardware for 4 layers (16 inputs) and in simulation for 5 layers (32 inputs); hardware tests on FPGA boards demonstrate 128 Gb/s throughput at 250 MHz for 16 inputs. The implementation is open source and designed to scale, so BNRO offers a practical route to high-speed, order-preserving data concentration for triggerless DAQ systems, potentially extending to wider words and larger input counts with future PCIe generations.
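
The quoted throughput can be cross-checked with simple arithmetic. The sketch below assumes that 1 Gb = 10^9 bits, that every input delivers one word per clock cycle, and that all input words have the same width; none of these assumptions is stated in this summary, so the per-input width is an inference rather than a reported figure.

```latex
\frac{128 \times 10^{9}\ \text{bit/s}}{250 \times 10^{6}\ \text{cycles/s}} = 512\ \text{bit/cycle},
\qquad
\frac{512\ \text{bit/cycle}}{16\ \text{inputs}} = 32\ \text{bit per input per cycle}
```

Under these assumptions, the 16-input concentrator would assemble one 512-bit output word per clock cycle from 32-bit input words.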

Abstract

Triggerless Data Acquisition Systems (DAQs) require transmitting the data stream from multiple links to the processing node. The short input data words must be concentrated and packed into the longer bit vectors used by the output interface (e.g., PCI Express). In that process, unneeded data must be eliminated and a dense stream of useful DAQ data created. Additionally, the time order of the data should be preserved. This paper presents a new solution using the Baseline Network with Reversed Outputs (BNRO) for high-speed data routing. A thorough analysis of the network's operation enabled increased scalability compared to the previously published concentrator based on an 8x8 network. The solution may be scaled by adding layers to the BNRO network while minimizing resource consumption. Simulations were done for 4 and 5 layers (16 and 32 inputs). The FPGA implementation has been successfully tested in actual hardware for 16 inputs. Pipeline registers may be added in each layer independently, shortening the critical path and increasing the maximum acceptable clock frequency.

Paper Structure (11 sections, 18 figures, 2 tables)

Figures (18)

  • Figure S1: Structure of an 8-to-1 encoder built from 2-to-1 encoders. Figure reproduced (redrawn) and caption copied from bassi_fpga-based_2023: "A FPGA-Based Architecture for Real-Time Cluster Finding in the LHCb Silicon Pixel Detector," by G. Bassi, L. Giambastiani, K. Hennessy, F. Lazzari, M. J. Morello, T. Pajero, A. Fernandez Prieto, G. Punzi, in IEEE Transactions on Nuclear Science, vol. 70, no. 6, pp. 1189-1201, June 2023, doi: 10.1109/TNS.2023.3273600, CC BY.
  • Figure S2: 2-to-1 encoder block diagram. R0, R1, R3, and State are registers, MUX0, MUX1, and MUX3 are multiplexers, and FSM is a finite state machine that manages hold, valid, and LE write signals. Figure reproduced (redrawn) and caption copied from bassi_fpga-based_2023: "A FPGA-Based Architecture for Real-Time Cluster Finding in the LHCb Silicon Pixel Detector," by G. Bassi, L. Giambastiani, K. Hennessy, F. Lazzari, M. J. Morello, T. Pajero, A. Fernandez Prieto, G. Punzi, in IEEE Transactions on Nuclear Science, vol. 70, no. 6, pp. 1189-1201, June 2023, doi: 10.1109/TNS.2023.3273600, CC BY.
  • Figure S3: Example of the concentration of data from 8 inputs to 8 outputs. The "DAQ" words are denoted by "D" with the number in subscript. The "Non-DAQ" words are denoted by "N". (a) In the first concentration cycle, five DAQ words are delivered via the links. They are stored in the output record in locations 0 to 4. (b) In the second concentration cycle, the next six DAQ words are delivered. Three of them are stored in the output record in locations 5 to 7, but the other three must be stored elsewhere, as locations 0 to 2 in the output record are still occupied. For that purpose, the auxiliary record is provided. The output record is now complete and ready for sending to the DAQ. (c) In the third concentration cycle, another six DAQ words are delivered. The three DAQ words from the auxiliary record are copied to the output record. Therefore, filling the output record starts from location 3, and only five of the new DAQ words may be stored. The sixth word must again be stored in the auxiliary record.
  • Figure S4: Switch used in the concentrating network. Figure based on chuan-lin_wu_class_1980.
  • Figure S5: Concentration of data from 8 inputs. Slightly modified figure from guminski_benes_2023, according to CC BY license.
  • ...and 13 more figures
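
The three concentration cycles described in the caption of Figure S3 can be reproduced with a small behavioural model. This is only a sketch of the record-filling behaviour (output record plus auxiliary record), not of the BNRO network or of the VHDL implementation; the function name `concentrate`, the use of `None` for non-DAQ words, and the positions of the DAQ words on the links are assumptions made for illustration.

```python
from typing import List, Optional, Sequence, Tuple


def concentrate(
    cycles: Sequence[Sequence[Optional[str]]], width: int = 8
) -> Tuple[List[List[str]], List[str]]:
    """Behavioural model: pack DAQ words into fixed-width output records.

    `cycles` holds one sequence of link words per concentration cycle;
    `None` stands for a non-DAQ word (an assumption of this sketch).
    Returns the completed records and the words still waiting.
    """
    records: List[List[str]] = []
    output: List[str] = []  # output record, filled from location 0 upwards
    for words in cycles:
        # Drop non-DAQ words while keeping the link order.
        daq = [w for w in words if w is not None]
        free = width - len(output)
        output.extend(daq[:free])   # fill the remaining output locations
        auxiliary = daq[free:]      # overflow goes to the auxiliary record
        if len(output) == width:
            records.append(output)  # record complete, ready for sending
            output = auxiliary      # auxiliary words start the next record
    return records, output


# Word positions are hypothetical; only the counts (5, 6, 6) follow Figure S3.
cycles = [
    ["D0", None, "D1", "D2", None, "D3", None, "D4"],
    ["D5", "D6", None, "D7", "D8", "D9", None, "D10"],
    [None, "D11", "D12", "D13", None, "D14", "D15", "D16"],
]
records, pending = concentrate(cycles)
print(records)  # two complete records: D0..D7 and D8..D15
print(pending)  # ['D16'] remains for the next record
```

In this model the auxiliary words are moved into the next output record as soon as the previous record is emitted, whereas the caption describes the copy as happening in the following cycle; the resulting record contents are the same.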