A 28nm Multiply-Accumulate ASIC Architecture for On-Chip Data Compression in MHz Frame Rate X-ray and Electron Pixel Detectors

Rami Rasheedi; Nicholas Contini; Mohamed Adel Gharib; Sebastian Strempfer; Senthil Gnanasekaran; Salma Abdelzaher; Tejas Guruswamy; Kazutomo Yoshii; Mike Hammer; Henry Shi; Yu-Sheng Chen; Lorenzo Rota; Dionisio Doering; Angelo Dragone; Tao Zhou; Antonino Miceli

TL;DR

This work tackles the bandwidth bottleneck in high-throughput X-ray and electron detectors with an on-chip, fixed-length, lossy compression ASIC implemented in 28 nm CMOS. The design uses PCA/SVD-based encoding: pre-generated weights are stored in SRAM, and an online matrix multiplication produces a fixed-length output vector of length $K$, enabling real-time compression at MHz frame rates for a $192 \times 168$ pixel array at 12-bit depth. The paper details a modular hardware architecture with blocks for addressing, weight storage, floating-point (FP) multiply-accumulate, and accumulation, and explores design optimizations such as pipelining, logic sharing, and data-width tuning (e.g., FP12 weights with FP16 multipliers and FP17 accumulators) to meet timing, area, and power constraints. Synthesis and physical implementation results demonstrate feasibility, with SRAM-dominated area and power densities below 1 W/mm$^2$ reported for the optimized designs. The findings indicate that online, fixed-length compression is viable for scaling to multi-ASIC systems and high-brightness sources, and they open paths toward broader applications such as azimuthal integration and neural-network inference at the detector.
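The encoding step summarized above can be illustrated in software. The following is a minimal sketch, not the ASIC's implementation: it assumes pixels arrive one at a time and that each pixel has a row of $K$ weights, and it accumulates the vector-matrix product in ordinary Python floats rather than the FP12/FP16/FP17 formats mentioned in the summary. The function names `make_weights` and `stream_compress` are hypothetical, and the random weights are only a stand-in for pre-generated PCA/SVD weights.

```python
import random


def make_weights(num_pixels, k, seed=0):
    """Hypothetical stand-in for pre-generated PCA/SVD weights.

    Returns a num_pixels x k table of random values; a real system would
    load weights computed offline from representative frames.
    """
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(k)] for _ in range(num_pixels)]


def stream_compress(pixels, weights, k):
    """Multiply-accumulate each incoming pixel against its row of K weights.

    The result is a fixed-length vector of K values regardless of frame
    content, which is what makes the scheme fixed-length.
    """
    acc = [0.0] * k
    for pixel, row in zip(pixels, weights):
        for j in range(k):
            acc[j] += pixel * row[j]  # one multiply-accumulate per (pixel, output)
    return acc
```

Because the accumulators are updated as each pixel arrives, no full frame needs to be buffered before compression starts; only the $K$ running sums and the weight table are held in memory.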

Abstract

Modern X-ray detector systems urgently require compact, efficient, and fast data compression schemes to handle the transmission of big data from pixel arrays, enabling frame rates in the MHz regime. In this work, a data compression ASIC that implements a streaming fixed-length lossy compression scheme is introduced and analyzed, proving the feasibility and benefits of on-chip compression. The compression scheme uses vector-matrix product logic, which performs a number of floating-point multiplications, additions, and accumulations. The logic is verified, synthesized, and shown to fit in the area available for the X-ray detector under study, which comprises 192 x 168 pixels, each 12 bits wide, and has a total area of 20 mm x 20 mm, of which about 2 mm x 20 mm is available for the digital logic. Several system architectures, precisions, and compression ratios ranging from 100 to 250 were analyzed to pave the way for on-chip fixed-length compression (e.g., principal component analysis, singular value decomposition) and data reduction (e.g., azimuthal integration) for X-ray and electron detectors.
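To give a sense of scale for the figures quoted in the abstract, the short calculation below estimates the raw and compressed data rates. It is a back-of-the-envelope sketch: the 192 x 168 array, the 12-bit depth, and the compression ratios of 100 and 250 come from the abstract, while the frame rate of exactly 1 MHz is an assumption chosen to represent "the MHz regime". The function names are hypothetical.

```python
ROWS, COLS, BITS_PER_PIXEL = 192, 168, 12  # from the abstract
FRAME_RATE_HZ = 1_000_000  # assumption: 1 MHz as a representative MHz-regime rate


def raw_bits_per_frame(rows=ROWS, cols=COLS, bits=BITS_PER_PIXEL):
    """Uncompressed size of one frame in bits."""
    return rows * cols * bits


def raw_rate_gbps(frame_rate_hz=FRAME_RATE_HZ):
    """Uncompressed data rate in Gbit/s at the given frame rate."""
    return raw_bits_per_frame() * frame_rate_hz / 1e9


def compressed_rate_gbps(ratio, frame_rate_hz=FRAME_RATE_HZ):
    """Data rate in Gbit/s after compression by the given ratio."""
    return raw_rate_gbps(frame_rate_hz) / ratio
```

Under these assumptions, one frame is 387,072 bits, so the raw stream is about 387 Gbit/s at 1 MHz. A compression ratio of 100 brings this to roughly 3.9 Gbit/s, and a ratio of 250 to roughly 1.5 Gbit/s.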
