A 28nm Multiply-Accumulate ASIC Architecture for On-Chip Data Compression in MHz Frame Rate X-ray and Electron Pixel Detectors

Rami Rasheedi, Nicholas Contini, Mohamed Adel Gharib, Sebastian Strempfer, Senthil Gnanasekaran, Salma Abdelzaher, Tejas Guruswamy, Kazutomo Yoshii, Mike Hammer, Henry Shi, Yu-Sheng Chen, Lorenzo Rota, Dionisio Doering, Angelo Dragone, Tao Zhou, Antonino Miceli

TL;DR

This work tackles the bandwidth bottleneck in high-throughput X-ray/electron detectors by designing an on-chip, fixed-length, lossy compression ASIC implemented in 28 nm CMOS. It uses PCA/SVD-based encoding with pre-generated weights stored in SRAM and performs online matrix multiplication to produce a fixed-length output vector of length $K$, enabling real-time compression at frame rates in the MHz regime for a $192 \times 168$ pixel array at 12-bit depth. The paper details a modular hardware architecture with blocks for addressing, weight storage, floating-point (FP) multiply-accumulate, and accumulation. It explores design optimizations such as pipelining, logic sharing, and data-width tuning (e.g., FP12 weights with FP16 multipliers and FP17 accumulators) to meet timing, area, and power constraints. Synthesis and physical implementation results demonstrate feasibility, with SRAM-dominated area and power densities below 1 W/mm$^2$ reported for the optimized designs. The findings indicate that online, fixed-length compression is viable for scaling to multi-ASIC systems and high-brightness sources, enabling on-chip data reduction and opening paths toward broader applications such as azimuthal integration and neural-network inference at the detector.
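The encoding idea summarized above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes a plain truncated SVD without mean-centering, uses a toy 12 × 16 frame instead of the 192 × 168 array, works in double precision rather than FP12/FP16/FP17, and the function names (`fit_encoding_matrix`, `encode`, `decode`) are hypothetical.

```python
import numpy as np

def fit_encoding_matrix(train_frames, k):
    """Offline step: derive a (pixels, k) weight matrix from training frames."""
    flat = train_frames.reshape(train_frames.shape[0], -1).astype(np.float64)
    # Rows of vt are right singular vectors, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(flat, full_matrices=False)
    return vt[:k].T

def encode(frame, weights):
    """Online step: flatten the frame and project it onto the k kept components."""
    return frame.reshape(-1) @ weights

def decode(code, weights, shape):
    """Off-chip reconstruction from the fixed-length code."""
    return (weights @ code).reshape(shape)

# Toy data (assumption): frames that lie exactly in a k-dimensional subspace.
rng = np.random.default_rng(0)
rows, cols, k = 12, 16, 4
basis = rng.random((k, rows * cols))
train = (rng.random((50, k)) @ basis).reshape(50, rows, cols)

weights = fit_encoding_matrix(train, k)
frame = (rng.random(k) @ basis).reshape(rows, cols)
code = encode(frame, weights)          # always k values, whatever the frame content
recon = decode(code, weights, frame.shape)
```

Because the toy frames have rank `k` by construction, the reconstruction here is exact up to rounding. Real detector frames would not be, which is what makes the scheme lossy.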

Abstract

Modern X-ray detector systems urgently require compact, efficient, and fast data compression schemes to handle the transmission of large data volumes from pixel arrays and thereby enable frame rates in the MHz regime. In this work, a data compression ASIC that implements a streaming, fixed-length, lossy compression scheme is introduced and analyzed, demonstrating the feasibility and benefits of on-chip compression. The compression scheme uses vector-matrix product logic, which performs a number of floating-point multiplications, additions, and accumulations. The logic is verified, synthesized, and shown to fit in the area available on the X-ray detector under study, which comprises 192 × 168 pixels, each 12 bits wide, and has a total area of 20 mm × 20 mm, of which about 2 mm × 20 mm is available for the digital logic. Several system architectures, precisions, and compression ratios ranging from 100 to 250 were analyzed to pave the way for on-chip fixed-length compression (e.g., principal component analysis, singular value decomposition) and data reduction (e.g., azimuthal integration) for X-ray and electron detectors.
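To give a sense of scale for the numbers quoted in the abstract, the following back-of-the-envelope sketch computes the raw frame size and the corresponding data rates. The 1 MHz frame rate is an illustrative assumption (the source states only "the MHz regime"), and `compressed_gbps` is a hypothetical helper.

```python
PIXELS = 192 * 168          # pixel array from the abstract
BITS_PER_PIXEL = 12
FRAME_RATE_HZ = 1_000_000   # assumption: 1 MHz, chosen for illustration only

frame_bits = PIXELS * BITS_PER_PIXEL            # 387072 bits per frame
raw_gbps = frame_bits * FRAME_RATE_HZ / 1e9     # uncompressed output rate in Gbit/s

def compressed_gbps(compression_ratio):
    """Output rate in Gbit/s after dividing the raw rate by the ratio."""
    return raw_gbps / compression_ratio

for ratio in (100, 250):    # the range of ratios analyzed in the paper
    print(f"CR {ratio}: {compressed_gbps(ratio):.3f} Gbit/s, "
          f"{frame_bits / ratio:.0f} bits per frame")
```

Under these assumptions the uncompressed stream is about 387 Gbit/s. Compression ratios of 100 and 250 bring it to roughly 3.9 and 1.5 Gbit/s.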

Paper Structure

This paper contains 26 sections, 2 equations, 10 figures, 1 table.

Figures (10)

  • Figure 1: (a) Variable-length compression scheme. (b) Fixed-length compression scheme.
  • Figure 2: Online matrix multiplication-based compression, where the pixel matrix is flattened and multiplied by the pre-generated encoding matrix, which is stored on-chip in SRAM. $N$ and $M$ are the number of columns and rows in the pixel array, respectively. The detector ASIC output (not to scale) is compressed to a vector of length $K$, the number of eigenvectors (principal components) that are kept.
  • Figure 3: Block diagram of the multiply-accumulate ASIC architecture.
  • Figure 4: Floating-point multiplication level.
  • Figure 5: Accumulation level.
  • ...and 5 more figures
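
The multiply and accumulate levels named in the figure captions can be pictured with a small behavioral sketch in plain Python. It assumes one pixel arrives per step and that the matching row of $K$ weights is read from weight memory at the same time. It uses Python floats rather than the reduced-precision formats mentioned in the TL;DR, and `stream_compress` is a hypothetical name, not a block from the paper.

```python
def stream_compress(pixels, weight_rows, k):
    """Consume a flattened pixel stream and return the K accumulated outputs."""
    acc = [0.0] * k                      # one accumulator per output element
    for pixel, row in zip(pixels, weight_rows):
        for j in range(k):
            acc[j] += pixel * row[j]     # multiply level, then accumulation level
    return acc

# Toy stream (assumption): 4 pixels, K = 2, weights chosen to be exact in binary.
pixels = [1, 2, 3, 4]
weight_rows = [
    [0.5, 1.0],
    [0.25, -1.0],
    [1.0, 0.0],
    [0.0, 2.0],
]
result = stream_compress(pixels, weight_rows, k=2)

# Reference: the same vector-matrix product computed column by column.
direct = [sum(p * row[j] for p, row in zip(pixels, weight_rows)) for j in range(2)]
print(result)  # [4.0, 7.0]
```

The sketch never holds the full frame, only the $K$ running sums, which is the property that lets a fixed-length code be produced as pixels stream off the array.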