FPGA-Accelerated SpeckleNN with SNL for Real-time X-ray Single-Particle Imaging

Abhilasha Dave, Cong Wang, James Russell, Ryan Herbst, Jana Thayer

TL;DR

This work demonstrates a practical FPGA-accelerated implementation of SpeckleNN for real-time speckle pattern classification in X-ray SPI at XFEL facilities. By reducing the model from 5.6 million to 64.6K parameters and shrinking the latent space from 128 to 50 dimensions, the approach retains 90% accuracy while fitting on a KCU1500 FPGA for deployment near the detector. The deployment uses SLAC's SNL with dynamic weight loading to avoid FPGA re-synthesis, achieving ~45 µs inference latency at ~9.4 W average power, an 8.9x speedup and 7.8x power reduction relative to an NVIDIA A100 GPU. Layer-wise comparisons between SNL Csim and PyTorch show strong cross-framework consistency, supporting reliable hardware deployment with minimal numerical drift. Overall, the method enables real-time adaptive speckle classification and vetoing, accelerating SPI experiments and improving adaptability to evolving experimental conditions.

Abstract

We implement a specialized version of our SpeckleNN model for real-time speckle pattern classification in X-ray Single-Particle Imaging (SPI) using the SLAC Neural Network Library (SNL) on an FPGA. This hardware is optimized for inference near detectors in high-throughput X-ray free-electron laser (XFEL) facilities like the Linac Coherent Light Source (LCLS). To fit FPGA constraints, we optimized SpeckleNN, reducing parameters from 5.6M to 64.6K (a 98.8% reduction) while retaining 90% accuracy. We also compressed the latent space from 128 to 50 dimensions. Deployed on a KCU1500 FPGA, the model used 71% of DSPs, 75% of LUTs, and 48% of FFs, with an average power consumption of 9.4 W. The FPGA achieved 45.015 µs inference latency at 200 MHz. On an NVIDIA A100 GPU, the same inference consumed ~73 W and had a 400 µs latency. Our FPGA version achieved an 8.9x speedup and 7.8x power reduction over the GPU. Key advancements include model specialization and dynamic weight loading through SNL, which eliminates time-consuming FPGA re-synthesis and allows fast, continuous deployment of (re)trained models. These innovations enable real-time adaptive classification and efficient speckle pattern vetoing, making SpeckleNN well suited to XFEL facilities. This implementation accelerates SPI experiments and enhances adaptability to evolving conditions.
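The headline ratios in the abstract follow directly from its raw figures, and the short sketch below recomputes them as a sanity check. The constant and function names are illustrative, not from the paper. The energy-per-inference helper rests on an added assumption (the quoted average power holds for the full inference latency), so its output is a rough derived estimate rather than a reported result.

```python
# Figures quoted in the abstract; nothing here is newly measured.
PARAMS_ORIGINAL = 5.6e6
PARAMS_OPTIMIZED = 64.6e3
FPGA_LATENCY_US = 45.015
GPU_LATENCY_US = 400.0
FPGA_POWER_W = 9.4
GPU_POWER_W = 73.0
CLOCK_MHZ = 200.0


def headline_ratios():
    """Recompute the ratios reported in the abstract from its raw figures."""
    return {
        "param_reduction_pct": 100.0 * (1.0 - PARAMS_OPTIMIZED / PARAMS_ORIGINAL),
        "speedup": GPU_LATENCY_US / FPGA_LATENCY_US,
        "power_ratio": GPU_POWER_W / FPGA_POWER_W,
        # MHz times microseconds gives clock cycles directly.
        "latency_cycles": FPGA_LATENCY_US * CLOCK_MHZ,
    }


def energy_per_inference_uj(power_w, latency_us):
    """Assumption: average power holds for the whole inference, so E = P * t."""
    return power_w * latency_us  # W * µs = µJ


if __name__ == "__main__":
    for name, value in headline_ratios().items():
        print(f"{name}: {value:.3f}")
    fpga_uj = energy_per_inference_uj(FPGA_POWER_W, FPGA_LATENCY_US)
    gpu_uj = energy_per_inference_uj(GPU_POWER_W, GPU_LATENCY_US)
    print(f"energy ratio (GPU / FPGA): {gpu_uj / fpga_uj:.1f}")
```

Rounded to one decimal place, the computed values match the reported 98.8% reduction, 8.9x speedup, and 7.8x power reduction. The 45.015 µs latency at 200 MHz corresponds to roughly 9,003 clock cycles.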

Paper Structure

This paper contains 10 sections, 8 figures, 1 table.

Figures (8)

  • Figure 1: SpeckleNN model (a) original model architecture (b) optimized model architecture
  • Figure 2: SNL High Level Design Flow
  • Figure 3: Convolutional Layer 0 output for all 7 output feature maps from SNL Csim and PyTorch, and the difference between SNL Csim and PyTorch
  • Figure 4: Convolutional Layer 0 output after ReLU activation for all 7 output feature maps from SNL Csim and PyTorch, and the difference between SNL Csim and PyTorch
  • Figure 5: Dense layer 4 output SNL Csim and PyTorch
  • ...and 3 more figures
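
The layer-wise comparisons described in the figure captions (SNL Csim output, PyTorch output, and their difference) can be summarized per feature map with one number, such as the largest absolute difference. The sketch below shows that idea in plain Python. It is not the authors' comparison script: the function names are hypothetical, the data is a small made-up stand-in, and the fixed-point rounding used to produce a "hardware-like" output is an assumption made only so the example has something to compare.

```python
def per_map_max_abs_diff(reference, candidate):
    """Return the largest absolute element-wise difference for each feature map.

    Both arguments are lists of feature maps; each feature map is a list of rows.
    """
    if len(reference) != len(candidate):
        raise ValueError("feature map counts differ")
    diffs = []
    for ref_map, cand_map in zip(reference, candidate):
        worst = 0.0
        for ref_row, cand_row in zip(ref_map, cand_map):
            for r, c in zip(ref_row, cand_row):
                worst = max(worst, abs(r - c))
        diffs.append(worst)
    return diffs


def quantize(value, frac_bits=8):
    """Round to a fixed-point grid (illustrative stand-in for a hardware path)."""
    scale = 1 << frac_bits
    return round(value * scale) / scale


# Made-up stand-in for a floating-point reference output: two 2x2 feature maps.
reference_maps = [
    [[0.10, 0.20], [0.30, 0.40]],
    [[-0.55, 0.71], [0.05, 0.93]],
]
# Hypothetical "hardware-like" output: the same values on an 8-bit fractional grid.
candidate_maps = [
    [[quantize(v) for v in row] for row in fmap] for fmap in reference_maps
]

max_diffs = per_map_max_abs_diff(reference_maps, candidate_maps)
```

With 8 fractional bits, rounding error per element is bounded by 2^-9, so each entry of `max_diffs` stays at or below that bound for this stand-in data. A real comparison would load the actual per-layer outputs from both frameworks instead.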