FPGA-Accelerated SpeckleNN with SNL for Real-time X-ray Single-Particle Imaging
Abhilasha Dave, Cong Wang, James Russell, Ryan Herbst, Jana Thayer
TL;DR
This work demonstrates a practical FPGA-accelerated implementation of SpeckleNN for real-time speckle pattern classification in X-ray SPI at XFEL facilities. By pruning the model from 5.6 million to 64.6K parameters and shrinking the latent space from 128 to 50 dimensions, the approach achieves 90% accuracy while fitting near-detector deployment on a KCU1500 FPGA. The deployment uses SLAC's SNL with dynamic weight loading to avoid FPGA re-synthesis, achieving ~45 µs inference latency at ~9.4 W average power, an 8.9x speedup and 7.8x power reduction relative to an NVIDIA A100 GPU. Layer-wise comparisons between SNL-Csim and PyTorch show strong cross-framework consistency, supporting reliable hardware deployment with minimal numerical drift. Overall, the method enables real-time adaptive speckle classification and vetoing, accelerating SPI experiments and improving adaptability to evolving experimental conditions.
Abstract
We implement a specialized version of our SpeckleNN model for real-time speckle pattern classification in X-ray Single-Particle Imaging (SPI) using the SLAC Neural Network Library (SNL) on an FPGA. This hardware is optimized for inference near detectors in high-throughput X-ray free-electron laser (XFEL) facilities such as the Linac Coherent Light Source (LCLS). To fit FPGA constraints, we optimized SpeckleNN, reducing the parameter count from 5.6M to 64.6K (a 98.8% reduction) while achieving 90% accuracy. We also compressed the latent space from 128 to 50 dimensions. Deployed on a KCU1500 FPGA, the model used 71% of DSPs, 75% of LUTs, and 48% of FFs, with an average power consumption of 9.4 W. The FPGA achieved an inference latency of 45.015 µs at 200 MHz. On an NVIDIA A100 GPU, the same inference consumed ~73 W with a latency of 400 µs. Our FPGA version therefore achieved an 8.9x speedup and a 7.8x power reduction over the GPU. Key advancements include model specialization and dynamic weight loading through SNL, which eliminates time-consuming FPGA re-synthesis and allows fast, continuous deployment of (re)trained models. These innovations enable real-time adaptive classification and efficient speckle pattern vetoing, making SpeckleNN well suited to XFEL facilities. This implementation accelerates SPI experiments and improves adaptability to evolving conditions.
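The headline ratios in the abstract follow directly from the quoted measurements, and the short Python sketch below reproduces them as a sanity check. The function name `derived_figures` and the dictionary keys are illustrative only. Two of the outputs are not stated in the source and should be read as rough estimates: the cycle count assumes the 45.015 µs latency is spent entirely at the 200 MHz clock, and the energy-per-inference ratio assumes the average power figures hold for the full duration of one inference.

```python
def derived_figures():
    """Recompute the ratios quoted in the abstract from its raw numbers."""
    # Values quoted in the abstract.
    params_before = 5.6e6       # original SpeckleNN parameter count
    params_after = 64.6e3       # specialized model parameter count
    fpga_latency_us = 45.015    # FPGA inference latency, microseconds
    gpu_latency_us = 400.0      # A100 inference latency, microseconds
    fpga_power_w = 9.4          # FPGA average power, watts
    gpu_power_w = 73.0          # A100 power, watts (quoted as approximate)
    clock_mhz = 200.0           # FPGA clock frequency

    # Energy in microjoules = watts * microseconds.
    # ASSUMPTION: average power is constant over one inference.
    fpga_energy_uj = fpga_power_w * fpga_latency_us
    gpu_energy_uj = gpu_power_w * gpu_latency_us

    return {
        "param_reduction_pct": 100.0 * (1.0 - params_after / params_before),
        "speedup": gpu_latency_us / fpga_latency_us,
        "power_ratio": gpu_power_w / fpga_power_w,
        # ASSUMPTION: the whole latency is spent at the 200 MHz clock.
        "fpga_cycles": round(fpga_latency_us * clock_mhz),
        "energy_ratio": gpu_energy_uj / fpga_energy_uj,
    }


if __name__ == "__main__":
    for name, value in derived_figures().items():
        print(f"{name}: {value:.3f}")
```

Rounded to one decimal place, the first three values match the 98.8% parameter reduction, 8.9x speedup, and 7.8x power reduction reported above.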
