Low-Latency FPGA Control System for Real-Time Neural Network Processing in CCD-Based Trapped-Ion Qubit Measurement
Binglei Lou, Gautham Duddi Krishnaswaroop, Filip Wojcicki, Ruilin Wu, Richard Rademacher, Zhiqiang Que, Wayne Luk, Philip H. W. Leong
TL;DR
This work tackles real-time qubit state detection in trapped-ion quantum processors by benchmarking DNN-based detectors on FPGAs and GPUs. It introduces LUT-based MLP and Vision Transformer accelerators implemented on FPGA, achieving nanosecond- to microsecond-scale inference latency and reducing the mean measurement fidelity error relative to thresholding by 1.8-2.5x for one qubit and 4.2-7.6x for three qubits, while identifying Cameralink readout as the main latency bottleneck. The complete FPGA-based single-shot measurement is over 100x faster than the GPU baseline. Together, the results point to camera-interface optimization as the next step toward ultra-low-latency qubit readout in scalable quantum measurement systems.
Abstract
Accurate and low-latency qubit state measurement is critical for trapped-ion quantum computing. While deep neural networks (DNNs) have been integrated to enhance detection fidelity, their latency on specific hardware platforms remains underexplored. This work benchmarks the latency of DNN-based qubit detection on field-programmable gate arrays (FPGAs) and graphics processing units (GPUs). The FPGA solution directly interfaces an electron-multiplying charge-coupled device (EMCCD) with the subsequent data processing logic, eliminating buffering and interface overheads. As a baseline, the GPU-based system employs a high-speed PCIe image grabber for image input and an I/O card for state output. We deploy Multilayer Perceptron (MLP) and Vision Transformer (ViT) models on hardware to evaluate measurement performance. Compared to conventional thresholding, DNNs reduce the mean measurement fidelity (MMF) error by 1.8-2.5x in the one-qubit case and 4.2-7.6x in the three-qubit case. FPGA-based MLP and ViT achieve nanosecond- and microsecond-scale inference latencies, respectively, while the complete single-shot measurement process achieves over 100x speedup compared to the GPU implementation. Additionally, clock-cycle-level signal analysis reveals inefficiencies in EMCCD data transmission via Cameralink, suggesting that optimizing this interface would allow the system to benefit more fully from ultra-low-latency DNN inference and guiding the development of next-generation qubit detection systems.
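To make the thresholding baseline and the MMF error comparison concrete, the following is a minimal sketch and not the authors' implementation. It assumes that MMF is the average, over prepared states, of the fraction of shots classified correctly, and that the reported improvement factor is the ratio of MMF errors (1 - MMF). The function names, threshold, and sample counts are all hypothetical.

```python
from collections import defaultdict


def threshold_detect(counts, threshold):
    """Label a shot bright (1) if its photon count exceeds the threshold, else dark (0)."""
    return [1 if c > threshold else 0 for c in counts]


def mean_measurement_fidelity(true_states, predicted_states):
    """Average of per-state correct-classification rates (assumed MMF definition)."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p in zip(true_states, predicted_states):
        total[t] += 1
        if t == p:
            correct[t] += 1
    # Average the per-state fidelities so each prepared state is weighted equally.
    return sum(correct[s] / total[s] for s in total) / len(total)


def error_reduction_factor(mmf_baseline, mmf_improved):
    """Ratio of MMF errors (1 - MMF): baseline error divided by improved error."""
    return (1.0 - mmf_baseline) / (1.0 - mmf_improved)


# Hypothetical single-qubit data: integrated photon counts and prepared states.
counts = [0, 1, 2, 9, 12, 3]
prepared = [0, 0, 0, 1, 1, 1]
predicted = threshold_detect(counts, threshold=4)
mmf = mean_measurement_fidelity(prepared, predicted)
```

In this toy data one bright shot falls below the threshold, so the bright-state fidelity is 2/3 and the dark-state fidelity is 1. A DNN detector would replace `threshold_detect` and operate on the camera pixels rather than on a single integrated count.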
