FPGA Deployment of LFADS for Real-time Neuroscience Experiments
Xiaohan Liu, ChiJui Chen, YanLun Huang, LingChi Yang, Elham E Khoda, Yihui Chen, Scott Hauck, Shih-Chieh Hsu, Bo-Cheng Lai
TL;DR
This work addresses the challenge of real-time inference for LFADS on hardware by deploying an FPGA-accelerated LFADS implementation within the hls4ml framework. The authors compare post-training quantization (PTQ) and quantization-aware training (QAT) in both Keras and QKeras contexts, achieving sub-50 microsecond latency (41.97 $\mu$s) on a Xilinx Alveo U55C with a 16-bit fixed-point representation and demonstrating that 10-bit QAT maintains near-floating-point performance. Key contributions include a practical HLS/Keras pathway for Bidirectional GRU deployment, a QKeras-based quantized variant, and an IO-optimized HLS implementation that enables real-time LFADS processing on FPGA hardware. The results indicate that lossy quantization can substantially reduce resource usage while preserving accuracy, enabling large-scale real-time neuroscience experiments and paving the way for closed-loop brain-machine interfaces. The work also outlines an automated workflow and discusses scalability considerations for future VAE-based LFADS deployments on FPGAs.
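To make the bit-width trade-off concrete, the following is a minimal NumPy sketch of rounding values onto a signed fixed-point grid, the basic operation behind post-training quantization. It is not the authors' implementation: the helper `quantize_fixed_point`, the synthetic weights, and the choice of 6 integer bits (including sign) are assumptions made for illustration only. It shows only that the worst-case rounding error grows as the total width shrinks from 16 to 10 bits.

```python
import numpy as np


def quantize_fixed_point(x, total_bits, int_bits):
    """Round x to a signed fixed-point grid (hypothetical helper).

    total_bits: full word width; int_bits: integer bits including the sign bit.
    Values outside the representable range are saturated.
    """
    frac_bits = total_bits - int_bits
    scale = 2.0 ** frac_bits
    lo = -(2 ** (total_bits - 1))      # most negative integer code
    hi = 2 ** (total_bits - 1) - 1     # most positive integer code
    codes = np.clip(np.round(x * scale), lo, hi)
    return codes / scale


# Synthetic "weights" standing in for trained parameters (assumption).
rng = np.random.default_rng(0)
weights = rng.normal(0.0, 0.5, size=1000)

# Worst-case rounding error at 16 and 10 total bits, 6 integer bits each.
err16 = np.max(np.abs(weights - quantize_fixed_point(weights, 16, 6)))
err10 = np.max(np.abs(weights - quantize_fixed_point(weights, 10, 6)))

print(f"max error, 16-bit: {err16:.6f}")
print(f"max error, 10-bit: {err10:.6f}")
```

With 6 integer bits, the 16-bit format has a step of $2^{-10}$ and the 10-bit format a step of $2^{-4}$, so the rounding error is bounded by half of each step. Quantization-aware training exposes the model to this rounding during training, which is one reason a narrower width can remain usable.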
Abstract
Large-scale recordings of neural activity are providing new opportunities to study neural population dynamics. A powerful method for analyzing such high-dimensional measurements is to deploy an algorithm to learn the low-dimensional latent dynamics. LFADS (Latent Factor Analysis via Dynamical Systems) is a deep learning method for inferring latent dynamics from high-dimensional neural spiking data recorded simultaneously in single trials. This method has shown remarkable performance in modeling complex brain signals, with an average inference latency in milliseconds. As our capacity to record many neurons simultaneously is increasing exponentially, it is becoming crucial to build the capacity for low-latency inference with these algorithms. To improve the real-time processing ability of LFADS, we introduce an efficient implementation of LFADS models on Field Programmable Gate Arrays (FPGAs). Our implementation shows an inference latency of 41.97 $\mu$s for processing the data in a single trial on a Xilinx U55C.
