FaRAccel: FPGA-Accelerated Defense Architecture for Efficient Bit-Flip Attack Resilience in Transformer Models
Najmeh Nazari, Banafsheh Saber Latibari, Elahe Hosseini, Fatemeh Movafagh, Chongzhou Fang, Hosein Mohammadi Makrani, Kevin Immanuel Gubbi, Abhijit Mahalanobis, Setareh Rafatirad, Hossein Sayadi, Houman Homayoun
TL;DR
The paper tackles Bit-Flip Attacks on Transformer models by enhancing the Forget and Rewire (FaR) defense with a dedicated FPGA-based accelerator, FaRAccel. It reframes FaR as a constant-throughput operand-selection problem, enabling a compact FaRMap and pre-scaled donor weights to steer activations without altering the model topology or retraining. The hardware design preserves the baseline GEMM datapath, achieving up to $10$–$15\times$ end-to-end speedup over software FaR while maintaining robustness against BFAs and incurring minimal overhead (often $<3\%$). This work demonstrates a practical algorithm-hardware co-design for secure, efficient Transformer inference on edge and embedded platforms.
Abstract
The Forget and Rewire (FaR) methodology has demonstrated strong resilience against Bit-Flip Attacks (BFAs) on Transformer-based models by obfuscating critical parameters through dynamic rewiring of linear layers. However, applying FaR introduces non-negligible performance and memory overheads, primarily due to the runtime modification of activation pathways and the lack of hardware-level optimization. To overcome these limitations, we propose FaRAccel, a novel hardware accelerator architecture implemented on FPGA and specifically designed to offload and optimize FaR operations. FaRAccel integrates reconfigurable logic for dynamic activation rerouting and lightweight storage of rewiring configurations, enabling low-latency inference with minimal energy overhead. We evaluate FaRAccel across a suite of Transformer models and demonstrate substantial reductions in FaR inference latency and improvements in energy efficiency, while maintaining the robustness gains of the original FaR methodology. To the best of our knowledge, this is the first hardware-accelerated defense against BFAs in Transformers, effectively bridging the gap between algorithmic resilience and efficient deployment on real-world AI platforms.
