FlipLLM: Efficient Bit-Flip Attacks on Multimodal LLMs using Reinforcement Learning

Khurram Khalil; Khaza Anuarul Hoque

TL;DR

FlipLLM reframes bit-flip attack (BFA) discovery on large language and vision-language models as a sequential decision problem. It combines sensitivity-guided layer pruning with Q-learning to identify minimal, high-impact bit sets, finding critical bits up to 2.5× faster than prior state-of-the-art methods and degrading accuracy across diverse architectures. The framework reveals consistent vulnerability in attention projections and normalization layers, and shows that hardware protections such as ECC SECDED, applied to the identified bit locations, can mitigate these attacks, underscoring the need for hardware-aware defenses. Overall, FlipLLM provides a scalable, architecture-agnostic toolkit for hardware-security evaluation of modern foundation models.

Abstract

Generative Artificial Intelligence models, such as Large Language Models (LLMs) and Vision-Language Models (VLMs), exhibit state-of-the-art performance but remain vulnerable to hardware-based threats, specifically bit-flip attacks (BFAs). Existing BFA discovery methods lack generalizability and struggle to scale, often failing to analyze the vast parameter space and complex interdependencies of modern foundation models in a reasonable time. This paper proposes FlipLLM, an architecture-agnostic reinforcement learning (RL) framework that formulates BFA discovery as a sequential decision-making problem. FlipLLM combines sensitivity-guided layer pruning with Q-learning to efficiently identify minimal, high-impact bit sets that can induce catastrophic failure. We demonstrate the effectiveness and generalizability of FlipLLM by applying it to a diverse set of models, including prominent text-only LLMs (GPT-2 Large, LLaMA 3.1 8B, and DeepSeek-V2 7B) and VLMs such as LLaVA 1.6, and to datasets such as MMLU, MMLU-Pro, VQAv2, and TextVQA. Our results show that FlipLLM can identify critical bits that are vulnerable to BFAs up to 2.5× faster than state-of-the-art (SOTA) methods. Flipping the FlipLLM-identified bits causes the accuracy of LLaMA 3.1 8B to plummet from 69.9% to ~0.2% and the VQA score of LLaVA to drop from 78% to almost 0%, with as few as 5 and 7 bit flips, respectively. Further analysis reveals that applying standard hardware protection mechanisms, such as ECC SECDED, to the FlipLLM-identified bit locations completely mitigates the BFA impact, demonstrating the practical value of our framework in guiding hardware-level defenses. FlipLLM offers the first scalable and adaptive methodology for exploring the BFA vulnerability of both language and multimodal foundation models, paving the way for comprehensive hardware-security evaluation.
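
To give a sense of why a handful of flipped bits can be so damaging, the following is a minimal illustrative sketch of a single bit flip in one stored weight. It is not part of FlipLLM: the helper name `flip_bit` is hypothetical, and the IEEE-754 float32 encoding is an assumption made for illustration, since this section does not state the numeric format of the attacked models.

```python
import struct


def flip_bit(value: float, bit: int) -> float:
    """Return `value`, stored as IEEE-754 float32, with one bit inverted.

    Bit 0 is the least significant mantissa bit and bit 31 is the sign bit.
    The float32 layout is an illustrative assumption, not the paper's format.
    """
    if not 0 <= bit < 32:
        raise ValueError("bit index must be in [0, 31]")
    # Reinterpret the float32 bytes as an unsigned 32-bit integer.
    (raw,) = struct.unpack("<I", struct.pack("<f", value))
    # XOR with a one-hot mask inverts exactly one bit.
    (flipped,) = struct.unpack("<f", struct.pack("<I", raw ^ (1 << bit)))
    return flipped


weight = 0.5
print(flip_bit(weight, 0))   # mantissa LSB: a negligible change
print(flip_bit(weight, 31))  # sign bit: the weight changes sign
print(flip_bit(weight, 30))  # exponent MSB: the weight becomes 2**127
```

Under this assumption, the position of the flipped bit matters far more than the number of flips: a low mantissa bit barely perturbs the weight, while the top exponent bit changes its magnitude by many orders. That asymmetry is what makes searching for a small set of high-impact bits worthwhile.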
