Rotated Robustness: A Training-Free Defense against Bit-Flip Attacks on Large Language Models

Deng Liu; Song Chen

Abstract

Hardware faults, specifically bit-flips in quantized weights, pose a severe reliability threat to Large Language Models (LLMs) and often trigger catastrophic model collapse. We demonstrate that this vulnerability stems from the spatial alignment between sensitive weight bits and extreme activation outliers, which massively amplifies a single hardware fault. To address this, we propose Rotated Robustness (RoR), a training-free defense built on orthogonal Householder transformations. By applying an orthogonal rotation to the activation space, RoR geometrically spreads extreme outliers across all feature dimensions. This breaks the alignment between outliers and vulnerable weights while mathematically guaranteeing that the original model accuracy is preserved. Extensive empirical evaluations across the Llama-2/3, OPT, and Qwen families demonstrate the reliability of our approach. Under random bit-flip attacks, RoR reduces the stochastic collapse rate from 3.15\% to 0.00\% on Qwen2.5-7B. Under severe targeted attacks with 50 Progressive Bit Search flips, RoR sustains robust reasoning on Llama-2-7B, maintaining 43.9\% MMLU accuracy against 45.2\% when unattacked, while competing defenses collapse to random guessing. Most notably, against the Single-Point Fault Attack (SPFA), the most aggressive targeted threat, RoR raises the attack complexity from a few bits to over 17,000 precise bit-flips. With a storage overhead of 0.31\% and an inference latency increase of 9.1\% on Llama-2-7B, RoR achieves lossless robustness, providing a practical and reliable defense for LLM deployment.
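
The outlier-smoothing idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the reflector construction (mapping the outlier channel's basis vector onto the uniform direction), the toy activation vector, the toy weight matrix, and all function names are assumptions chosen for illustration, and quantization of the rotated weights is ignored. The sketch checks two properties: the largest activation magnitude shrinks after the reflection, and the layer output is unchanged when the same reflection is fused into the weights.

```python
import math

def householder_apply(v, x):
    """Return H x for H = I - 2 v v^T / (v^T v), without forming H."""
    scale = 2.0 * sum(vi * xi for vi, xi in zip(v, x)) / sum(vi * vi for vi in v)
    return [xi - scale * vi for vi, xi in zip(v, x)]

def outlier_reflector(n, k):
    """Hypothetical choice of v: the reflection maps basis vector e_k
    onto the uniform direction (1/sqrt(n), ..., 1/sqrt(n))."""
    u = 1.0 / math.sqrt(n)
    v = [-u] * n
    v[k] += 1.0
    return v

def matvec(W, x):
    """Plain matrix-vector product for a list-of-rows matrix."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

# Toy activation with one extreme channel (index 3), and a toy weight matrix.
n, k = 8, 3
x = [0.1, -0.2, 0.05, 6.0, 0.15, -0.1, 0.2, -0.05]
W = [[((i * 7 + j * 3) % 5 - 2) / 10.0 for j in range(n)] for i in range(4)]

v = outlier_reflector(n, k)
x_rot = householder_apply(v, x)                    # H x
W_rot = [householder_apply(v, row) for row in W]   # rows of W H (H is symmetric)

y_ref = matvec(W, x)          # original output W x
y_rot = matvec(W_rot, x_rot)  # fused output (W H)(H x), equal since H H = I

# Worst-case output error from perturbing a single weight by delta is
# delta * max_j |activation_j|, so a smaller peak activation means less amplification.
delta = 1.0  # hypothetical weight perturbation caused by a bit-flip
worst_before = delta * max(abs(a) for a in x)
worst_after = delta * max(abs(a) for a in x_rot)

print("peak activation before/after:", worst_before, worst_after)
print("max output difference:", max(abs(a - b) for a, b in zip(y_ref, y_rot)))
```

Applying the reflection as a rank-one update, rather than materializing the full matrix, mirrors why such transformations can be cheap at inference time; the paper's own efficient formulation is the compact WY representation listed in the outline.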

Paper Structure (27 sections, 15 equations, 8 figures, 5 tables, 1 algorithm)

Introduction
Background and Threat Model
LLM Inference and Quantization
Threat Model: Hardware Faults and Adversary Capabilities
Defense Objectives
Motivation
The Low-Probability SPoF Phenomenon
Spatial Alignment of SPoFs with Outlier Features
Methodology
Vulnerability Analysis: Bounding the Worst-Case Error
RoR Framework: Lossless Outlier Smoothing
Efficient Implementation via Compact WY Representation
Evaluation
Experimental Setup
Black-Box Robustness under Random Bit-Flip Attacks (RQ1)
...and 12 more sections

Figures (8)

Figure 1: Bit-flip error propagation in Transformers. A targeted physical attack (e.g., Rowhammer) corrupts a single bit in a weight matrix (e.g., $W_Q$) stored in DRAM. This localized error multiplicatively amplifies through subsequent Multi-Head Self-Attention (MHSA) and feed-forward layers, ultimately causing catastrophic output corruption.
Figure 2: The SPoF Phenomenon. PPL fluctuations of OPT-125M under random bit-flips ($100$ seeds). Most trials survive (blue dots), while specific seeds trigger catastrophic PPL explosions (red star).
Figure 3: Spatial Alignment of SPoFs. Visualization of activations in OPT-125M layer2.fc1. Channel 706 (white vertical stripe) exhibits extreme magnitudes ($\sim$6) compared to surrounding channels (dark background), creating a structural vulnerability to bit-flips in the corresponding weight row.
Figure 4: Illustration of Householder Rotation and smoothing. (a) Geometrically, $\mathbf{H}$ reflects the outlier vector (red) into a smoothed vector (blue). (b) In the matrix view, this operation smooths the spike of the outlier column across all dimensions.
Figure 5: The overall framework of RoR. The pipeline consists of four steps: Offline Outlier Identification, Compact WY Construction, Offline Weight Fusion, and Online Inference. The fusion process absorbs the majority of the computational cost offline, leaving only a negligible low-rank correction during inference.
...and 3 more figures
