Minimal neuron ablation triggers catastrophic collapse in the language core of Large Vision-Language Models

Cen Lu; Yung-Chen Tang; Andrea Cavallaro

TL;DR

The paper investigates the robustness of large vision-language models to neuron-level failures and introduces Consistently Activated Neurons (CAN), a method that jointly uses activation magnitude and gradient sensitivity to identify critical neurons. Experiments show that ablating only a handful of language-model FFN neurons can cause catastrophic collapse, with LLaVA needing as few as five neurons and InstructBLIP requiring around 1200, revealing a bottleneck in the language backbone rather than in vision modules. A two-stage collapse pattern is observed: initial expressive degradation followed by sudden, complete collapse. These findings have important safety implications for LVLM deployment, suggesting targeted defenses should focus on safeguarding the language-model FFN pathways. The work also provides a data-driven framework for locating vulnerability hotspots across LVLM architectures and object categories.

Abstract

Large Vision-Language Models (LVLMs) have shown impressive multimodal understanding capabilities, yet their robustness is poorly understood. In this paper, we investigate the structural vulnerabilities of LVLMs to identify critical neurons whose removal triggers catastrophic collapse. To this end, we propose CAN, a method that detects Consistently Activated Neurons and locates critical neurons by progressive masking. Experiments on LLaVA-1.5-7b-hf and InstructBLIP-Vicuna-7b reveal that masking only a tiny fraction of the neurons in the language model's feed-forward networks (as few as four in extreme cases) suffices to trigger catastrophic collapse. Notably, critical neurons are predominantly localized in the language model rather than in the vision components, and the down-projection layer is a particularly vulnerable structure. We also observe a consistent two-stage collapse pattern: initial expressive degradation followed by sudden, complete collapse. Our findings provide important insights for safety research in LVLMs.
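The two steps named above, detecting consistently activated neurons and then masking them progressively, can be illustrated with a minimal NumPy sketch. This is not the paper's implementation. The score (absolute activation times absolute gradient), the `top_k` and `min_frac` parameters, and the function names `consistently_activated` and `progressive_masking` are all assumptions made for illustration. The `evaluate` callback stands in for whatever output-quality metric a real experiment would use.

```python
import numpy as np

def consistently_activated(acts, grads, top_k, min_frac=0.9):
    """Rank neurons that land in the per-sample top-k for most samples.

    acts, grads: arrays of shape (n_samples, n_neurons).
    The |activation| * |gradient| score and the min_frac threshold are
    illustrative assumptions, not the paper's definition.
    """
    score = np.abs(acts) * np.abs(grads)
    # Indices of the top_k highest-scoring neurons for every sample.
    top_idx = np.argsort(-score, axis=1)[:, :top_k]
    # Count in how many samples each neuron reaches the top_k.
    hits = np.zeros(score.shape[1], dtype=int)
    for row in top_idx:
        hits[row] += 1
    frac = hits / score.shape[0]
    candidates = np.flatnonzero(frac >= min_frac)
    # Order the surviving candidates by mean score, strongest first.
    order = np.argsort(-score[:, candidates].mean(axis=0))
    return candidates[order]

def progressive_masking(ranked, evaluate, collapse_below):
    """Mask ranked neurons one at a time until the metric collapses.

    evaluate(masked) is a hypothetical callback returning a quality score
    for the model with the listed neurons zeroed out.
    """
    masked = []
    for neuron in ranked:
        masked.append(int(neuron))
        if evaluate(masked) < collapse_below:
            return masked  # smallest prefix of the ranking that collapses
    return None  # no collapse within the candidate set
```

In a real setting, `acts` and `grads` would be collected from the feed-forward layers of the language model over a set of image-prompt pairs, and `evaluate` would rerun generation with the chosen neurons zeroed out.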
