MLLMEraser: Achieving Test-Time Unlearning in Multimodal Large Language Models through Activation Steering
Chenlu Ding, Jiancan Wu, Leheng Sheng, Fan Zhang, Yancheng Yuan, Xiang Wang, Xiangnan He
TL;DR
MLLMEraser enables test-time unlearning in multimodal LLMs without parameter updates. It constructs a multimodal erasure direction by contrasting knowledge-recall and knowledge-erasure signals, then applies that direction through an input-aware steering mechanism that uses a null-space projection to avoid degrading retained content. On LLaVA-1.5-7B and Qwen-2.5-VL-7B, the method achieves strong forgetting performance with minimal utility loss and substantially lower computational cost than training-based approaches. The work offers a practical, reversible solution for content forgetting in MLLMs and suggests a path toward extending activation-steering unlearning to broader multimodal scenarios.
Abstract
Multimodal large language models (MLLMs) have demonstrated remarkable capabilities across vision-language tasks, yet their large-scale deployment raises pressing concerns about memorized private data, outdated knowledge, and harmful content. Existing unlearning approaches for MLLMs typically adapt training-based strategies such as gradient ascent or preference optimization, but these methods are computationally expensive, irreversible, and often distort retained knowledge. In this work, we propose MLLMEraser, an input-aware, training-free framework for test-time unlearning. Our approach leverages activation steering to enable dynamic knowledge erasure without parameter updates. Specifically, we construct a multimodal erasure direction by contrasting adversarially perturbed, knowledge-recall image-text pairs with knowledge-erasure counterparts, capturing both textual and visual discrepancies. To prevent unnecessary interference, we further design an input-aware steering mechanism that adaptively determines when and how the erasure direction should be applied, preserving utility on retained knowledge while enforcing forgetting on designated content. Experiments on LLaVA-1.5 and Qwen-2.5-VL demonstrate that MLLMEraser consistently outperforms state-of-the-art MLLM unlearning baselines, achieving stronger forgetting performance with lower computational cost and minimal utility degradation.
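To make the two ingredients above concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes hidden activations have already been extracted from some layer of an MLLM and replaces them with synthetic vectors. The function names (`erasure_direction`, `null_space_projector`, `fit_steering_map`, `steer`), the dimensions, and the pseudo-inverse fit are all illustrative assumptions. The erasure direction is taken as the difference between mean knowledge-erasure and mean knowledge-recall activations, and the input-aware step is modeled as a linear map that is restricted to the null space of retained activations, so retained inputs receive (numerically) no shift.

```python
import numpy as np

def erasure_direction(recall_acts, erase_acts):
    """Mean knowledge-erasure activation minus mean knowledge-recall activation."""
    return erase_acts.mean(axis=0) - recall_acts.mean(axis=0)

def null_space_projector(retain_acts, tol=1e-8):
    """Projector onto the orthogonal complement of the span of retain activations."""
    # Rows are samples; right singular vectors with non-negligible singular
    # values form an orthonormal basis of the retain subspace.
    _, s, vt = np.linalg.svd(retain_acts, full_matrices=False)
    basis = vt[s > tol * s.max()]
    return np.eye(retain_acts.shape[1]) - basis.T @ basis

def fit_steering_map(forget_acts, direction, projector):
    """Fit M = W P so that M @ h is close to `direction` for forget inputs."""
    x = forget_acts @ projector                    # projected forget activations (P is symmetric)
    targets = np.tile(direction, (x.shape[0], 1))  # same target shift for every forget sample
    w_t = np.linalg.pinv(x) @ targets              # minimum-norm least-squares solution
    return w_t.T @ projector

def steer(acts, steering_map):
    """Add the input-dependent shift M @ h to each activation row h."""
    return acts + acts @ steering_map.T

# Synthetic stand-ins for real layer activations (hypothetical sizes).
rng = np.random.default_rng(0)
dim = 16
retain = rng.normal(size=(20, 4)) @ rng.normal(size=(4, dim))  # lies in a 4-dim subspace
forget = rng.normal(size=(8, dim))
recall_acts = rng.normal(size=(8, dim))
erase_acts = rng.normal(size=(8, dim)) + 2.0

direction = erasure_direction(recall_acts, erase_acts)
projector = null_space_projector(retain)
steering_map = fit_steering_map(forget, direction, projector)

steered_retain = steer(retain, steering_map)
steered_forget = steer(forget, steering_map)
```

In this toy setup the shift applied to each forget activation matches `direction`, while retain activations are left unchanged up to floating-point error. Real activations are rarely confined to an exact low-rank subspace, so a practical variant would likely need a tolerance-based or regularized null-space estimate; that choice is outside what the abstract specifies.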
