MMUnlearner: Reformulating Multimodal Machine Unlearning in the Era of Multimodal Large Language Models
Jiahao Huo, Yibo Yan, Xu Zheng, Yuanhuiyi Lyu, Xin Zou, Zhihua Wei, Xuming Hu
TL;DR
This work addresses the challenge of forgetting targeted visual knowledge in Multimodal Large Language Models (MLLMs) without erasing the corresponding textual knowledge. It reformulates multimodal machine unlearning (MU) for MLLMs and introduces MMUnlearner, a geometry-constrained gradient ascent method that uses a weight saliency map to update parameters selectively. Experiments on two MLLMs and two benchmarks show that MMUnlearner outperforms LLM-based MU baselines at erasing visual concepts while preserving textual knowledge. The approach supports safe and privacy-conscious deployment of multimodal AI by enabling targeted forgetting without sacrificing core multimodal reasoning capabilities.
Abstract
Recent progress in Machine Unlearning (MU) has introduced solutions for the selective removal of private or sensitive information encoded within deep neural networks. Nonetheless, MU for Multimodal Large Language Models (MLLMs) remains in its nascent phase. We therefore propose to reformulate the task of multimodal MU in the era of MLLMs: the goal is to erase only the visual patterns associated with a given entity while preserving the corresponding textual knowledge encoded within the original parameters of the language model backbone. Furthermore, we develop a novel geometry-constrained gradient ascent method, MMUnlearner. During unlearning, it updates the weights of MLLMs using a weight saliency map jointly restricted by the remaining concepts and textual knowledge, thereby preserving parameters essential for non-target knowledge. Extensive experiments demonstrate that MMUnlearner surpasses baselines that fine-tune MLLMs directly on VQA data with Gradient Ascent (GA) or Negative Preference Optimization (NPO) across all evaluation dimensions. Our code is available at [this URL](https://github.com/Z1zs/MMUnlearner).
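
The idea of a saliency-masked gradient ascent step can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not say how the saliency map is computed, so the sketch assumes saliency is the squared gradient and that a weight is updated only when its saliency on the forget data exceeds its saliency on the data to be preserved. The names `saliency_mask`, `masked_gradient_ascent_step`, and `ratio` are hypothetical, and the numbers are made up for illustration.

```python
def saliency_mask(forget_grads, preserve_grads, ratio=1.0):
    """Build a binary mask over weights.

    Assumption (not from the abstract): saliency is the squared gradient,
    and a weight is selected only if its forget-set saliency is strictly
    larger than `ratio` times its preserve-set saliency.
    """
    mask = []
    for g_forget, g_preserve in zip(forget_grads, preserve_grads):
        selected = g_forget * g_forget > ratio * g_preserve * g_preserve
        mask.append(1.0 if selected else 0.0)
    return mask


def masked_gradient_ascent_step(weights, forget_grads, mask, lr=0.1):
    """Take one gradient ASCENT step on the forget loss, on masked weights only."""
    return [w + lr * m * g for w, m, g in zip(weights, mask, forget_grads)]


# Toy example: three scalar weights with made-up gradients.
weights = [0.5, -1.0, 2.0]
forget_grads = [0.9, 0.1, -0.4]    # gradient of the loss on the forget data
preserve_grads = [0.2, 0.8, 0.4]   # gradient of the loss on data to preserve

mask = saliency_mask(forget_grads, preserve_grads)
updated = masked_gradient_ascent_step(weights, forget_grads, mask)
print(mask)
print(updated)
```

In this toy case only the first weight is selected, so the other two stay unchanged; this is the sense in which masking can protect parameters that matter for knowledge that should be kept.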
