Towards Benign Memory Forgetting for Selective Multimodal Large Language Model Unlearning

Zhen Zeng, Leijiang Gu, Zhangling Duan, Feng Li, Zenglin Shi, Cees G. M. Snoek, Meng Wang

TL;DR

This work tackles the privacy risk posed by memorization in Multimodal LLMs by introducing benign forgetting, a targeted unlearning paradigm that removes specific sensitive knowledge while preserving core image understanding. It presents SMFA, a two-stage approach that combines a Memory Forgetting Adapter learned from refusal labels with a retaining-anchor-guided masking mechanism that suppresses harmful forgetting. A new benchmark, S-MLLMUn Bench, jointly evaluates forgetting efficacy and retention of general visual understanding across synthetic profiles and ophthalmic images. Experiments show SMFA achieves precise, controllable forgetting with minimal impact on retention and output quality, addressing a key gap in safe, practical unlearning for MLLMs.

Abstract

Multimodal Large Language Models (MLLMs) achieve remarkable capabilities but can inadvertently memorize privacy-sensitive information. Although existing unlearning methods can remove such knowledge, they fail to achieve benign forgetting because they often degrade the model's general image understanding performance. To address this, we propose the Sculpted Memory Forgetting Adapter (SMFA), which confines forgetting to targeted memory regions while preserving overall capabilities. SMFA first fine-tunes the model to replace sensitive responses with refusals, yielding a memory forgetting adapter, and then applies a retaining anchor-guided masking mechanism to prevent interference with unrelated knowledge and understanding ability. To systematically evaluate selective MLLM unlearning, we introduce S-MLLMUn Bench, the first benchmark designed to jointly assess the removal of sensitive knowledge and retention of general visual understanding. Extensive experiments show that, unlike prior methods, SMFA achieves precise and controllable unlearning while maintaining the model's foundational image understanding.

Paper Structure

This paper contains 24 sections, 13 equations, 10 figures, and 2 tables.

Figures (10)

  • Figure 1: (a) The goal of MLLM unlearning is to make the model selectively forget memorized image knowledge while preserving its general visual understanding ability. (b) Forgetting rates and the corresponding image understanding abilities under different parameter settings for representative unlearning methods.
  • Figure 2: Overview of the proposed Sculpted Memory Forgetting Adapter (SMFA). First, a Memory Forgetting Adapter (MFA) is derived via refusal label-based fine-tuning on the forget set. Then, a retaining anchor-guided masking strategy sculpts the MFA by filtering harmful forgetting updates.
  • Figure 3: Overall pipeline of S-MLLMUn Bench. It includes a fine-tuning dataset, an unlearning dataset, and an evaluation dataset.
  • Figure 4: Analysis of the hyperparameter $k$ on LLaVA-OneVision with forget ratios of 5% and 10%. Orig. denotes Original.
  • Figure 5: Comparison of image understanding ability across different image types under various unlearning methods on LLaVA-OneVision with a forget ratio of 5%.
  • ...and 5 more figures
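
The Figure 2 caption describes two stages: deriving a forgetting update from the forget set, then masking the parts of that update that would harm retained ability. The sketch below is only a toy illustration of that general idea and not the paper's algorithm, since the summary does not give the actual masking rule. The function name `sculpt_update`, the sign-conflict score, and the `mask_fraction` parameter are all assumptions made for this example; in particular, `mask_fraction` is not claimed to correspond to the hyperparameter $k$ in Figure 4.

```python
def sculpt_update(forget_update, retain_anchor, mask_fraction):
    """Zero out the entries of a forgetting update that conflict most with a retain-side anchor.

    All three arguments are hypothetical stand-ins for this illustration:
      forget_update  -- flat list of per-parameter deltas from the forgetting adapter
      retain_anchor  -- flat list of per-parameter deltas a retain-set update would prefer
      mask_fraction  -- fraction in [0, 1] of all entries that may be masked
    """
    if len(forget_update) != len(retain_anchor):
        raise ValueError("forget_update and retain_anchor must have the same length")
    if not 0.0 <= mask_fraction <= 1.0:
        raise ValueError("mask_fraction must lie in [0, 1]")

    # Assumed conflict score: positive when the two updates pull a parameter
    # in opposite directions, larger when both magnitudes are larger.
    scores = [-(f * r) for f, r in zip(forget_update, retain_anchor)]

    # Only entries with a positive score are candidates for masking.
    conflicting = sorted(
        (i for i, s in enumerate(scores) if s > 0),
        key=lambda i: scores[i],
        reverse=True,
    )

    # Mask at most `budget` of the most conflicting entries.
    budget = int(mask_fraction * len(forget_update))
    masked = set(conflicting[:budget])

    return [0.0 if i in masked else f for i, f in enumerate(forget_update)]


# Toy usage with made-up numbers: entries 0 and 3 conflict with the anchor.
forget_update = [0.5, -0.2, 0.1, -0.4]
retain_anchor = [-1.0, -0.3, 0.2, 0.1]
sculpted = sculpt_update(forget_update, retain_anchor, mask_fraction=0.5)
```

In this toy setting, a larger `mask_fraction` keeps less of the forgetting update and protects more of the retained behavior, which mirrors the forgetting-versus-retention trade-off that the benchmark is designed to measure.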