AUVIC: Adversarial Unlearning of Visual Concepts for Multi-modal Large Language Models

Haokun Chen, Jianing Li, Yao Zhang, Jinhe Bi, Yan Xia, Jindong Gu, Volker Tresp

TL;DR

This work tackles the challenge of precisely forgetting a targeted visual concept in Multimodal Large Language Models without erasing related concepts. It introduces AUVIC, a framework that uses adversarial perturbations in both text and image modalities and a dynamic anchor-preservation mechanism to isolate forgetting to the target concept. A novel benchmark, VCUBench, is proposed to evaluate fine-grained forgetting in single and group contexts, including collateral forgetting and generalization. Empirical results show AUVIC achieves state-of-the-art forgetting rates with minimal degradation to non-target recognition and language fluency, supporting practical applicability in privacy and regulatory compliance for multimodal models.

Abstract

Multimodal Large Language Models (MLLMs) achieve impressive performance once optimized on massive datasets. Such datasets often contain sensitive or copyrighted content, raising significant data privacy concerns. Regulatory frameworks mandating the 'right to be forgotten' drive the need for machine unlearning, a technique that removes target data without resource-consuming retraining. However, while well studied for text, visual concept unlearning in MLLMs remains underexplored. A primary challenge is precisely removing a target visual concept without disrupting model performance on related entities. To address this, we introduce AUVIC, a novel visual concept unlearning framework for MLLMs. AUVIC applies adversarial perturbations to enable precise forgetting. This approach effectively isolates the target concept while avoiding unintended effects on similar entities. To evaluate our method, we construct VCUBench, the first benchmark designed to assess visual concept unlearning in group contexts. Experimental results demonstrate that AUVIC achieves state-of-the-art target forgetting rates while incurring minimal performance degradation on non-target concepts.

Paper Structure

This paper contains 23 sections, 9 equations, 5 figures, 3 tables.

Figures (5)

  • Figure 1: Comparison of unlearning behaviors in MLLMs. (a) Existing methods fail to forget the target identity precisely in multi-person scenarios and often erase similar concepts. (b) Our method forgets the target while preserving non-target individuals, even in group settings.
  • Figure 2: Collateral forgetting matrix across 8 concepts. Each row denotes an unlearned model for a specific target; each column shows unintended forgetting of other concepts.
  • Figure 3: Recall and BLEU scores of each non-target concept before and after unlearning "Donald Trump".
  • Figure 4: Overview of our adversarial unlearning framework. The architecture consists of a generator (top) and a discriminator (bottom) trained in an alternating adversarial loop. Given input images, the generator uses a frozen CLIP encoder to extract visual features and produces a perturbation through a trainable MLP. The discriminator receives the adversarial image and a perturbed prompt, and is optimized to suppress target predictions while preserving non-target concepts. The vision tower is partially trainable, while the rest of the model remains frozen. Our benchmark VCUBench, shown on the right, contains both single-person and multi-person images to evaluate forgetting and retention.
  • Figure 5: Example evaluation responses of both GA and the proposed method on our benchmark. Unlearning target: Donald Trump.
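
The collateral forgetting matrix described in the Figure 2 caption can be illustrated with a minimal sketch. This summary does not define the exact metric, so the code assumes forgetting is measured as the drop in per-concept recall relative to the original model. The function names, the concept labels "A", "B", "C", and all numbers are hypothetical placeholders, not results from the paper.

```python
from typing import Dict, List


def collateral_matrix(
    concepts: List[str],
    recall_before: Dict[str, float],
    recall_after: Dict[str, Dict[str, float]],
) -> List[List[float]]:
    """Build a matrix of recall drops (assumed metric, not the paper's definition).

    Row i is the model that unlearned concepts[i]; column j is the recall
    drop on concepts[j]. The diagonal is intended forgetting, and the
    off-diagonal entries are collateral (unintended) forgetting.
    """
    matrix = []
    for target in concepts:
        row = []
        for concept in concepts:
            drop = recall_before[concept] - recall_after[target][concept]
            # Clamp at zero so a recall gain is not counted as forgetting.
            row.append(round(max(drop, 0.0), 4))
        matrix.append(row)
    return matrix


def mean_collateral(matrix: List[List[float]]) -> List[float]:
    """Average the off-diagonal entries of each row."""
    means = []
    for i, row in enumerate(matrix):
        others = [value for j, value in enumerate(row) if j != i]
        means.append(sum(others) / len(others))
    return means


# Hypothetical recall values for three placeholder concepts.
concepts = ["A", "B", "C"]
recall_before = {"A": 1.0, "B": 0.5, "C": 0.75}
recall_after = {
    "A": {"A": 0.0, "B": 0.5, "C": 0.5},    # model that unlearned A
    "B": {"A": 1.0, "B": 0.0, "C": 0.75},   # model that unlearned B
    "C": {"A": 0.75, "B": 0.5, "C": 0.25},  # model that unlearned C
}

matrix = collateral_matrix(concepts, recall_before, recall_after)
for target, row in zip(concepts, matrix):
    print(target, row)
print("mean collateral per target:", mean_collateral(matrix))
```

Under this assumed metric, a precise unlearning method would show large diagonal entries and off-diagonal entries close to zero.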