
Which Concepts to Forget and How to Refuse? Decomposing Concepts for Continual Unlearning in Large Vision-Language Models

Hyundong Jin, Dongyoon Han, Eunwoo Kim

Abstract

Continual unlearning poses the challenge of enabling large vision-language models to selectively refuse specific image-instruction pairs in response to sequential deletion requests, while preserving general utility. However, sequential unlearning updates distort shared representations, creating spurious associations between vision-language pairs and refusal behaviors that hinder precise identification of refusal targets, resulting in inappropriate refusals. To address this challenge, we propose a novel continual unlearning framework that grounds refusal behavior in fine-grained descriptions of visual and textual concepts decomposed from deletion targets. We first identify which visual-linguistic concept combinations characterize each forget category through a concept modulator, then determine how to generate appropriate refusal responses via a mixture of refusal experts, termed refusers, each specialized for concept-aligned refusal generation. To generate concept-specific refusal responses across sequential tasks, we introduce a multimodal, concept-driven routing scheme that reuses refusers for tasks sharing similar concepts and adapts underutilized ones for novel concepts. Extensive experiments on vision-language benchmarks demonstrate that the proposed framework outperforms existing methods by generating concept-grounded refusal responses and preserving general utility across unlearning sequences.
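The routing idea in the abstract (reuse a refuser when a new forget task shares concepts with an earlier one, otherwise adapt an underutilized refuser) can be illustrated with a minimal sketch. This is not the authors' implementation: the class name RefuserPool, the fixed similarity threshold, the assumption that per-task concept relevance scores are already computed, and the rule "adapt the least-used refuser" as a stand-in for "underutilized" are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class RefuserPool:
    """Hypothetical bookkeeping for a fixed pool of refusal experts (refusers).

    Illustrative sketch only; not the paper's implementation.
    """

    num_refusers: int
    # usage[i]: number of forget tasks routed to refuser i so far
    usage: list[int] = field(default_factory=list)
    # task_to_refuser[t]: refuser assigned to forget task t
    task_to_refuser: dict[int, int] = field(default_factory=dict)

    def __post_init__(self) -> None:
        if not self.usage:
            self.usage = [0] * self.num_refusers

    def route(self, task_id: int, relevance: dict[int, float],
              threshold: float = 0.5) -> tuple[int, bool]:
        """Pick a refuser for a new forget task.

        relevance maps each previous task id to its concept similarity with
        the new task (assumed precomputed). Returns (refuser index, reused).
        """
        # Previous tasks whose concepts are similar enough to share a refuser
        similar = {t: s for t, s in relevance.items() if s >= threshold}
        if similar:
            # Reuse the refuser of the most similar previous task
            best_task = max(similar, key=similar.get)
            idx, reused = self.task_to_refuser[best_task], True
        else:
            # Novel concepts: adapt the least-used refuser (ties -> lowest index)
            idx = min(range(self.num_refusers), key=lambda i: self.usage[i])
            reused = False
        self.usage[idx] += 1
        self.task_to_refuser[task_id] = idx
        return idx, reused


pool = RefuserPool(num_refusers=4)
print(pool.route(0, {}))                              # (0, False)
print(pool.route(1, {0: 0.9}))                        # (0, True)
print(pool.route(2, {0: 0.1, 1: 0.2}))                # (1, False)
print(pool.route(3, {0: 0.3, 1: 0.2, 2: 0.8}))        # (1, True)
print(pool.usage)                                     # [2, 2, 0, 0]
```

A hard threshold keeps the sketch easy to trace; a learned router would more plausibly produce soft weights over refusers, which this example does not attempt to model.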

Paper Structure (17 sections, 7 equations, 9 figures, 9 tables)

Figures (9)

  • Figure 1: Challenges in continual unlearning of large vision-language models emerge as sequential unlearning updates distort entangled visual-language representations, making it difficult to preserve contextually appropriate refusal behavior across tasks. (a) Irrelevant refusal: Learning new forget tasks overwrites prior refusal patterns, generating contextually misaligned refusals. (b) Over-refusal: The model inappropriately refuses retain queries.
  • Figure 2: An illustration of the proposed continual unlearning framework. For each vision-language pair to forget in the $t$-th task, (i) the concept modules produce activations for visual attributes and textual intents accumulated across tasks, and the concept modulator reweights them to emphasize relevant concepts by suppressing irrelevant ones. (ii) Given these concept activations, we compute their similarity with concepts from previous tasks to measure conceptual relevance. Based on this relevance, we leverage refusers associated with conceptually similar previous tasks or activate new ones to guide the language model in generating concept-aware refusal responses.
  • Figure 3: Performance across sequential unlearning steps. We report average performance on the LVLM benchmarks and the retain data (top), and on the forget data (bottom), after each unlearning step.
  • Figure 4: Trade-off between AR on the retain data and CRR on the forget data using Vicuna (left) and Llama-2 (right).
  • Figure 5: Response changes during sequential unlearning for randomly selected forget (top) and retain (middle and bottom) samples. Comparison methods often produce semantically misaligned refusals for forget queries or mistakenly reject retain queries (red boxes), whereas ours consistently yields appropriate refusals for forget samples and suitable responses for retain samples.
  • ...and 4 more figures
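Step (i) and the similarity computation in step (ii) of the Figure 2 caption can be illustrated with a minimal, dependency-free sketch. Everything here is an assumption for illustration: the names sigmoid, modulate, cosine, and task_relevance; the per-concept sigmoid gate standing in for the learned concept modulator; the choice of cosine similarity as the relevance measure; and the layout of the concept vector as visual-attribute activations concatenated with textual-intent activations.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function (avoids overflow for large |x|)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def modulate(activations: list[float], gate_logits: list[float]) -> list[float]:
    """Hypothetical concept modulator: scale each activation by a sigmoid gate.

    A gate near 0 suppresses an irrelevant concept; a gate near 1 keeps it.
    """
    if len(activations) != len(gate_logits):
        raise ValueError("one gate logit per concept is required")
    return [a * sigmoid(g) for a, g in zip(activations, gate_logits)]


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def task_relevance(current: list[float],
                   previous: dict[int, list[float]]) -> dict[int, float]:
    """Similarity of the current task's concept profile to each previous task's."""
    return {t: cosine(current, p) for t, p in previous.items()}


# Hypothetical activations: 2 visual-attribute concepts + 2 textual-intent concepts
visual = [0.9, 0.1]
textual = [0.8, 0.7]
gates = [4.0, -4.0, 4.0, -4.0]  # keep concepts 0 and 2, suppress 1 and 3
current = modulate(visual + textual, gates)
previous_profiles = {0: [1.0, 0.0, 1.0, 0.0], 1: [0.0, 1.0, 0.0, 1.0]}
print(task_relevance(current, previous_profiles))  # task 0 scores far higher
```

Because the gate suppresses the second textual activation (0.7) along with the weak visual one, the modulated profile aligns with task 0 rather than task 1; without modulation the two scores would be much closer.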