Approximate Domain Unlearning for Vision-Language Models

Kodai Kawamura, Yuta Goto, Rintaro Yanagi, Hirokatsu Kataoka, Go Irie

TL;DR

Approximate Domain Unlearning is introduced, a novel problem setting that requires reducing recognition accuracy for images from specified domains (e.g., illustration) while preserving accuracy for others (e.g., real).

Abstract

Pre-trained Vision-Language Models (VLMs) exhibit strong generalization capabilities, enabling them to recognize a wide range of objects across diverse domains without additional training. However, they often retain irrelevant information beyond the requirements of specific downstream tasks, raising concerns about computational efficiency and potential information leakage. This has motivated growing interest in approximate unlearning, which aims to selectively remove unnecessary knowledge while preserving overall model performance. Existing approaches to approximate unlearning have primarily focused on class unlearning, where a VLM is retrained to fail to recognize specified object classes while maintaining accuracy for others. However, merely forgetting object classes is often insufficient in practical applications. For instance, an autonomous driving system should accurately recognize real cars while avoiding misrecognition of illustrated cars depicted in roadside advertisements as real cars, which could be hazardous. In this paper, we introduce Approximate Domain Unlearning (ADU), a novel problem setting that requires reducing recognition accuracy for images from specified domains (e.g., illustration) while preserving accuracy for other domains (e.g., real). ADU presents new technical challenges: due to the strong domain generalization capability of pre-trained VLMs, domain distributions are highly entangled in the feature space, making naive approaches based on penalizing target domains ineffective. To tackle this limitation, we propose a novel approach that explicitly disentangles domain distributions and adaptively captures instance-specific domain information. Extensive experiments show that our approach outperforms baselines built upon VLM tuning techniques, paving the way for practical and fine-grained unlearning in VLMs. Code: https://kodaikawamura.github.io/Domain_Unlearning/.

Paper Structure

This paper contains 32 sections, 6 equations, 9 figures, 17 tables.

Figures (9)

  • Figure 1: Illustration of Approximate Domain Unlearning (ADU). ADU is a novel approximate unlearning problem introduced in this paper. Unlike existing approximate class unlearning tasks, ADU requires retraining a pre-trained Vision-Language Model (VLM) so that it cannot recognize images from specified domains (painting, clipart, and sketch in the figure) while preserving its ability to recognize images from other domains (real in the figure).
  • Figure 2: Overview of Proposed Method. (a) The common approach to approximate unlearning is to minimize the cross-entropy to the ground truth class labels for the domains to be memorized and to maximize the entropy for the domains to be forgotten. This approach alone is not satisfactory, due to the strong generalization performance of pre-trained VLMs. We therefore introduce two techniques to facilitate ADU: Domain Disentangling Loss (DDL), which disentangles the domain distributions in the latent feature space, and Instance-wise Prompt Generator (InstaPG), which captures image-level differences of domains. (b) InstaPG utilizes an attention mechanism where the vision prompt acts as the query and the image patch features serve as the key and value. Through this mechanism, instance-wise prompts are dynamically generated, allowing the model to adaptively refine prompts based on individual image characteristics.
  • Figure 3: Impact of Loss Weights $\gamma$ and $\lambda$ in Domain Disentangling Loss (DDL). We analyze the effect of varying the loss weights $\gamma$ and $\lambda$, which control the weights of the cross-entropy loss and the Maximum Mean Discrepancy (MMD) loss, respectively. Performance remains stable across a wide range of values once both $\gamma$ and $\lambda$ exceed a certain threshold, indicating that the proposed method is not highly sensitive to the choice of these hyperparameters.
  • Figure 4: Sensitivity to the Number of Training Samples per Domain. We compare our method with Baseline, which uses $\mathcal{L}_{\mathrm{memorize}}$ and $\mathcal{L}_{\mathrm{forget}}$ for vision prompt learning. While Baseline shows limited improvement with more shots, especially on Mini DomainNet, our method consistently improves, demonstrating better generalization and reduced overfitting.
  • Figure 5: t-SNE Visualization, Where the Domain to Be Forgotten is Real. (a) In the feature space of zero-shot CLIP, features from different domains are entangled, indicating poor domain separation, which makes domain-wise control over feature representations difficult. (b) By applying our method, the features are effectively disentangled from their domains, facilitating domain-wise control.
  • ...and 4 more figures
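
The Figure 2 caption describes two mechanisms that a short sketch may make more concrete: the baseline objective (cross-entropy on domains to be memorized, entropy maximization on domains to be forgotten) and the attention step in InstaPG (vision prompt as query, image patch features as key and value). The NumPy code below is only an illustration under assumptions. The function names, the single-head attention without learned projections, and the unweighted sum of the two losses are hypothetical choices made here, not the authors' implementation, and the paper's exact formulations are not given in this excerpt.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def memorize_loss(logits, labels):
    """Mean cross-entropy to ground-truth labels (domains to be memorized)."""
    p = softmax(logits)
    picked = p[np.arange(len(labels)), labels]
    return -np.mean(np.log(picked + 1e-12))

def forget_loss(logits):
    """Negative mean prediction entropy (domains to be forgotten).

    Minimizing this value maximizes the entropy of the predictions.
    """
    p = softmax(logits)
    entropy = -np.sum(p * np.log(p + 1e-12), axis=-1)
    return -np.mean(entropy)

def instance_prompts(prompts, patches):
    """Single-head scaled dot-product attention (assumed simplification).

    prompts: (num_prompts, dim) array used as queries.
    patches: (num_patches, dim) array used as keys and values.
    Returns one image-conditioned vector per prompt, shape (num_prompts, dim).
    """
    dim = prompts.shape[-1]
    scores = prompts @ patches.T / np.sqrt(dim)
    weights = softmax(scores, axis=-1)
    return weights @ patches

# Toy usage with random data (all shapes are arbitrary).
rng = np.random.default_rng(0)
keep_logits = rng.normal(size=(4, 5))    # images from domains to memorize
keep_labels = np.array([0, 1, 2, 3])
forget_logits = rng.normal(size=(4, 5))  # images from domains to forget
total = memorize_loss(keep_logits, keep_labels) + forget_loss(forget_logits)

prompts = rng.normal(size=(2, 8))
patches = rng.normal(size=(16, 8))
per_image_prompts = instance_prompts(prompts, patches)  # shape (2, 8)
```

With uniform predictions over C classes the forgetting term reaches its minimum of -log C, which is the state the entropy-maximization objective pushes the forgotten domains toward. As the caption notes, this objective alone is reported to be insufficient, which is what motivates DDL and InstaPG.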