Explaining Domain Shifts in Language: Concept erasing for Interpretable Image Classification

Zequn Zeng, Yudi Su, Jianqiao Sun, Tiansheng Wen, Hao Zhang, Zhengjue Wang, Bo Chen, Hongwei Liu, Jiawei Ma

TL;DR

This work tackles domain shifts inside concept-based image classifiers by showing that pre-trained vision-language models can interpret domain differences as language descriptors. It introduces LanCE, a framework that pairs CLIP-based concept bottleneck models with a Domain Descriptor Orthogonality (DDO) loss and uses LLM-generated domain descriptors to simulate unseen domains. Empirical results across seven benchmarks, including three new datasets, demonstrate that DDO significantly improves out-of-distribution generalization while maintaining in-distribution accuracy, without altering core model architectures. The approach provides a concept-level explanation of domain shifts, a robust plug-in regularizer, and opens up new benchmarks and resources for interpretable domain generalization.

Abstract

Concept-based models can map black-box representations to human-understandable concepts, which makes the decision-making process more transparent and allows users to understand the reasons behind predictions. However, domain-specific concepts often influence the final predictions, which undermines the model's generalization capabilities and prevents the model from being used in high-stakes applications. In this paper, we propose a novel Language-guided Concept-Erasing (LanCE) framework. In particular, we empirically demonstrate that pre-trained vision-language models (VLMs) can approximate distinct visual domain shifts via domain descriptors, while prompting large language models (LLMs) can easily simulate a wide range of descriptors of unseen visual domains. Then, we introduce a novel plug-in domain descriptor orthogonality (DDO) regularizer to mitigate the impact of these domain-specific concepts on the final predictions. Notably, the DDO regularizer is agnostic to the design of concept-based models, and we integrate it into several prevailing models. Through evaluation of domain generalization on four standard benchmarks and three newly introduced benchmarks, we demonstrate that DDO can significantly improve out-of-distribution (OOD) generalization over the previous state-of-the-art concept-based models. Our code is available at https://github.com/joeyz0z/LanCE.

Paper Structure

This paper contains 26 sections, 12 equations, 11 figures, 9 tables.

Figures (11)

  • Figure 1: Domain shifts in concept space. (a) Given images of an apple in different visual domains, the prediction confidence of a concept-based model trained on the photo domain degrades because some concepts are missing. (b) Comparison of the distributions of concept activation values (image-concept similarity computed via CLIP) between the photo domain and the sketch domain for two concepts, "red color" and "round shape". The JS divergence measures the distance between the two distributions and tends to be larger for domain-specific concepts (e.g., "red color").
  • Figure 2: (a) We empirically demonstrate that the visual domain shift can be interpreted in language. For each class, we obtain the caption from the embedding difference of images from two domains. Then, by aggregating the captions across the classes, the keywords regarding the domain shift can be highlighted. At the same time, (b) the domain-specific concepts can be discovered from language descriptions of the different domains. In detail, they have higher similarities with the difference of domain-related class descriptions, i.e., textural domain shift, in the CLIP embedding space. More analyses are shown in Appendix \ref{appendix:empirical}.
  • Figure 3: Overview of LanCE. The blue part is the data flow of vanilla CLIP-CBMs (Sec. \ref{Problem formulation}). To provide concept-level explanations, we first construct a human-written or LLM-generated concept set $\mathcal{C}$ and extract the concept embeddings via the frozen CLIP text encoder. Given an image, we extract the image embedding via the frozen CLIP image encoder. The concept activations are the cosine similarities between the image embedding and the concept embeddings. A learnable linear layer $W_F$ is fitted on top of the concept activation vector; it predicts the final class and is optimized via cross-entropy loss. The yellow part is the data flow of our proposed DDO regularizer (Sec. \ref{ddoloss}). Similarly, we first construct a domain descriptor set (Sec. \ref{Generate domain descriptors}) to obtain the language-guided domain shifts and then simulate the domain-specific concept activations. To erase the effect of domain-specific concepts, the DDO regularizer encourages orthogonality between the class-concept correlation matrix $W_F$ (i.e., the final linear weight) and the domain-specific concept activation $\widehat{\boldsymbol{a}_{\text{sp}}}$.
  • Figure 4: OOD performance on three benchmarks for generalization to multiple unseen domains: PACS, OfficeHome, and DomainNet.
  • Figure 5: Ablation studies for the impact of the number of domain descriptors. For each quantity, we randomly select domain descriptors from a total of 200 domain descriptors and average the results over five random selections. Results on DomainNet are shown in Appendix \ref{appendix:ablation}.
  • ...and 6 more figures
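
The data flow described in the Figure 3 caption can be made concrete with a small sketch. What follows is a minimal illustration in dependency-free Python, not the authors' implementation. Random vectors stand in for frozen CLIP embeddings. The simulated domain-specific activation is assumed to be the difference between the concept activations of two domain-descriptor embeddings, and the penalty is assumed to be the mean squared inner product between each row of W_F and that activation. All function names (concept_activations, class_logits, ddo_penalty, project_out) and the example descriptor phrases are hypothetical.

```python
import math
import random


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def concept_activations(embedding, concept_embs):
    """Cosine similarity between one embedding and every concept embedding."""
    e = normalize(embedding)
    return [dot(e, normalize(c)) for c in concept_embs]


def class_logits(W_F, activations):
    """Linear layer on top of the concept activation vector (one row per class)."""
    return [dot(row, activations) for row in W_F]


def ddo_penalty(W_F, a_sp):
    """Assumed penalty form: mean squared inner product of each class row with a_sp.
    It is zero exactly when every row of W_F is orthogonal to a_sp."""
    return sum(dot(row, a_sp) ** 2 for row in W_F) / len(W_F)


def project_out(W_F, a_sp):
    """Remove the a_sp component from every row of W_F (illustration only)."""
    denom = dot(a_sp, a_sp)
    return [[w - dot(row, a_sp) / denom * a for w, a in zip(row, a_sp)]
            for row in W_F]


def rand_vec(n):
    """Random stand-in for an embedding or a weight row."""
    return [random.gauss(0.0, 1.0) for _ in range(n)]


random.seed(0)
DIM, NUM_CONCEPTS, NUM_CLASSES = 8, 5, 3

concept_embs = [rand_vec(DIM) for _ in range(NUM_CONCEPTS)]  # stand-in text embeddings
image_emb = rand_vec(DIM)                                    # stand-in image embedding
desc_a = rand_vec(DIM)  # hypothetical descriptor, e.g. "a sketch of an apple"
desc_b = rand_vec(DIM)  # hypothetical descriptor, e.g. "a photo of an apple"
W_F = [rand_vec(NUM_CONCEPTS) for _ in range(NUM_CLASSES)]

# Blue path: image -> concept activations -> class logits.
acts = concept_activations(image_emb, concept_embs)
logits = class_logits(W_F, acts)

# Yellow path: simulated domain-specific concept activation and its penalty.
a_sp = [x - y for x, y in zip(concept_activations(desc_a, concept_embs),
                              concept_activations(desc_b, concept_embs))]
penalty_before = ddo_penalty(W_F, a_sp)
penalty_after = ddo_penalty(project_out(W_F, a_sp), a_sp)

print("logits:", logits)
print("penalty before projection:", penalty_before)
print("penalty after projection:", penalty_after)
```

In training, a penalty of this kind would presumably be added to the cross-entropy loss with a weighting coefficient and minimized by gradient descent. The closed-form projection above only demonstrates that the penalty vanishes once every class weight vector is orthogonal to the simulated domain-specific activation.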