TextCAVs: Debugging vision models using text

Angus Nicolson, Yarin Gal, J. Alison Noble

TL;DR

TextCAVs address the cost of concept-based explanations by deriving concept activation vectors from text descriptions using a vision–language model, enabling gradient-based explanations without image exemplars. The method trains linear mappings $h$ and $g$ between CLIP-like representations and the target model, with losses $\mathcal{L}_{mse}$ and $\mathcal{L}_{cyc}$, and computes concept sensitivity via $S_{c,k} = \nabla \Phi_{b,k}(\Phi_a(x)) \cdot \mathbf{v}_c$, enabling label-free testing of concepts. Experiments on ImageNet and MIMIC-CXR show TextCAVs produce meaningful explanations and can reveal model bias (e.g., biased Atelectasis explanations dominated by device-related concepts), supporting interactive debugging. The work demonstrates cross-domain applicability of text-based CAVs, offering rapid hypothesis testing and potential to improve safety and reliability in vision models, especially in medical contexts where labeled concept exemplars are scarce.

Abstract

Concept-based interpretability methods are a popular way to explain deep learning models in terms of high-level, human-interpretable concepts. These methods typically find concept activation vectors (CAVs) using a probe dataset of concept examples. This requires labelled data for each concept, which is expensive to obtain in the medical domain. We introduce TextCAVs: a novel method which creates CAVs using vision-language models such as CLIP, so that explanations can be created solely from text descriptions of a concept rather than image exemplars. The reduced cost of testing a concept allows many concepts to be tested and lets users interact with the model, testing new ideas as they arise instead of waiting for image collection and annotation. In early experimental results, we demonstrate that TextCAVs produces reasonable explanations for a chest x-ray dataset (MIMIC-CXR) and natural images (ImageNet), and that these explanations can be used to debug deep learning-based models.

Paper Structure (17 sections, 5 equations, 2 figures, 4 tables)

Figures (2)

  • Figure 1: Explaining models with TextCAVs. In order to move between the activations of a CLIP model and our target model, we train linear transformations, $h$ and $g$, using a text dataset, $\mathbb{D}_T$, and image dataset, $\mathbb{D}_I$. The loss terms are detailed on the right with $I_{\Phi}$, $I_{\Psi}$ and $T_{\Psi}$ representing the image features of the target model, the image features of the CLIP model, and the text features of the CLIP model, respectively. Once $h$ is trained, TextCAVs can be created by passing text representing some concept, $c$, through the CLIP model and $h$. The model's sensitivity to $c$, for some logit output, $k$, can then be measured using the directional derivative, $S_{c,k}$: the dot product of the model gradient, $\nabla \Phi_{b,k}$, and a TextCAV, $\mathbf{v}_c$.
  • Figure 2: MIMIC-CXR dataset characteristics. Left: The number of images per class in the training set of the target models. Right: The proportion of training images that contain a support device for each class.
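
The pipeline described in Figure 1 can be illustrated with a small numerical sketch. This is not the authors' implementation: random arrays stand in for CLIP and target-model features, $h$ is fitted in closed form with least squares (an MSE-only fit), and $g$ and the cycle loss $\mathcal{L}_{cyc}$ are omitted. The second half of the target model, $\Phi_b$, is assumed to be a single linear layer so that its gradient is available analytically. All names (`I_psi`, `I_phi`, `H`, `W`, `text_cav`, `sensitivity`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
d_clip, d_target, n_classes = 8, 6, 3

# Stand-ins for paired features of the same images (assumption: random toy data).
I_psi = rng.normal(size=(200, d_clip))         # CLIP image features
A_true = rng.normal(size=(d_clip, d_target))   # hidden linear relation, toy only
I_phi = I_psi @ A_true                         # target-model features

# Fit the linear map h by least squares, i.e. minimise ||I_psi @ H - I_phi||^2.
# Assumption: MSE only; g and the cycle loss are left out of this sketch.
H, *_ = np.linalg.lstsq(I_psi, I_phi, rcond=None)

def text_cav(text_feature):
    """Map a CLIP text feature into the target model's feature space via h."""
    return text_feature @ H

# Toy Phi_b: logits = a @ W, so the gradient of logit k w.r.t. a is W[:, k].
W = rng.normal(size=(d_target, n_classes))

def sensitivity(v_c, k):
    """Directional derivative S_{c,k}: gradient of logit k dotted with v_c."""
    grad_k = W[:, k]
    return float(grad_k @ v_c)

# Stand-in for the CLIP text embedding of some concept c.
t_psi = rng.normal(size=d_clip)
v_c = text_cav(t_psi)
scores = [sensitivity(v_c, k) for k in range(n_classes)]
```

With a real target network, the gradient $\nabla \Phi_{b,k}(\Phi_a(x))$ depends on the input $x$ and would come from automatic differentiation rather than a fixed weight column; the dot product with $\mathbf{v}_c$ stays the same.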