Interpreting CLIP with Sparse Linear Concept Embeddings (SpLiCE)

Usha Bhalla, Alex Oesterling, Suraj Srinivas, Flavio P. Calmon, Himabindu Lakkaraju

TL;DR

The semantic structure of CLIP's latent space can be leveraged for interpretability: CLIP representations can be decomposed into semantic concepts. The paper proposes Sparse Linear Concept Embeddings (SpLiCE), a method that transforms CLIP representations into sparse linear combinations of human-interpretable concepts.

Abstract

CLIP embeddings have demonstrated remarkable performance across a wide range of multimodal applications. However, these high-dimensional, dense vector representations are not easily interpretable, limiting our understanding of the rich structure of CLIP and its use in downstream applications that require transparency. In this work, we show that the semantic structure of CLIP's latent space can be leveraged to provide interpretability, allowing for the decomposition of representations into semantic concepts. We formulate this problem as one of sparse recovery and propose a novel method, Sparse Linear Concept Embeddings (SpLiCE), for transforming CLIP representations into sparse linear combinations of human-interpretable concepts. Distinct from previous work, SpLiCE is task-agnostic and can be used, without training, to explain and even replace traditional dense CLIP representations, maintaining high downstream performance while significantly improving their interpretability. We also demonstrate significant use cases of SpLiCE representations, including detecting spurious correlations and model editing.
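To make the sparse-recovery framing concrete, here is a minimal sketch of a nonnegative, L1-penalized decomposition of one embedding over a concept dictionary. Everything in it is an assumption made for illustration: the dictionary is random unit vectors standing in for concept text embeddings, the solver is a plain projected proximal-gradient (ISTA) loop, and the names `decompose`, `objective`, and `lam` are hypothetical. The abstract does not specify the solver or hyperparameters.

```python
import numpy as np

def objective(w, z, C, lam=0.05):
    """Penalized least squares: 0.5 * ||C w - z||^2 + lam * sum(w), for w >= 0."""
    return 0.5 * np.sum((C @ w - z) ** 2) + lam * np.sum(w)

def decompose(z, C, lam=0.05, n_iter=500):
    """Approximately minimize `objective` over w >= 0 with projected ISTA.

    z: (d,) dense embedding to explain; C: (d, k) dictionary, one concept per column.
    """
    step = 1.0 / np.linalg.norm(C, 2) ** 2  # 1 / Lipschitz constant of the gradient
    w = np.zeros(C.shape[1])
    for _ in range(n_iter):
        grad = C.T @ (C @ w - z)
        # Gradient step, L1 shrinkage, and projection onto w >= 0 in one line.
        w = np.maximum(w - step * (grad + lam), 0.0)
    return w

# Toy data (assumption): random unit-norm "concepts", not real CLIP embeddings.
rng = np.random.default_rng(0)
d, k = 64, 200                      # embedding dimension, number of concepts
C = rng.normal(size=(d, k))
C /= np.linalg.norm(C, axis=0, keepdims=True)

w_true = np.zeros(k)
w_true[[3, 50, 120]] = [0.8, 0.5, 0.3]   # three active concepts
z = C @ w_true

w_hat = decompose(z, C)
top_concepts = np.argsort(w_hat)[::-1][:3]  # indices of the largest weights
```

With a real model, the columns of `C` would be text embeddings of concept words and `z` an image embedding. The nonzero entries of `w_hat` then name the concepts used to explain the image.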

Paper Structure

This paper contains 38 sections, 1 theorem, 14 equations, 15 figures, and 10 tables.

Key Result

Proposition 1

Given Assumptions 1-5, CLIP image embeddings $f$ can be written as a sparse linear combination of text embeddings, i.e., $f = \mathbf{C}^\text{txt} \mathbf{w}$, where $\mathbf{w} \in \mathbb{R}_+^k$ is sparse and $\mathbf{C}^\text{txt} \in \mathbb{R}^{d \times k}$ is the text concept dictionary defined previously.
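The abstract frames the decomposition as a sparse recovery problem. One standard way to write such a problem is an L1-penalized nonnegative least-squares objective. It is given here as an illustrative sketch, not necessarily the paper's exact formulation. Here $\mathbf{z}$ is the dense CLIP embedding being explained (the notation of Figure 1) and $\lambda > 0$ is a hypothetical sparsity weight:

```latex
\min_{\mathbf{w} \in \mathbb{R}_+^k} \;
  \tfrac{1}{2} \left\| \mathbf{C}^\text{txt} \mathbf{w} - \mathbf{z} \right\|_2^2
  + \lambda \left\| \mathbf{w} \right\|_1
```

Because $\mathbf{w}$ is constrained to be nonnegative, $\|\mathbf{w}\|_1 = \sum_i w_i$, so the penalty is linear in $\mathbf{w}$ on the feasible set.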

Figures (15)

  • Figure 1: Visualization of SpLiCE, which converts dense, uninterpretable CLIP representations (z) into sparse semantic decompositions (w) by solving for a sparse nonnegative linear combination over an overcomplete concept set (C).
  • Figure 2: Example images from MSCOCO shown with their captions below and their concept decompositions on the right. We display the top seven concepts for visualization purposes, but images in the figure had decompositions with 7-20 concepts.
  • Figure 3: Performance of SpLiCE decomposition representations on zero-shot classification tasks (bottom row) and cosine similarity between CLIP embeddings and SpLiCE embeddings (top row). Our proposed semantic dictionary (yellow) closely approximates CLIP on zero-shot classification accuracy, but not on the cosine similarity. This indicates that SpLiCE captures the semantic information in CLIP, but not its non-semantic components, explaining both the high zero-shot accuracy and low cosine similarity. See § \ref{sec:perf} for discussion.
  • Figure 4: Left: SpLiCE decompositions of ImageNet 'African Elephant', 'Curly-coated Retriever', 'Monarch Butterfly', 'Digital Clock' classes. Right: Distribution of "Swimwear" concept in 'Woman' and 'Man' classes of CIFAR100.
  • Figure 5: Ablation study evaluating the efficacy of SpLiCE design choices across three metrics: zero-shot accuracy, cosine reconstruction, and semantic relevance of recovered tags. We find that all of our design choices, namely non-negativity, modality alignment, and the use of a large task-agnostic dictionary, are essential to performance. See § \ref{sec:ablations} for discussion.
  • ...and 10 more figures
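
The two metrics named in the Figure 3 caption, zero-shot accuracy and cosine similarity to the original embedding, can be computed as in the sketch below. The data is a toy construction and an assumption throughout. Class embeddings are random, `dense` stands in for CLIP embeddings, and `semantic` stands in for a reconstruction. The `offset` vector, built to be orthogonal to every class embedding, is a hypothetical "non-semantic" component. None of this reproduces the paper's experiments.

```python
import numpy as np

def normalize(x):
    """Scale each row to unit L2 norm."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def zero_shot_predict(emb, class_emb):
    """Assign each embedding to the class with the highest cosine similarity."""
    return np.argmax(normalize(emb) @ normalize(class_emb).T, axis=1)

def mean_cosine(a, b):
    """Average row-wise cosine similarity between two embedding matrices."""
    return float(np.mean(np.sum(normalize(a) * normalize(b), axis=1)))

rng = np.random.default_rng(0)
n, d, n_classes = 100, 32, 5
class_emb = rng.normal(size=(n_classes, d))      # toy class text embeddings
labels = rng.integers(0, n_classes, size=n)
semantic = class_emb[labels] + 0.1 * rng.normal(size=(n, d))

# Toy "non-semantic" direction: orthogonal to the span of the class embeddings.
Q, _ = np.linalg.qr(class_emb.T)                 # (d, n_classes) orthonormal basis
r = rng.normal(size=d)
offset = r - Q @ (Q.T @ r)
offset *= 5.0 / np.linalg.norm(offset)

dense = semantic + offset                        # stand-in for the original embedding

agreement = float(np.mean(
    zero_shot_predict(dense, class_emb) == zero_shot_predict(semantic, class_emb)
))
cosine = mean_cosine(dense, semantic)
```

In this toy setup, dropping a component orthogonal to the class embeddings leaves the zero-shot predictions unchanged while lowering the cosine similarity. The caption describes the same qualitative pattern.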

Theorems & Definitions (2)

  • Proposition 1
  • proof