Say My Name: a Model's Bias Discovery Framework

Massimiliano Ciranni, Luca Molinaro, Carlo Alberto Barbano, Attilio Fiandrotti, Vittorio Murino, Vito Paolo Pastore, Enzo Tartaglione

TL;DR

SaMyNa introduces a semantic bias discovery framework that names biases learned by deep vision models without requiring bias-labeled validation data. It combines bias mining during training with a text-based bias naming pipeline that uses sample exemplars, captioning, and learned embeddings to produce interpretable bias keywords, ranked by cosine similarity to a bias embedding. The method can operate at training or inference time and supports debiasing by generating pseudo-labels for GroupDRO-based mitigation, achieving competitive results on Waterbirds, CelebA, and ImageNet-A. Overall, SaMyNa enhances explainability and provides a practical pathway to diagnosing and mitigating model biases with semantic, human-understandable descriptors.
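
The ranking step mentioned above can be illustrated with a minimal sketch. It assumes the keyword embeddings and the bias embedding are already available as plain vectors; the function names (`cosine`, `rank_keywords`) and the toy 3-dimensional vectors are hypothetical and are not taken from the paper or its code.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_keywords(keyword_embeddings, bias_embedding):
    """Return (keyword, score) pairs sorted from most to least similar to the bias embedding."""
    scored = [(kw, cosine(vec, bias_embedding)) for kw, vec in keyword_embeddings.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy, made-up embeddings: in practice these would come from a text encoder.
toy_keywords = {
    "water": [0.9, 0.1, 0.0],
    "forest": [-0.8, 0.2, 0.1],
    "bird": [0.0, 0.1, 0.9],
}
toy_bias = [1.0, 0.0, 0.0]  # hypothetical bias embedding

ranking = rank_keywords(toy_keywords, toy_bias)
for keyword, score in ranking:
    print(f"{keyword}: {score:+.3f}")
```

In this toy setting, keywords with a high positive score would be read as correlated with the bias, and those with a strongly negative score as anti-correlated.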

Abstract

In the last few years, the broad applicability of deep learning to downstream tasks and its end-to-end training capabilities have raised growing concerns about potential biases toward specific, non-representative patterns. Many works on unsupervised debiasing leverage the tendency of deep models to learn "easier" samples, for example by clustering the latent space to obtain bias pseudo-labels. However, interpreting such pseudo-labels is not trivial, especially for a non-expert end user, as they do not provide semantic information about the bias features. To address this issue, we introduce "Say My Name" (SaMyNa), the first tool to identify biases within deep models semantically. Unlike existing methods, our approach focuses on biases learned by the model. Our text-based pipeline enhances explainability and supports debiasing efforts: applicable during either training or post-hoc validation, our method can disentangle task-related information and serves as a tool to analyze biases. Evaluation on traditional benchmarks demonstrates its effectiveness in detecting biases and even disclaiming them, showcasing its broad applicability for model diagnosis.

Paper Structure

This paper contains 51 sections, 5 equations, 13 figures, and 27 tables.

Figures (13)

  • Figure 1: Top: SaMyNa searches for potential spurious features learned by a model, providing a ranked list of keywords. Bottom: Example of bias heatmaps (red = correlated with bias, blue = anti-correlated).
  • Figure 2: Pipeline for SaMyNa. Given a model, we can tell, on either $\mathcal{D}^{\text{train}}$ or $\mathcal{D}^{\text{val}}$, which samples are correctly classified (green border) and which are incorrectly classified (red border). Among these, we first perform a sample subset selection, looking at the latent space of the model under analysis and choosing, through $k$-medoids, the most representative samples for the learned class. Then, we employ a captioner to get a textual description of these samples. From these descriptions, we extract non-rare words as keywords and, in parallel, working in the latent space of a text encoder, we extract the mean description for the learned classes, cleansed of features common within the dataset. We finally compare this representation with the embeddings of the keywords, revealing learned correlations that extend beyond the target class signal.
  • Figure 3: Similarity scores for Waterbirds (a) and CelebA (b).
  • Figure 4: Similarity scores for the BAR dataset.
  • Figure 5: Similarity scores for the crayfish, rhinoceros beetle, stick insect and cockroach classes from ImageNet-A.
  • ...and 8 more figures
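
The sample subset selection described in the Figure 2 caption relies on $k$-medoids. The following is a minimal sketch of that idea only, not the authors' implementation: it uses a simple alternating assign/update scheme, Euclidean distance, and a deterministic initialization (the first $k$ points), all of which are assumptions made for illustration. The toy 2-D points stand in for latent vectors of the model under analysis.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def k_medoids(points, k, max_iter=100):
    """Return the sorted indices of k medoids found by alternating assignment and update."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    # Precompute all pairwise distances.
    dist = [[euclidean(p, q) for q in points] for p in points]
    medoids = list(range(k))  # assumption: deterministic init with the first k points
    for _ in range(max_iter):
        # Assignment step: attach every point to its nearest medoid.
        clusters = {m: [] for m in medoids}
        for i in range(len(points)):
            nearest = min(medoids, key=lambda m: dist[i][m])
            clusters[nearest].append(i)
        # Update step: within each cluster, pick the member with the lowest total distance.
        new_medoids = []
        for m, members in clusters.items():
            if not members:  # can only happen with duplicate points; keep the old medoid
                new_medoids.append(m)
            else:
                new_medoids.append(min(members, key=lambda c: sum(dist[c][j] for j in members)))
        if sorted(new_medoids) == sorted(medoids):
            break  # converged
        medoids = new_medoids
    return sorted(medoids)


# Toy "latent vectors": two well-separated groups of three points each.
toy_latents = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
medoid_indices = k_medoids(toy_latents, k=2)
print(medoid_indices)
```

Because medoids are actual samples rather than averaged centroids, the selected indices point to real images, which is what makes it possible to pass them to a captioner in the next stage.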