Say My Name: a Model's Bias Discovery Framework
Massimiliano Ciranni, Luca Molinaro, Carlo Alberto Barbano, Attilio Fiandrotti, Vittorio Murino, Vito Paolo Pastore, Enzo Tartaglione
TL;DR
SaMyNa introduces a semantic bias discovery framework that names biases learned by deep vision models without requiring bias-labeled validation data. It combines bias mining during training with a text-based bias naming pipeline that uses sample exemplars, captioning, and learned embeddings to produce interpretable bias keywords, ranked by cosine similarity to a bias embedding. The method can operate at training or inference time and supports debiasing by generating pseudo-labels for GroupDRO-based mitigation, achieving competitive results on Waterbirds, CelebA, and ImageNet-A. Overall, SaMyNa enhances explainability and provides a practical pathway to diagnosing and mitigating model biases with semantic, human-understandable descriptors.
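The ranking step mentioned above (keywords ordered by cosine similarity to a bias embedding) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `cosine` and `rank_keywords`, the three-dimensional toy vectors, and the example keywords are all hypothetical. In practice the embeddings would come from a text encoder, which is not specified here.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_keywords(bias_embedding, keyword_embeddings):
    """Return (keyword, similarity) pairs sorted from most to least similar."""
    scored = [
        (keyword, cosine(bias_embedding, embedding))
        for keyword, embedding in keyword_embeddings.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy data (assumed for illustration only, not taken from the paper).
bias_embedding = [1.0, 0.0, 0.0]
keyword_embeddings = {
    "water": [0.9, 0.1, 0.0],
    "forest": [0.0, 1.0, 0.0],
    "beak": [-1.0, 0.0, 0.0],
}

ranking = rank_keywords(bias_embedding, keyword_embeddings)
for keyword, score in ranking:
    print(f"{keyword}: {score:.3f}")
```

In this toy setup, the keyword whose embedding points in nearly the same direction as the bias embedding ranks first, and the one pointing in the opposite direction ranks last.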
Abstract
In the last few years, the broad applicability of deep learning to downstream tasks and its end-to-end training capabilities have raised growing concerns about potential biases toward specific, non-representative patterns. Many works on unsupervised debiasing leverage the tendency of deep models to learn "easier" samples, for example by clustering the latent space to obtain bias pseudo-labels. However, interpreting such pseudo-labels is not trivial, especially for a non-expert end user, as they provide no semantic information about the bias features. To address this issue, we introduce "Say My Name" (SaMyNa), the first tool to identify biases within deep models semantically. Unlike existing methods, our approach focuses on biases learned by the model. Our text-based pipeline enhances explainability and supports debiasing efforts: applicable either during training or in post-hoc validation, our method can disentangle task-related information and serves as a tool for analyzing biases. Evaluation on traditional benchmarks demonstrates its effectiveness in detecting biases and even disclaiming them, showcasing its broad applicability for model diagnosis.
