Attention IoU: Examining Biases in CelebA using Attention Maps
Aaron Serianni, Tyler Zhu, Olga Russakovsky, Vikram V. Ramaswamy
TL;DR
The paper tackles bias in vision models by introducing Attention-IoU, a generalized IoU over $L_1$-normalized attention maps to quantify how much a model relies on non-causal features. It combines GradCAM-based attention with two bias scores, Heatmap and Mask, to reveal spurious correlations and potential confounders, validated on Waterbirds and CelebA. Results show Attention-IoU captures biases that persist beyond label correlations, highlights co-localization effects, and uncovers hidden confounders, guiding more effective debiasing and dataset design. This map-based, interpretable framework enables fine-grained analysis of internal representations, with practical implications for fairness in computer vision systems.
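To make the idea of an IoU over $L_1$-normalized attention maps concrete, here is a minimal sketch. It assumes one common soft generalization of IoU: the sum of element-wise minima divided by the sum of element-wise maxima. The summary above does not state the paper's exact formula, so that choice, along with the function names `l1_normalize` and `soft_iou`, is an illustrative assumption rather than the authors' definition.

```python
import numpy as np


def l1_normalize(att_map, eps=1e-12):
    """Clip negatives and rescale a map so its entries sum to 1."""
    att_map = np.clip(np.asarray(att_map, dtype=np.float64), 0.0, None)
    total = att_map.sum()
    if total < eps:
        # An all-zero map carries no attention mass; keep it at zero.
        return np.zeros_like(att_map)
    return att_map / total


def soft_iou(map_a, map_b):
    """Soft IoU between two maps (assumed form: sum of min / sum of max)."""
    a = l1_normalize(map_a)
    b = l1_normalize(map_b)
    union = np.maximum(a, b).sum()
    if union == 0.0:
        return 0.0
    return float(np.minimum(a, b).sum() / union)


# Hypothetical 2x2 maps: attention on the top row vs. a shifted region.
heatmap_a = np.array([[1.0, 1.0], [0.0, 0.0]])
heatmap_b = np.array([[0.0, 1.0], [1.0, 0.0]])
score = soft_iou(heatmap_a, heatmap_b)
```

Under this assumed form, identical maps score 1, maps with no overlapping mass score 0, and rescaling either map leaves the score unchanged because of the $L_1$ normalization. The same function could compare an attention map against a binary feature mask, which is roughly the role the summary gives to the Mask score.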
Abstract
Computer vision models have been shown to exhibit and amplify biases across a wide array of datasets and tasks. Existing methods for quantifying bias in classification models primarily focus on dataset distribution and model performance on subgroups, overlooking the internal workings of a model. We introduce the Attention-IoU (Attention Intersection over Union) metric and related scores, which use attention maps to reveal biases within a model's internal representations and identify image features potentially causing the biases. First, we validate Attention-IoU on the synthetic Waterbirds dataset, showing that the metric accurately measures model bias. We then analyze the CelebA dataset, finding that Attention-IoU uncovers correlations beyond accuracy disparities. By investigating individual attributes in relation to the protected attribute Male, we examine the distinct ways biases are represented in CelebA. Lastly, by subsampling the training set to change attribute correlations, we demonstrate that Attention-IoU reveals potential confounding variables not present in dataset labels.
