Attention IoU: Examining Biases in CelebA using Attention Maps

Aaron Serianni, Tyler Zhu, Olga Russakovsky, Vikram V. Ramaswamy

TL;DR

The paper tackles bias in vision models by introducing Attention-IoU, a generalized IoU over $L_1$-normalized attention maps to quantify how much a model relies on non-causal features. It combines GradCAM-based attention with two bias scores, Heatmap and Mask, to reveal spurious correlations and potential confounders, validated on Waterbirds and CelebA. Results show Attention-IoU captures biases that persist beyond label correlations, highlights co-localization effects, and uncovers hidden confounders, guiding more effective debiasing and dataset design. This map-based, interpretable framework enables fine-grained analysis of internal representations, with practical implications for fairness in computer vision systems.
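To make the idea of an IoU over $L_1$-normalized attention maps more concrete, here is a minimal sketch. It uses the weighted Jaccard form (sum of element-wise minima divided by sum of element-wise maxima), which is one common way to generalize IoU to non-binary maps. That formula and the helper names `l1_normalize` and `soft_iou` are assumptions made for illustration; the paper's own definition of Attention-IoU may differ.

```python
import numpy as np


def l1_normalize(attn, eps=1e-12):
    """Scale a non-negative attention map so its entries sum to 1."""
    # Clip negatives defensively; GradCAM maps are already non-negative.
    attn = np.clip(np.asarray(attn, dtype=np.float64), 0.0, None)
    total = attn.sum()
    # An all-zero map is returned unchanged to avoid dividing by zero.
    return attn / total if total > eps else attn


def soft_iou(map_a, map_b, eps=1e-12):
    """Weighted-Jaccard IoU between two L1-normalized maps (illustrative)."""
    a, b = l1_normalize(map_a), l1_normalize(map_b)
    intersection = np.minimum(a, b).sum()
    union = np.maximum(a, b).sum()
    return float(intersection / union) if union > eps else 0.0


# Hypothetical 2x2 maps: attention on the top row vs. the top-left cell only.
top_row = np.array([[1.0, 1.0], [0.0, 0.0]])
top_left = np.array([[1.0, 0.0], [0.0, 0.0]])
print(soft_iou(top_row, top_row))   # identical maps
print(soft_iou(top_row, top_left))  # partial overlap
```

Because both inputs are normalized first, the score does not depend on the overall scale of either map. Passing a binary feature mask as the second argument gives a rough analogue of a mask-style comparison, again under the assumptions above.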

Abstract

Computer vision models have been shown to exhibit and amplify biases across a wide array of datasets and tasks. Existing methods for quantifying bias in classification models primarily focus on dataset distribution and model performance on subgroups, overlooking the internal workings of a model. We introduce the Attention-IoU (Attention Intersection over Union) metric and related scores, which use attention maps to reveal biases within a model's internal representations and identify image features potentially causing the biases. First, we validate Attention-IoU on the synthetic Waterbirds dataset, showing that the metric accurately measures model bias. We then analyze the CelebA dataset, finding that Attention-IoU uncovers correlations beyond accuracy disparities. By investigating individual attributes in relation to the protected attribute of Male, we examine the distinct ways biases are represented in CelebA. Lastly, by subsampling the training set to change attribute correlations, we demonstrate that Attention-IoU reveals potential confounding variables not present in dataset labels.

Paper Structure

This paper contains 16 sections, 9 equations, 14 figures.

Figures (14)

  • Figure 1: We use attention maps to understand which image regions a model relies on for the target classification task. Our proposed Attention-IoU framework provides insights into how models represent biases between correlated attributes. For example, consider the spatially related attributes of blond and wavy hair in the CelebA dataset [liu_deep_2015], which have similar label correlations to the Male label. They are attended to differently by the model, with blond hair appearing closer to Male in both the average attention map (top row) and the Attention-IoU mask score (bottom row). Thus, Attention-IoU reveals that blond hair, when compared to wavy hair, has a spurious correlation with Male that is not present in the dataset labels.
  • Figure 2: Attention maps for a landbird on a water background in the Waterbirds dataset [sagawa_distributionally_2020], illustrating possible forms of model bias for incorrect classifications: (left) attending to the whole background; (center) attending to a ship instead of the bird; (right) attending to only part of the bird, its wing in flight.
  • Figure 3: Average bird mask and average heatmaps for Waterbirds at increasing levels of bias. The model attends less to the region covered by the bird mask as the bias increases.
  • Figure 4: Evaluation of mask score using GradCAM on Waterbirds test set. The X-axis represents the Attention-IoU mask score for the ground-truth masks of the bird and background. We note the dataset bias and the worst group accuracy (WGA) along the Y-axis. As the bias increases, the worst group accuracy decreases and the model attends less to the bird and more to the background.
  • Figure 5: Evaluation of mask score using GradCAM on CelebA test set with attribute-specific feature masks, compared to worst group accuracy with Male. A mask score of 1 indicates perfect agreement between the attention map and feature mask, and 0 indicates perfect disagreement. Groups are defined by ground-truth labels for each combination of target attribute and Male. Groups containing fewer than 1% of the test-set images are excluded from consideration.
  • ...and 9 more figures
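
The worst group accuracy (WGA) used in the Figure 4 and Figure 5 captions can be sketched as follows, assuming binary labels and following the grouping and the 1% exclusion rule described for Figure 5. The function name `worst_group_accuracy` and its parameters are hypothetical and chosen for illustration; this is not the authors' evaluation code.

```python
import numpy as np


def worst_group_accuracy(y_true, y_pred, protected, min_frac=0.01):
    """Lowest per-group accuracy over (target, protected) label combinations.

    Groups are formed from ground-truth labels. Groups holding fewer than
    `min_frac` of all samples are skipped, mirroring the 1% rule in the
    Figure 5 caption.
    """
    y_true, y_pred, protected = map(np.asarray, (y_true, y_pred, protected))
    n = len(y_true)
    accuracies = {}
    for target in (0, 1):
        for prot in (0, 1):
            in_group = (y_true == target) & (protected == prot)
            if in_group.sum() < min_frac * n:
                continue  # group too small to evaluate reliably
            correct = y_pred[in_group] == y_true[in_group]
            accuracies[(target, prot)] = float(correct.mean())
    if not accuracies:
        raise ValueError("no group meets the minimum size threshold")
    worst = min(accuracies, key=accuracies.get)
    return accuracies[worst], worst, accuracies


# Hypothetical toy data: two large groups and two single-image groups.
y_true = np.array([1] * 100 + [0] * 98 + [1] + [0])
male = np.array([1] * 100 + [0] * 98 + [0] + [1])
y_pred = np.array([1] * 90 + [0] * 10 + [0] * 98 + [0] + [1])
wga, group, per_group = worst_group_accuracy(y_true, male_pred := y_pred, male)
print(wga, group)
```

In the toy data the two single-image groups fall below the 1% threshold, so their misclassifications do not affect the reported value.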