
Discovering and Mitigating Visual Biases through Keyword Explanation

Younghyun Kim, Sangwoo Mo, Minkyu Kim, Kyungmin Lee, Jaeho Lee, Jinwoo Shin

TL;DR

This work addresses hard-to-explain visual biases in image classifiers by translating bias signals into keyword explanations (B2T). The method generates captions from mispredicted images, extracts common bias keywords with YAKE, and validates them with a vision-language model to ensure they reflect systematic errors. B2T identifies both known biases (e.g., CelebA gender, Waterbirds backgrounds, ImageNet distribution shifts) and novel biases in larger datasets (Dollar Street, ImageNet), including contextual associations. The keywords enable practical debiasing and analysis tasks, such as DRO-based training and CLIP prompting. The approach remains robust across captioning and scoring models, and the paper also discusses its limitations and ethical considerations.

Abstract

Addressing biases in computer vision models is crucial for real-world AI deployments. However, mitigating visual biases is challenging due to their unexplainable nature, often identified indirectly through visualization or sample statistics, which necessitates additional human supervision for interpretation. To tackle this issue, we propose the Bias-to-Text (B2T) framework, which interprets visual biases as keywords. Specifically, we extract common keywords from the captions of mispredicted images to identify potential biases in the model. We then validate these keywords by measuring their similarity to the mispredicted images using a vision-language scoring model. The keyword explanation form of visual bias offers several advantages, such as a clear group naming for bias discovery and a natural extension for debiasing using these group names. Our experiments demonstrate that B2T can identify known biases, such as gender bias in CelebA, background bias in Waterbirds, and distribution shifts in ImageNet-R/C. Additionally, B2T uncovers novel biases in larger datasets, such as Dollar Street and ImageNet. For example, we discovered a contextual bias between "bee" and "flower" in ImageNet. We also highlight various applications of B2T keywords, including debiased training, CLIP prompting, and model comparison.
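
The keyword-discovery step described in the abstract can be sketched in a few lines of Python. This is a minimal, dependency-free illustration, not the authors' implementation. The frequency-based extract_keywords is a stand-in for YAKE, the embeddings are hand-written toy vectors rather than outputs of a vision-language model such as CLIP, and keyword_score assumes that validation contrasts a keyword's average similarity to mispredicted images with its average similarity to correctly predicted ones. All function names, the stop-word list, and the data are hypothetical.

```python
import math
import re
from collections import Counter

# Tiny stop-word list for the toy extractor (an assumption, not from the paper).
STOP_WORDS = {"a", "an", "the", "of", "in", "on", "and", "with", "is", "to"}


def extract_keywords(captions, top_k=3):
    """Return the top_k words that appear in the most captions.

    Frequency-based stand-in for YAKE; ties are broken alphabetically
    so the result is deterministic.
    """
    counts = Counter()
    for caption in captions:
        words = set(re.findall(r"[a-z]+", caption.lower()))
        counts.update(words - STOP_WORDS)
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:top_k]]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def keyword_score(text_emb, wrong_img_embs, correct_img_embs):
    """Mean similarity to mispredicted images minus mean similarity
    to correctly predicted images (assumed form of the validation score)."""
    sim_wrong = sum(cosine(text_emb, e) for e in wrong_img_embs) / len(wrong_img_embs)
    sim_correct = sum(cosine(text_emb, e) for e in correct_img_embs) / len(correct_img_embs)
    return sim_wrong - sim_correct


# Toy captions of mispredicted images (hypothetical data).
captions = [
    "a bird standing in a bamboo forest",
    "a bird in the forest",
    "a small bird on a branch in a forest",
]
keywords = extract_keywords(captions, top_k=2)

# Toy 2-D embeddings standing in for vision-language model outputs.
forest_text_emb = [1.0, 0.0]
wrong_img_embs = [[1.0, 0.0], [0.8, 0.6]]
correct_img_embs = [[0.0, 1.0], [0.6, 0.8]]
score = keyword_score(forest_text_emb, wrong_img_embs, correct_img_embs)
print(keywords, round(score, 3))
```

Under this assumed scoring, a keyword that is more similar to mispredicted images than to correctly predicted ones gets a positive score, while a keyword that describes both sets equally well scores near zero.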

Paper Structure (38 sections, 3 equations, 18 figures, 18 tables)

Figures (18)

  • Figure 1: Concept. Our Bias-to-Text (B2T) framework reveals visual biases of image classifiers in the form of keyword explanations. For example, B2T identified novel biases in ImageNet (Deng et al., 2009). Specifically, the keyword "flower" implies that the classifier mispredicts "ant" images that contain flowers as "bee," indicating a contextual bias.
  • Figure 2: Method. (Step 1) B2T generates language descriptions from mispredicted images and extracts common keywords. We then verify whether these keywords indicate bias by measuring their similarity to the mispredicted images using a vision-language model like CLIP (Radford et al., 2021). (Step 2) The discovered keywords have various applications, including debiased training, CLIP prompting, and model comparison.
  • Figure 3: Effect of the CLIP score (waterbird class). (a) The CLIP score can identify incorrect bias keywords, assigning scores near zero to non-bias keywords like "species." (b) The ROC curve represents subgroup accuracy, where the subgroup consists of images with high CLIP similarity to a specific keyword and the similarity threshold is varied. The legend displays the B2T keywords alongside their corresponding CLIP scores in parentheses, with the AUROC of their respective curves given after the equals sign. Keywords with high CLIP scores tend to exhibit low subgroup accuracies, indicating they are biases. (c) Colored dots illustrate the negative correlation between the CLIP score and AUROC of subgroup accuracy over B2T keywords, indicating that a higher CLIP score implies stronger bias.
  • Figure 4: Discovered biases in image classifiers. Visual examples of mispredicted images, along with their corresponding bias keywords, captions, actual classes, and predicted classes. B2T successfully identified known biases, such as (a) gender bias in CelebA blond, (b) background bias in Waterbirds, and distribution shifts in (c) ImageNet-R with different styles, and (d) ImageNet-C with natural corruptions. B2T also uncovered novel biases in larger datasets, such as the spurious correlations between (e) the keyword "cave" and the wardrobe class, indicating geographical bias in Dollar Street, and (f) the keyword "flower" and the ant class, indicating contextual bias in ImageNet.
  • Figure 5: Comparison of bias discovery methods. The AUROC curves for (a) CelebA blond (male) and (b) Waterbirds (waterbirds on land), with parentheses indicating the corresponding minority groups. B2T outperforms prior works by a large margin.
  • ...and 13 more figures
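
The subgroup accuracy mentioned in the Figure 3(b) caption can be illustrated with a small sketch. It assumes that each image has a precomputed similarity to a keyword and a flag saying whether the classifier predicted it correctly. The function names and the toy values are hypothetical, and the sketch only sweeps thresholds; it does not reproduce the paper's ROC or AUROC computation.

```python
def subgroup_accuracy(similarities, is_correct, threshold):
    """Classifier accuracy on images whose keyword similarity is >= threshold.

    Returns None when no image passes the threshold.
    """
    selected = [ok for sim, ok in zip(similarities, is_correct) if sim >= threshold]
    if not selected:
        return None
    return sum(selected) / len(selected)


def sweep_thresholds(similarities, is_correct):
    """Subgroup accuracy at every distinct similarity value, in ascending order."""
    return [
        (t, subgroup_accuracy(similarities, is_correct, t))
        for t in sorted(set(similarities))
    ]


# Toy per-image similarities to one keyword and correctness flags (hypothetical).
similarities = [0.10, 0.15, 0.22, 0.30, 0.35]
is_correct = [True, True, True, False, False]

for threshold, acc in sweep_thresholds(similarities, is_correct):
    print(f"threshold={threshold:.2f} subgroup accuracy={acc:.2f}")
```

In this toy data the accuracy falls as the threshold rises, which is the pattern the caption associates with a bias keyword: the images most similar to the keyword are the ones the classifier gets wrong.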