MAVias: Mitigate any Visual Bias

Ioannis Sarridis, Christos Koutlis, Symeon Papadopoulos, Christos Diou

TL;DR

MAVias tackles open-set biases in visual recognition by using foundation-model image tagging to describe visual content and an LLM to identify which tags are irrelevant to the target class. The identified biases are embedded in a vision-language space and projected into the main model's feature space as bias logits $\mathbf{z}_{tag}$ that are added to the primary logits $\mathbf{z}_{main}$; training optimizes a combined loss $\mathcal{L} = \mathcal{L}_{cls} + \alpha \mathcal{L}_{align}$ to stabilize learning and reduce reliance on biases. Empirically, MAVias achieves strong open-set bias mitigation across CelebA, Waterbirds, UrbanCars, and ImageNet9, with substantial improvements in worst-group accuracy and reductions in background and co-occurring object biases, while maintaining competitive unbiased performance. The approach scales to unknown biases and is open-sourced in the VB-Mitigator library, offering a practical path toward bias-invariant visual understanding in real-world datasets.
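A minimal sketch of the objective above for a single sample may help make the notation concrete. It assumes that $\mathcal{L}_{cls}$ is a softmax cross-entropy computed on the summed logits $\mathbf{z}_{main} + \mathbf{z}_{tag}$. This summary does not define $\mathcal{L}_{align}$, so it is passed in as a precomputed number. The function names and the numeric values are illustrative and are not taken from VB-Mitigator.

```python
import math


def softmax_cross_entropy(logits, target):
    """Cross-entropy of one sample from raw logits and the true class index."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def combined_loss(z_main, z_tag, target, align_loss, alpha):
    """L = L_cls + alpha * L_align, with L_cls taken on z_main + z_tag.

    Assumption: L_cls is a cross-entropy on the summed logits.
    align_loss is a placeholder value; its definition is not given here.
    """
    z_sum = [a + b for a, b in zip(z_main, z_tag)]
    return softmax_cross_entropy(z_sum, target) + alpha * align_loss


# Hypothetical two-class example
z_main = [2.0, 0.5]   # logits from the main model
z_tag = [-1.0, 1.0]   # bias logits projected from the tag embeddings
loss = combined_loss(z_main, z_tag, target=0, align_loss=0.3, alpha=0.1)
print(round(loss, 4))
```

According to the caption of Figure 2, only the backbone and the classification layer are used at inference, so in this sketch the prediction at test time would depend on `z_main` alone.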

Abstract

Mitigating biases in computer vision models is an essential step towards the trustworthiness of artificial intelligence models. Existing bias mitigation methods focus on a small set of predefined biases, limiting their applicability in visual datasets where multiple, possibly unknown biases exist. To address this limitation, we introduce MAVias, an open-set bias mitigation approach leveraging foundation models to discover spurious associations between visual attributes and target classes. MAVias first captures a wide variety of visual features in natural language via a foundation image tagging model, and then leverages a large language model to select the visual features that define the target class; the remaining features form a set of language-coded potential visual biases. We then translate this set of potential biases into vision-language embeddings and introduce an in-processing bias mitigation approach to prevent the model from encoding information related to them. Our experiments on diverse datasets, including CelebA, Waterbirds, ImageNet, and UrbanCars, show that MAVias effectively detects and mitigates a wide range of biases in visual recognition tasks, outperforming the current state of the art.
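The tag-filtering step described in the abstract can be sketched as follows. This is only an illustration: the tag list stands in for the output of an image tagging model, the relevant-tag set stands in for an LLM answer, and both are hypothetical hard-coded values rather than real model outputs. The helper name `potential_biases` is also an assumption.

```python
def potential_biases(image_tags, relevant_tags):
    """Keep the tags of one image that were NOT judged relevant to its class.

    The order of image_tags is preserved.
    """
    relevant = set(relevant_tags)
    return [tag for tag in image_tags if tag not in relevant]


# Hypothetical tagger output for one image of a waterbird
image_tags = ["bird", "beak", "feather", "water", "reed", "lake"]

# Hypothetical LLM selection of tags that define the class "waterbird"
relevant_to_class = {"bird", "beak", "feather", "wing"}

biases = potential_biases(image_tags, relevant_to_class)
print(biases)  # ['water', 'reed', 'lake']
```

In the approach described above, tags such as these would then be embedded in a vision-language space; that embedding step is not shown here.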

Paper Structure

This paper contains 26 sections, 9 equations, 6 figures, 20 tables.

Figures (6)

  • Figure 1: MAVias identifies instance-level potential visual biases through foundational models that extract tags representing visual features and assess relevance to the target class. Then, MAVias encodes these features within the vision-language space and integrates them into a bias-aware framework to train a model that is invariant to such visual biases.
  • Figure 2: Illustration of the proposed framework for mitigating any visual bias during model training. For inference, only the backbone and the classification layer are considered (i.e., $f_{\boldsymbol{\theta}}$).
  • Figure 3: Logits for UrbanCars training samples belonging to groups defined by the urban car class and Background (BG) and Co-occurring Object (CoObj) biases.
  • Figure 4: LLM system prompt for deriving the relevant tags.
  • Figure 5: Two-moon problem on 3 dimensions. The distributions are linearly separable on axis $z$ (i.e., $z$ feature introduces bias), while the actual target is to learn the distributions defined by the features $x$ and $y$.
  • ...and 1 more figure
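The toy setting in the caption of Figure 5 can be illustrated with a small data generator. This is a sketch under assumptions, since the exact construction used in the paper is not given here: the moons use the common half-circle parametrization, and the shortcut feature $z$ is drawn from two disjoint, hypothetical intervals so that its sign separates the classes perfectly.

```python
import math
import random


def two_moons_3d(n_per_class, noise=0.05, seed=0):
    """Two-moon data in (x, y) plus a label-leaking third feature z."""
    rng = random.Random(seed)
    points, labels = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            t = rng.uniform(0.0, math.pi)
            if label == 0:
                x, y = math.cos(t), math.sin(t)               # upper moon
            else:
                x, y = 1.0 - math.cos(t), 0.5 - math.sin(t)   # lower moon
            x += rng.gauss(0.0, noise)
            y += rng.gauss(0.0, noise)
            # Biased feature (assumed ranges): z in [0.5, 1.5] for class 1
            # and z in [-1.5, -0.5] for class 0, so a threshold at z = 0
            # separates the two classes.
            z = (1.0 if label == 1 else -1.0) + rng.uniform(-0.5, 0.5)
            points.append((x, y, z))
            labels.append(label)
    return points, labels


points, labels = two_moons_3d(100)
shortcut_is_perfect = all((p[2] > 0) == (lab == 1)
                          for p, lab in zip(points, labels))
print(len(points), shortcut_is_perfect)
```

A classifier trained on such data can reach perfect training accuracy from $z$ alone, which is the shortcut that a bias mitigation method is meant to discourage in favor of the moon structure in $x$ and $y$.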