MAVias: Mitigate any Visual Bias
Ioannis Sarridis, Christos Koutlis, Symeon Papadopoulos, Christos Diou
TL;DR
MAVias tackles open-set biases in visual recognition: a foundation image-tagging model describes the visual content of each image, and an LLM identifies which tags are irrelevant to the target class. The identified biases are embedded in a vision-language space and projected into the main model's feature space as bias logits $\mathbf{z}_{tag}$, which are added to the primary logits $\mathbf{z}_{main}$. Training optimizes a combined loss $\mathcal{L} = \mathcal{L}_{cls} + \alpha \mathcal{L}_{align}$ to stabilize learning and reduce reliance on biases. Empirically, MAVias mitigates open-set biases on CelebA, Waterbirds, UrbanCars, and ImageNet9, improving worst-group accuracy and reducing background and co-occurring object biases while maintaining competitive unbiased performance. Because the biases are discovered rather than predefined, the approach extends to biases that are not known in advance. It is open-sourced in the VB-Mitigator library, offering a practical path toward bias-invariant visual understanding in real-world datasets.
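The combined objective can be illustrated with a minimal single-sample sketch. It shows the bias logits $\mathbf{z}_{tag}$ being added to $\mathbf{z}_{main}$ before the classification loss, plus a weighted alignment term. The summary only names $\mathcal{L}_{align}$, so the form used here (the absolute gap between the two logit norms) is an assumption made for illustration, as are the function names and the default value of `alpha`.

```python
import math


def cross_entropy(logits, label):
    """Softmax cross-entropy for one sample, using the log-sum-exp trick."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_z - logits[label]


def l2_norm(v):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(x * x for x in v))


def combined_loss(z_main, z_tag, label, alpha=0.1):
    """Return (L, L_cls, L_align) with L = L_cls + alpha * L_align.

    ASSUMPTION: L_align is modelled as |‖z_main‖ - ‖z_tag‖|. The source
    names the term but does not define it, so this is a placeholder.
    """
    # The bias logits are added to the primary logits before the loss.
    summed = [a + b for a, b in zip(z_main, z_tag)]
    l_cls = cross_entropy(summed, label)
    l_align = abs(l2_norm(z_main) - l2_norm(z_tag))
    return l_cls + alpha * l_align, l_cls, l_align


# Hypothetical logits for a two-class problem.
total, l_cls, l_align = combined_loss([3.0, 4.0], [0.0, 0.0], label=1, alpha=0.5)
print(round(l_cls, 4), l_align, round(total, 4))
```

In a real training loop the two logit vectors would come from the main model and from the projected tag embeddings, and the loss would be averaged over a batch; the sketch keeps only the arithmetic of the objective.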
Abstract
Mitigating biases in computer vision models is an essential step towards the trustworthiness of artificial intelligence models. Existing bias mitigation methods focus on a small set of predefined biases, limiting their applicability in visual datasets where multiple, possibly unknown biases exist. To address this limitation, we introduce MAVias, an open-set bias mitigation approach leveraging foundation models to discover spurious associations between visual attributes and target classes. MAVias first captures a wide variety of visual features in natural language via a foundation image tagging model, and then leverages a large language model to select those visual features that define the target class; the remaining features form a set of language-coded potential visual biases. We then translate this set of potential biases into vision-language embeddings and introduce an in-processing bias mitigation approach to prevent the model from encoding information related to them. Our experiments on diverse datasets, including CelebA, Waterbirds, ImageNet, and UrbanCars, show that MAVias effectively detects and mitigates a wide range of biases in visual recognition tasks, outperforming the current state of the art.
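The tag-filtering step described above can be sketched as a set difference. In this minimal illustration the outputs of the image tagging model and of the large language model are replaced by hard-coded lists, and the function name `potential_biases` and the example tags are hypothetical; it is not the authors' implementation.

```python
def potential_biases(image_tags, class_relevant_tags):
    """Return the tags that do not define the target class.

    Matching is case-insensitive, duplicates are dropped, and the order
    of first appearance in `image_tags` is preserved.
    """
    relevant = {t.lower() for t in class_relevant_tags}
    seen = set()
    biases = []
    for tag in image_tags:
        key = tag.lower()
        if key not in relevant and key not in seen:
            seen.add(key)
            biases.append(tag)
    return biases


# Stand-in for the tagging model's output on one image (hypothetical).
tags = ["bird", "beak", "water", "lake", "feathers", "Water"]
# Stand-in for the tags an LLM judged to define the class (hypothetical).
relevant = ["bird", "beak", "feathers", "wings"]

print(potential_biases(tags, relevant))  # ['water', 'lake']
```

The tags that survive the filter would then be encoded into vision-language embeddings, which is the input to the mitigation step.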
