Language-guided Detection and Mitigation of Unknown Dataset Bias
Zaiying Zhao, Soichiro Kumano, Toshihiko Yamasaki
TL;DR
The paper tackles unknown dataset biases that impair classifier performance on minority groups. It proposes a language-guided framework that first detects biases as keywords, which GPT-4 extracts from captions generated by vision-language models, and then mitigates them through two methods: Language-guided Group-DRO (assigning pseudo-labels for bias attributes so that Group-DRO can be applied) and Language-guided Diffusion-based Augmentation (generating minority-group images with Stable Diffusion, using the bias keywords as prompts). Across CMNIST, Waterbirds, and CelebA, the approach outperforms state-of-the-art methods that do not assume prior bias knowledge and is competitive with methods that rely on known biases. Because biases are presented as textual keywords, the framework is also more interpretable, and it remains robust across backbones and tasks, which suggests practical potential for real-world bias mitigation.
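To make the Language-guided Group-DRO idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes that a pseudo-group is the pair (class label, whether the caption mentions a bias keyword) and that per-sample losses are already available. The function names `pseudo_group` and `worst_group_loss` and the toy data are hypothetical. The sketch computes only the worst-group average loss that Group-DRO targets; the online group-weight update used by Group-DRO during training is omitted.

```python
def pseudo_group(label, caption, keyword):
    """Assumed pseudo-label: (class label, caption mentions the bias keyword)."""
    return (label, keyword.lower() in caption.lower())


def worst_group_loss(losses, groups):
    """Average the loss within each group and return the worst group and its mean loss."""
    sums, counts = {}, {}
    for loss, group in zip(losses, groups):
        sums[group] = sums.get(group, 0.0) + loss
        counts[group] = counts.get(group, 0) + 1
    means = {group: sums[group] / counts[group] for group in sums}
    worst = max(means, key=means.get)
    return worst, means[worst]


# Toy data (hypothetical): (class label, caption, per-sample loss).
samples = [
    ("waterbird", "a bird on water", 0.2),
    ("waterbird", "a duck on water", 0.4),
    ("waterbird", "a bird in a forest", 1.4),
    ("landbird", "a bird in a forest", 0.3),
    ("landbird", "a bird near water", 1.0),
]

groups = [pseudo_group(label, caption, "water") for label, caption, _ in samples]
losses = [loss for _, _, loss in samples]
worst, value = worst_group_loss(losses, groups)
print(worst, value)
```

In this toy example the worst group is the minority one (waterbirds whose captions do not mention water), which is the group a worst-group objective would prioritize.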
Abstract
Dataset bias is a significant problem in training fair classifiers. When attributes unrelated to classification exhibit strong biases towards certain classes, classifiers trained on such datasets may overfit to these bias attributes, substantially reducing the accuracy for minority groups. Mitigation techniques can be categorized according to the availability of bias information (i.e., prior knowledge). Although scenarios with unknown biases are better suited for real-world settings, previous work in this field often suffers from a lack of interpretability regarding biases and lower performance. In this study, we propose a framework that identifies potential biases as keywords, without prior knowledge, based on their partial occurrence in image captions. We further propose two debiasing methods: (a) handing over to an existing debiasing approach that requires prior knowledge by assigning pseudo-labels, and (b) employing data augmentation via text-to-image generative models, using the acquired bias keywords as prompts. Despite its simplicity, experimental results show that our framework not only outperforms existing methods without prior knowledge but is also comparable with a method that assumes prior knowledge.
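The "partial occurrence" criterion can be illustrated with a short sketch. This is one plausible reading, not the paper's exact procedure: a keyword is treated as a bias candidate if, within at least one class, it appears in some but not nearly all of that class's captions. The thresholds `low` and `high`, the function names, and the toy captions are assumptions introduced for illustration, and simple substring matching stands in for whatever matching the authors use.

```python
def occurrence_rate(captions, keyword):
    """Fraction of captions containing the keyword (case-insensitive substring match)."""
    if not captions:
        return 0.0
    kw = keyword.lower()
    return sum(kw in caption.lower() for caption in captions) / len(captions)


def partial_keywords(captions_by_class, keywords, low=0.05, high=0.95):
    """Keep keywords whose occurrence rate is strictly between low and high in some class.

    The thresholds are illustrative assumptions, not values from the paper.
    """
    selected = []
    for kw in keywords:
        rates = [occurrence_rate(caps, kw) for caps in captions_by_class.values()]
        if any(low < rate < high for rate in rates):
            selected.append(kw)
    return selected


# Toy captions (hypothetical), grouped by class label.
captions_by_class = {
    "waterbird": ["a bird standing on water", "a bird over water", "a bird in a forest"],
    "landbird": ["a bird in a forest", "a bird in a bamboo forest", "a bird on water"],
}
keywords = ["bird", "water", "forest", "snow"]

candidates = partial_keywords(captions_by_class, keywords)
print(candidates)
```

Here "bird" is dropped because it occurs in every caption, and "snow" is dropped because it never occurs, leaving the background words as candidates. Such candidates could then serve either as pseudo-labels for method (a) or as prompt terms for method (b).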
