Constructing Concept-based Models to Mitigate Spurious Correlations with Minimal Human Effort
Jeeyung Kim, Ze Wang, Qiang Qiu
TL;DR
The paper tackles spurious correlations in vision models by making them more interpretable through Concept Bottleneck Models (CBMs). It introduces a three-stage framework that uses foundation models (multimodal LLMs and text-only LLMs) to automatically discover visual concepts, annotate images with them, and optionally refine those annotations, so that almost no human labeling is needed. By collecting concepts unaffected by spurious cues, annotating images with LLaVA, and refining the annotations through a chain of vision models, the approach yields CBMs that rely less on spurious correlations while remaining interpretable. Across ImageNet-Opener, MetaShift, and Waterbirds, the method achieves worst-group accuracy competitive with or better than the baselines, with notable gains when annotation refinement is used. The work demonstrates a practical pathway to robust, interpretable models at minimal human annotation cost, broadening the applicability of CBMs to real-world datasets.
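Worst-group evaluation partitions the test set into groups defined by a (class label, spurious attribute) pair, as in Waterbirds, where waterbird/landbird labels are crossed with water/land backgrounds, and reports the accuracy of the weakest group. The sketch below is a generic illustration of that metric, not the authors' evaluation code; the function name `worst_group_accuracy` and the toy data are hypothetical.

```python
from collections import defaultdict


def worst_group_accuracy(preds, labels, groups):
    """Return (worst-group accuracy, per-group accuracy dict).

    Each group id is typically a (label, spurious_attribute) tuple.
    Hypothetical helper for illustration only.
    """
    if not (len(preds) == len(labels) == len(groups)):
        raise ValueError("preds, labels, and groups must have equal length")
    correct = defaultdict(int)  # correctly classified samples per group
    total = defaultdict(int)    # all samples per group
    for p, y, g in zip(preds, labels, groups):
        total[g] += 1
        correct[g] += int(p == y)
    per_group = {g: correct[g] / total[g] for g in total}
    return min(per_group.values()), per_group


# Toy example: label 1 = waterbird, 0 = landbird; second field = background.
preds = [1, 1, 0, 0, 1, 0]
labels = [1, 1, 0, 0, 0, 1]
groups = [(1, "water"), (1, "water"), (0, "land"), (0, "land"),
          (0, "water"), (1, "land")]
worst, per_group = worst_group_accuracy(preds, labels, groups)
print(worst)  # 0.0
```

Here overall accuracy is 4/6, yet both minority groups (landbirds on water, waterbirds on land) are misclassified, which is exactly the failure mode that average accuracy hides and worst-group accuracy exposes.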
Abstract
Enhancing model interpretability can address spurious correlations by revealing how models arrive at their predictions. Concept Bottleneck Models (CBMs) can provide a principled way of disclosing and guiding model behaviors through human-understandable concepts, albeit at a high cost in human annotation effort. In this paper, we leverage a synergy of multiple foundation models to construct CBMs with nearly no human effort. We discover undesirable biases in CBMs built on pre-trained models and propose a novel framework designed to exploit pre-trained models while being immune to these biases, thereby reducing vulnerability to spurious correlations. Specifically, our method offers a seamless pipeline that adopts foundation models for assessing potential spurious correlations in datasets, annotating concepts for images, and refining the annotations for improved robustness. We evaluate the proposed method on multiple datasets, and the results demonstrate its effectiveness in reducing model reliance on spurious correlations while preserving interpretability.
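To illustrate how a CBM discloses its reasoning through concepts, the sketch below implements only the final, interpretable stage of a generic CBM: a linear layer mapping concept scores to class logits, plus the per-concept contributions to the predicted class. In the described framework the concept scores would come from a concept predictor trained on the automatically generated annotations, which is not shown. The function `cbm_predict`, the concept names, and the weights are all hypothetical.

```python
from typing import Dict, List, Tuple


def cbm_predict(
    concept_scores: List[float],
    concept_names: List[str],
    class_weights: Dict[str, List[float]],
    class_bias: Dict[str, float],
) -> Tuple[str, Dict[str, float], Dict[str, float]]:
    """Hypothetical linear head of a concept bottleneck model.

    concept_scores: bottleneck activations, one per concept.
    class_weights: per-class weight vector over the concepts.
    Returns the predicted class, all class logits, and each concept's
    contribution (weight * score) to the predicted class's logit.
    """
    if len(concept_scores) != len(concept_names):
        raise ValueError("one score per concept is required")
    logits = {}
    for cls, weights in class_weights.items():
        if len(weights) != len(concept_scores):
            raise ValueError(f"weight vector for {cls!r} has wrong length")
        # Logit is a bias plus a weighted sum of concept scores.
        logits[cls] = class_bias.get(cls, 0.0) + sum(
            w * c for w, c in zip(weights, concept_scores)
        )
    pred = max(logits, key=logits.get)
    # Per-concept explanation of the winning logit.
    contributions = {
        name: w * c
        for name, w, c in zip(concept_names, class_weights[pred], concept_scores)
    }
    return pred, logits, contributions


concepts = ["webbed feet", "long neck", "perching claws"]
weights = {"waterbird": [2.0, 1.0, -1.5], "landbird": [-1.0, -0.5, 2.0]}
pred, logits, contrib = cbm_predict([0.9, 0.6, 0.1], concepts, weights, {})
print(pred)  # waterbird
```

Because every logit is a weighted sum over named concepts, a user can read off which concepts drove a prediction. If the concept set contains only object attributes and no background concepts, a background cue has no direct path into the classifier head.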
