DECIDER: Leveraging Foundation Model Priors for Improved Model Failure Detection and Explanation
Rakshith Subramanyam, Kowshik Thopalli, Vivek Narayanaswamy, Jayaraman J. Thiagarajan
TL;DR
DECIDER introduces a failure-detection framework that leverages priors from foundation models to identify inputs on which an image classifier is likely to fail. It trains a Prior Induced Model (PIM) that projects features into a vision-language embedding space, guided by task-relevant core attributes generated by an LLM, and detects failures by measuring disagreement between the original classifier and the PIM. The approach also yields human-interpretable explanations through an attribute-ablation mechanism that highlights the attributes the original model underutilizes. Empirically, DECIDER achieves state-of-the-art performance across diverse failure modes, covering subpopulation shifts (spurious correlations, class imbalance) and covariate shifts, with consistent gains in Matthews correlation coefficient (MCC) and favorable failure/success recall trade-offs. The work demonstrates the practical value of integrating vision-language priors and language-based attribute descriptions for reliable, interpretable failure detection in safety-critical vision systems.
Abstract
Reliably detecting when a deployed machine learning model is likely to fail on a given input is crucial for ensuring safe operation. In this work, we propose DECIDER (Debiasing Classifiers to Identify Errors Reliably), a novel approach that leverages priors from large language models (LLMs) and vision-language models (VLMs) to detect failures in image classification models. DECIDER uses LLMs to specify task-relevant core attributes and constructs a "debiased" version of the classifier by aligning its visual features to these core attributes using a VLM. It then detects potential failures by measuring disagreement between the original and debiased models. In addition to proactively identifying samples on which the model would fail, DECIDER provides human-interpretable explanations for failure through a novel attribute-ablation strategy. Through extensive experiments across diverse benchmarks spanning subpopulation shifts (spurious correlations, class imbalance) and covariate shifts (synthetic corruptions, domain shifts), DECIDER consistently achieves state-of-the-art failure detection performance, significantly outperforming baselines in terms of the overall Matthews correlation coefficient as well as failure and success recall. Our code is available at https://github.com/kowshikthopalli/DECIDER/
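
To make the disagreement-based detection and its evaluation concrete, the following is a minimal sketch and not the authors' implementation. It assumes that "disagreement" can be approximated by a mismatch between the top-1 predictions of the original classifier and the debiased model; the actual DECIDER score may be defined differently. The function names, the toy logits, and the labels are all hypothetical and chosen only for illustration.

```python
import numpy as np


def flag_failures(orig_logits, debiased_logits):
    """Flag a sample as a likely failure when the two models disagree.

    ASSUMPTION: disagreement is taken to be a top-1 prediction mismatch.
    This is an illustrative stand-in for the paper's disagreement measure.
    """
    orig_pred = np.argmax(orig_logits, axis=1)
    debiased_pred = np.argmax(debiased_logits, axis=1)
    return orig_pred != debiased_pred


def confusion_counts(true_failure, flagged):
    """Counts for the binary task 'did the original model fail?'."""
    t = np.asarray(true_failure, dtype=bool)
    p = np.asarray(flagged, dtype=bool)
    tp = int(np.sum(t & p))    # failures correctly flagged
    tn = int(np.sum(~t & ~p))  # successes correctly left unflagged
    fp = int(np.sum(~t & p))   # successes wrongly flagged
    fn = int(np.sum(t & ~p))   # failures that were missed
    return tp, tn, fp, fn


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 if undefined."""
    denom = np.sqrt(float((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


# Hypothetical toy data: 4 samples, 3 classes.
orig_logits = np.array([[2.0, 1.0, 0.0],
                        [0.0, 3.0, 1.0],
                        [1.0, 0.0, 2.0],
                        [0.5, 0.2, 0.1]])
debiased_logits = np.array([[1.5, 0.5, 0.0],
                            [2.0, 1.0, 0.0],
                            [0.0, 0.0, 1.0],
                            [0.0, 1.0, 0.0]])
labels = np.array([0, 0, 2, 0])

# Ground truth for the detector: where the original model is actually wrong.
true_failure = np.argmax(orig_logits, axis=1) != labels
flagged = flag_failures(orig_logits, debiased_logits)

tp, tn, fp, fn = confusion_counts(true_failure, flagged)
mcc = matthews_corrcoef(tp, tn, fp, fn)
failure_recall = tp / (tp + fn) if (tp + fn) else 0.0
success_recall = tn / (tn + fp) if (tn + fp) else 0.0

print("flagged:", flagged.tolist())
print("MCC:", mcc)
print("failure recall:", failure_recall, "success recall:", success_recall)
```

Failure recall and success recall are reported alongside MCC because a detector that flags every input reaches perfect failure recall with zero success recall; MCC summarizes both error types in a single number.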
