DECIDER: Leveraging Foundation Model Priors for Improved Model Failure Detection and Explanation

Rakshith Subramanyam; Kowshik Thopalli; Vivek Narayanaswamy; Jayaraman J. Thiagarajan

TL;DR

DECIDER introduces a failure-detection framework that leverages priors from foundation models to identify inputs likely to fail in image classification. It trains a Prior Induced Model (PIM) that projects features into a vision-language embedding space, guided by task-relevant core attributes generated by an LLM, and detects failures by measuring disagreement between the original classifier and PIM. The approach also yields human-interpretable explanations through an attribute-ablation mechanism that highlights the features the original model underutilizes. Empirically, DECIDER achieves state-of-the-art performance across diverse failure modes, including subpopulation, spurious-correlation, class-imbalance, and covariate-shift benchmarks, with robust MCC gains and favorable failure/success recall trade-offs. The work demonstrates the practical value of integrating vision-language priors and language-based attribute descriptions for reliable, interpretable failure detection in safety-critical vision systems.
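The disagreement test described above can be illustrated with a minimal sketch. It assumes both models expose per-class logits, and it uses cross-entropy between the two predictive distributions as the disagreement score with a fixed threshold. The function names, the choice of score, and the threshold value are illustrative assumptions, not the authors' implementation.

```python
import math


def softmax(logits):
    """Convert a list of logits to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def disagreement(p_classifier, p_pim, eps=1e-12):
    """Cross-entropy of the classifier's probabilities under PIM's probabilities.

    Assumed score: it grows when the classifier places little mass on the
    classes that PIM considers likely.
    """
    return -sum(q * math.log(p + eps) for p, q in zip(p_classifier, p_pim))


def flag_failures(classifier_logits, pim_logits, threshold):
    """Flag each sample whose disagreement score exceeds the threshold.

    The threshold is a hypothetical hyperparameter; in practice it would be
    chosen on held-out data.
    """
    flags = []
    for f_logits, g_logits in zip(classifier_logits, pim_logits):
        score = disagreement(softmax(f_logits), softmax(g_logits))
        flags.append(score > threshold)
    return flags


if __name__ == "__main__":
    # Sample 0: both models agree. Sample 1: the models prefer different classes.
    classifier_logits = [[4.0, 0.0], [4.0, 0.0]]
    pim_logits = [[4.0, 0.0], [0.0, 4.0]]
    print(flag_failures(classifier_logits, pim_logits, threshold=1.0))
```

Under these assumptions, only the second sample is flagged, because the two models place their probability mass on different classes.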

Abstract

Reliably detecting when a deployed machine learning model is likely to fail on a given input is crucial for ensuring safe operation. In this work, we propose DECIDER (Debiasing Classifiers to Identify Errors Reliably), a novel approach that leverages priors from large language models (LLMs) and vision-language models (VLMs) to detect failures in image classification models. DECIDER uses LLMs to specify task-relevant core attributes and constructs a "debiased" version of the classifier by aligning its visual features to these core attributes using a VLM. It then detects potential failures by measuring disagreement between the original and debiased models. In addition to proactively identifying samples on which the model would fail, DECIDER also provides human-interpretable explanations for failure through a novel attribute-ablation strategy. Through extensive experiments across diverse benchmarks spanning subpopulation shifts (spurious correlations, class imbalance) and covariate shifts (synthetic corruptions, domain shifts), DECIDER consistently achieves state-of-the-art failure detection performance, significantly outperforming baselines in terms of the overall Matthews correlation coefficient as well as failure and success recall. Our code is available at https://github.com/kowshikthopalli/DECIDER/
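For readers unfamiliar with the reported metrics, the following sketch computes the Matthews correlation coefficient together with failure and success recall from binary failure labels. It assumes that a true failure is treated as the positive class, that failure recall is the fraction of true failures that are flagged, and that success recall is the fraction of correct predictions left unflagged. The function names are illustrative and not taken from the DECIDER codebase.

```python
import math


def confusion_counts(true_failure, predicted_failure):
    """Count TP, FP, TN, FN with 'failure' as the positive class."""
    tp = fp = tn = fn = 0
    for t, p in zip(true_failure, predicted_failure):
        if t and p:
            tp += 1
        elif not t and p:
            fp += 1
        elif t and not p:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def failure_detection_metrics(true_failure, predicted_failure):
    """Return MCC, failure recall, and success recall for a failure detector."""
    tp, fp, tn, fn = confusion_counts(true_failure, predicted_failure)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, MCC is set to 0 when the denominator vanishes.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    failure_recall = tp / (tp + fn) if (tp + fn) else 0.0
    success_recall = tn / (tn + fp) if (tn + fp) else 0.0
    return {
        "mcc": mcc,
        "failure_recall": failure_recall,
        "success_recall": success_recall,
    }


if __name__ == "__main__":
    # True where the classifier actually misclassified the sample.
    true_failure = [True, True, False, False, False, False]
    # True where the detector flagged the sample as a likely failure.
    predicted_failure = [True, False, False, False, False, True]
    print(failure_detection_metrics(true_failure, predicted_failure))
```

MCC is useful here because failures are typically much rarer than successes, and it accounts for all four cells of the confusion matrix rather than rewarding a detector that never flags anything.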

Paper Structure (28 sections, 2 equations, 6 figures, 10 tables)

Introduction
Related Work
Background
Proposed Approach
Motivation
Incorporating Foundation Model Priors
Generating Task-specific Core-attribute Descriptions
Training PIM
DECIDER: Failure Estimation Using PIM
Extracting Explanations for Failure
Empirical Evaluation
Experimental Setup
Baselines
Metrics
Findings
...and 13 more sections

Figures (6)

Figure 1: A visual illustration of the failure scenarios we consider. The first arises when the model relies on spurious correlations in the data, i.e., when an attribute is spuriously correlated with the label (e.g., hair color and gender). The second arises when the training data is class-imbalanced, leading to poorer generalization on images from the under-sampled class. The third arises when the test distribution differs from the training distribution, which can range from natural image corruptions to covariate shifts.
Figure 2: DECIDER for failure detection. (Left) DECIDER trains a Prior Induced Model (PIM) $\phi$, with the same architecture as the pre-trained classifier $\mathcal{F}$, using priors from a VLM. (Top Right) The disagreement between the predictions of $\phi$ and $\mathcal{F}$ serves as an indicator of failure. (Bottom Right) By adjusting attribute-level weights, DECIDER offers explanatory insights into failures.
Figure 3: Results on failure detection across different benchmarks: (a) CIFAR-100 and image corruptions on CIFAR-100-C, and (b) subpopulation shifts from spurious correlations on Waterbirds and CelebA, and class imbalance on Cats vs Dogs. DECIDER consistently outperforms baselines in terms of the overall Matthews correlation coefficient (MCC) while also achieving higher failure and success recall.
Figure 4: DECIDER produces the best performance on covariate shifts. (Left) Comparison of DECIDER against the best baseline in terms of the difference in MCC on the PACS dataset, which involves covariate shifts across 4 different visual domains. (Right) Improvement in failure recall of the best-performing baseline and DECIDER on large-scale covariate shift benchmarks: DomainNet (DNet) and ImageNet-Sketch. The classifiers and PIMs are trained on the DomainNet Real and ImageNet train sets, respectively, and evaluated on the different distribution-shift datasets.
Figure 5: Failure explanations. We explain the failures of the biased classifier $\mathcal{F}$ by manipulating the influence of individual attributes in PIM so that the prediction probabilities of PIM match those of $\mathcal{F}$. The attributes whose influence had to be reduced indicate that $\mathcal{F}$ did not focus on those attributes when making its decisions. Qualitative examples are shown for Waterbirds (top left), Cats vs Dogs (top right), and CelebA (bottom).
...and 1 more figure
