Counterfactual Gradients-based Quantification of Prediction Trust in Neural Networks

Mohit Prabhushankar; Ghassan AlRegib

TL;DR

This work introduces GradTrust, a per-sample trust quantification for neural network predictions grounded in the variance of counterfactual gradients. By computing the Fisher information from counterfactual backpropagations of the top-$k$ competing classes, GradTrust yields a score $r_x$ that reflects the trustworthiness of a prediction. Across ImageNet and Kinetics-400, GradTrust consistently ranks among the top methods for misprediction detection, often rivaling or surpassing simple baselines like Negative Log-Likelihood and Margin while outperforming many uncertainty and OOD detectors. The approach demonstrates strong practical value for evaluating and communicating prediction reliability in large-scale vision models, with publicly available code at the GradTrust repository.
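To make the idea of a counterfactual gradient concrete, the following is a minimal sketch, not the authors' implementation. It assumes a single linear softmax layer (logits = W · features) and computes the cross-entropy gradient with respect to W as if a competing class were the true label. The function names, the restriction to the last layer, and the per-class variance are illustrative assumptions; the sketch does not reproduce the paper's score $r_x$ or its Fisher information computation.

```python
import math
from statistics import pvariance


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def counterfactual_gradient(features, logits, label):
    """Cross-entropy gradient w.r.t. last-layer weights W (logits = W @ features),
    computed as if `label` were the true class: (softmax - onehot) outer features.
    Returned as a flat list, one block of len(features) entries per class."""
    probs = softmax(logits)
    grad = []
    for j, p in enumerate(probs):
        err = p - (1.0 if j == label else 0.0)
        grad.extend(err * f for f in features)
    return grad


def topk_counterfactual_variances(features, logits, k):
    """Variance of the counterfactual gradient entries for each of the k
    strongest competitors of the predicted class.
    Hypothetical aggregation chosen for illustration only."""
    order = sorted(range(len(logits)), key=lambda j: logits[j], reverse=True)
    competitors = order[1:k + 1]  # position 0 is the predicted class
    return {c: pvariance(counterfactual_gradient(features, logits, c))
            for c in competitors}


if __name__ == "__main__":
    # Toy input: 2 features, 3 classes, class 0 is predicted.
    print(topk_counterfactual_variances([1.0, 2.0], [2.0, 1.0, 0.0], k=2))
```

In this toy setting, a less likely counterfactual class requires a larger change to the weights, so its gradient entries spread out more. In a real network the gradients would come from backpropagation through the model rather than from this closed form.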

Abstract

The widespread adoption of deep neural networks in machine learning calls for an objective quantification of esoteric trust. In this paper, we propose GradTrust, a classification trust measure for large-scale neural networks at inference. The proposed method utilizes the variance of counterfactual gradients, i.e., the required changes in the network parameters if the label were different. We show that GradTrust is superior to existing techniques for detecting misprediction rates on $50000$ images from the ImageNet validation dataset. Depending on the network, GradTrust detects images where either the ground truth is incorrect or ambiguous, or the classes are co-occurring. We extend GradTrust to Video Action Recognition on the Kinetics-400 dataset. We showcase results on $14$ architectures pretrained on ImageNet and $5$ architectures pretrained on Kinetics-400. We observe the following: (i) simple methodologies like negative log likelihood and margin classifiers outperform state-of-the-art uncertainty and out-of-distribution detection techniques for misprediction rates, and (ii) the proposed GradTrust is among the Top-2 performing methods on $37$ of the considered $38$ experimental modalities. The code is available at: https://github.com/olivesgatech/GradTrust
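The two simple baselines named above can be sketched in a few lines. This is a hedged illustration under common definitions, which are assumptions here since this page does not spell them out: negative log likelihood is taken as the negative log of the predicted-class softmax probability, and margin as the gap between the two largest softmax probabilities. The function names are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def nll_of_prediction(logits):
    """Negative log of the predicted-class probability (assumed definition).
    Lower values indicate a more confident prediction."""
    return -math.log(max(softmax(logits)))


def margin(logits):
    """Gap between the two largest softmax probabilities (assumed definition).
    Higher values indicate a clearer separation from the runner-up class."""
    top = sorted(softmax(logits), reverse=True)
    return top[0] - top[1]


if __name__ == "__main__":
    for logits in ([0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [10.0, 0.0, 0.0]):
        print(logits, nll_of_prediction(logits), margin(logits))
```

Either score can be used to rank samples so that low-confidence predictions are flagged first; how the paper thresholds or bins these scores is not stated in this section.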

Paper Structure (11 sections, 3 equations, 4 figures, 2 tables)

Introduction
Related Work
Methodology
GradTrust Quantification
Experiments
Evaluation
ImageNet Experiments
Qualitative Analysis
Video Classification Experiments
Limitations
Conclusion

Figures (4)

Figure 1: Scatter plot of the proposed GradTrust (x-axis) against softmax confidence (y-axis) on the ImageNet validation dataset using ResNet-18. Green points indicate correctly classified data and red points indicate misclassified data. Representative misclassified and correctly classified images from the numbered boxes are displayed alongside the scatter plot, with their predictions (in red) and labels (in blue).
Figure 2: Block diagram of GradTrust quantification.
Figure 3: Accuracy and F1 values of GradTrust and comparison metrics against corresponding percentile bins.
Figure 4: Qualitative analysis of mispredictions on AlexNet (top row), MaxVit-t (middle row) and ensemble mispredictions across all networks from Table \ref{tab:ImageNet} (bottom row). All displayed images have high softmax confidence and are shown in ascending order of GradTrust.
