RAT: Boosting Misclassification Detection Ability without Extra Data

Ge Yan, Tsui-Wei Weng

TL;DR

This work tackles misclassification detection (MisD) for image classifiers by adopting the robust radius, i.e., the input-space margin to the decision boundary, as a confidence score. It introduces two efficient radius estimators, RR-BS and RR-Fast, that enable fast and accurate MisD, and a data-efficient training method, Radius Aware Training (RAT), that improves detection without extra data by applying adversarial objectives selectively to correctly classified vs. misclassified examples. Empirically, the approach substantially improves over baselines, with up to a 29.3% reduction in AURC and a 21.62% reduction in FPR@95TPR, and it remains robust to input corruptions (CIFAR10-C). Together, RR-based confidence and RAT form a practical MisD framework with a favorable speed-accuracy trade-off and resilience across datasets and architectures.
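The summary names RR-BS but does not spell out its procedure. As a hedged illustration of the underlying idea (estimate the robust radius as the smallest perturbation that flips the prediction), the sketch below binary-searches the step size along one fixed direction for a toy linear classifier. The function names, the linear model, and the choice of search direction are assumptions for illustration only; this is not the paper's RR-BS.

```python
import numpy as np

def predict(W, b, x):
    """Predicted class of a toy linear classifier with logits W @ x + b."""
    return int(np.argmax(W @ x + b))

def radius_binary_search(W, b, x, direction, t_max=10.0, tol=1e-6):
    """Hypothetical helper: distance along `direction` at which the prediction flips.

    Returns t_max if the prediction does not change within [0, t_max].
    Binary search is only exact if the label flips once along the ray.
    """
    d = direction / np.linalg.norm(direction)  # unit-length search direction
    y = predict(W, b, x)
    if predict(W, b, x + t_max * d) == y:
        return t_max  # no flip found inside the search interval
    lo, hi = 0.0, t_max  # invariant: label unchanged at lo, changed at hi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if predict(W, b, x + mid * d) == y:
            lo = mid
        else:
            hi = mid
    return hi
```

For a linear model, searching along $-(w_y - w_j)$ recovers the exact L2 distance to the boundary between the predicted class $y$ and class $j$; for a deep network the direction would typically come from an adversarial attack, and a larger returned radius is read as higher confidence.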

Abstract

As deep neural networks (DNNs) become increasingly prevalent, particularly in high-stakes areas such as autonomous driving and healthcare, the ability to detect a model's incorrect predictions and intervene accordingly becomes crucial for safety. In this work, we investigate the detection of misclassified inputs for image classification models through the lens of adversarial perturbation: we propose to use the robust radius (a.k.a. input-space margin) as a confidence metric and design two efficient estimation algorithms, RR-BS and RR-Fast, for misclassification detection. Furthermore, we design a training method called Radius Aware Training (RAT) to boost models' ability to identify mistakes. Extensive experiments show that our method achieves up to a 29.3% reduction in AURC and a 21.62% reduction in FPR@95TPR compared with previous methods.
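The reported gains are in AURC (area under the risk-coverage curve) and FPR@95TPR. The sketch below computes both from per-example confidence scores and a boolean "prediction was correct" mask. It assumes the common MisD convention that correctly classified examples are the positives and that ties in confidence are ignored; the paper's exact evaluation code may differ.

```python
import numpy as np

def aurc(confidence, correct):
    """Area under the risk-coverage curve (lower is better).

    confidence: float array of scores; correct: boolean array.
    """
    order = np.argsort(-confidence, kind="stable")  # most confident first
    errors = 1.0 - correct[order].astype(float)
    k = np.arange(1, len(errors) + 1)
    risks = np.cumsum(errors) / k  # selective risk at coverage k / n
    return float(risks.mean())

def fpr_at_95_tpr(confidence, correct):
    """Fraction of misclassified examples accepted at the threshold
    that accepts at least 95% of correctly classified examples."""
    pos = np.sort(confidence[correct])  # scores of correct predictions
    neg = confidence[~correct]          # scores of misclassified predictions
    thr = pos[int(np.floor(0.05 * len(pos)))]
    return float(np.mean(neg >= thr))
```

With a robust-radius score, `confidence` would simply be the estimated radius of each input.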

Paper Structure

This paper contains 37 sections, 6 equations, 6 figures, 7 tables, and 2 algorithms.

Figures (6)

  • Figure 1: Overview of our method: (Left) In Radius Aware Training, our goal is to move misclassified inputs closer to the decision boundary and correctly classified ones farther from it. This helps the model distinguish correct from wrong predictions. (Right) During inference, we calculate the robust radius of each input and use it as a confidence score for detecting potentially misclassified inputs.
  • Figure 2: Distribution of the robust radius for correctly and wrongly classified examples across network architectures. For WideResNet [zagoruyko2016wrn] and DenseNet [huang2017densely] we use the CIFAR10 validation set; for ViT [dosovitskiy2020vit] we use the ImageNet validation set. For all three architectures, the robust-radius distributions of correctly classified and misclassified inputs clearly differ: misclassified samples (red) generally have smaller robust radii.
  • Figure 3: AURC (lower is better) vs. temperature $T$ on CIFAR10 and CIFAR100 datasets: This figure shows the sensitivity of different methods to the hyperparameter $T$. The orange lines denote our methods and the black lines denote two baselines, ODIN and DOCTOR. Compared with the baselines, our methods are less sensitive to the choice of $T$.
  • Figure 4: AUROC (top, higher is better) and FPR95 (bottom, lower is better) for each corruption type in the CIFAR10-C dataset. Our method outperforms the baselines across all corruption types, showing that it remains effective under corrupted inputs.
  • Figure B.1: AUROC of different methods in the high-confidence region: this figure shows how the performance of each method changes when we focus on high-confidence predictions. Orange lines denote our methods and black lines denote the baselines. (1) All methods degrade to varying extents on high-confidence samples; (2) our method degrades less in the high-confidence region, suggesting a stronger ability to detect overconfident misclassified examples.
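Figure 1 describes the goal of Radius Aware Training: shrink the radius of misclassified inputs and enlarge it for correct ones. The sketch below is a toy, margin-based surrogate of that goal for a linear classifier, where the L2 robust radius has a closed form. It is not the paper's RAT loss (which applies adversarial objectives); the names `linear_robust_radius`, `radius_aware_loss`, and the `target` hinge are hypothetical.

```python
import numpy as np

def linear_robust_radius(W, b, x):
    """Exact L2 distance from x to the nearest decision boundary of the
    predicted class for logits z = W @ x + b (rows of W must be distinct)."""
    z = W @ x + b
    p = int(np.argmax(z))
    dists = [
        (z[p] - z[j]) / np.linalg.norm(W[p] - W[j])
        for j in range(len(z)) if j != p
    ]
    return p, float(min(dists))

def radius_aware_loss(W, b, X, y, target=1.0):
    """Hypothetical surrogate: push correct examples beyond `target`
    radius and pull misclassified examples toward the boundary."""
    total = 0.0
    for x, label in zip(X, y):
        pred, r = linear_robust_radius(W, b, x)
        if pred == label:
            total += max(0.0, target - r)  # correct: penalize a small radius
        else:
            total += r  # misclassified: penalize a large radius
    return total / len(X)
```

Minimizing such a loss separates the two groups in radius, which is what makes the radius usable as a confidence score at inference time.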