RAT: Boosting Misclassification Detection Ability without Extra Data
Ge Yan, Tsui-Wei Weng
TL;DR
This work tackles misclassification detection (MisD) for image classifiers by adopting the robust radius, i.e., the input-space margin to the decision boundary, as a confidence score. It introduces two efficient radius estimators, RR-BS and RR-Fast, for fast and accurate MisD, and a data-efficient Radius Aware Training (RAT) method that improves detection without extra data by applying different adversarial objectives to correctly classified and misclassified examples. Empirical results show substantial improvements over baselines, including up to a 29.3% reduction in AURC and a 21.62% reduction in FPR@95TPR, as well as robustness to input corruptions (e.g., CIFAR10-C). Combining RR-based confidence with RAT yields a practical MisD framework with favorable speed-accuracy trade-offs across datasets and architectures.
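For readers unfamiliar with the two metrics quoted above, the following is a minimal sketch of how AURC and FPR@95TPR are commonly computed from per-sample confidence scores. It is not taken from the paper's code: the function names `aurc` and `fpr_at_95_tpr` are hypothetical, "positive" is assumed to mean a correctly classified input, and both correct and incorrect predictions are assumed to be present.

```python
import numpy as np

def aurc(confidence, correct):
    """Area under the risk-coverage curve (lower is better)."""
    confidence = np.asarray(confidence, dtype=float)
    correct = np.asarray(correct, dtype=bool)
    # Rank samples from most to least confident.
    order = np.argsort(-confidence, kind="stable")
    errors = (~correct[order]).astype(float)
    # Risk at coverage k/n is the error rate among the k most confident samples.
    risk = np.cumsum(errors) / np.arange(1, len(errors) + 1)
    return float(risk.mean())

def fpr_at_95_tpr(confidence, correct, tpr_target=0.95):
    """Fraction of misclassified inputs accepted at the threshold that
    accepts at least 95% of correctly classified inputs (lower is better).
    Assumption: 'positive' means a correctly classified input."""
    confidence = np.asarray(confidence, dtype=float)
    correct = np.asarray(correct, dtype=bool)
    pos = np.sort(confidence[correct])[::-1]   # correct samples, descending
    k = int(np.ceil(tpr_target * len(pos)))    # number of positives to accept
    threshold = pos[k - 1]
    neg = confidence[~correct]                 # misclassified samples
    return float(np.mean(neg >= threshold))

# Toy scores: one misclassified sample ranked above a correct one.
scores = [0.9, 0.8, 0.7, 0.6]
is_correct = [True, True, False, True]
print(aurc(scores, is_correct), fpr_at_95_tpr(scores, is_correct))
```

Any confidence score can be plugged in here, including a robust-radius estimate, since both metrics depend only on how the score ranks correct predictions against incorrect ones.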
Abstract
As deep neural networks (DNNs) become increasingly prevalent, particularly in high-stakes areas such as autonomous driving and healthcare, the ability to detect a model's incorrect predictions and intervene accordingly becomes crucial for safety. In this work, we investigate the detection of misclassified inputs for image classification models through the lens of adversarial perturbation: we propose to use the robust radius (a.k.a. input-space margin) as a confidence metric and design two efficient estimation algorithms, RR-BS and RR-Fast, for misclassification detection. Furthermore, we design a training method called Radius Aware Training (RAT) to boost models' ability to identify mistakes. Extensive experiments show that our method achieves up to a 29.3% reduction in AURC and a 21.62% reduction in FPR@95TPR compared with previous methods.
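To make the notion of robust radius concrete, here is a minimal sketch that estimates the distance from an input to the decision boundary by bisection along a single direction. This is an illustration only, not the paper's RR-BS or RR-Fast algorithm, whose details are not given in this section. The helper `radius_by_bisection`, the toy linear classifier, and the search range `r_max` are all assumptions. For a general model, a search along one direction yields only an upper bound on the true radius; for the linear toy model below, searching along the weight vector recovers the exact margin.

```python
import numpy as np

def radius_by_bisection(predict, x, direction, r_max=10.0, tol=1e-6):
    """Estimate the distance at which the predicted label changes when
    moving from x along `direction` (an upper bound on the robust radius)."""
    d = direction / np.linalg.norm(direction)
    label = predict(x)
    if predict(x + r_max * d) == label:
        return r_max  # no label change found within the search range
    lo, hi = 0.0, r_max  # label is unchanged at lo and changed at hi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if predict(x + mid * d) == label:
            lo = mid
        else:
            hi = mid
    return hi

# Hypothetical toy model: a linear binary classifier sign(w.x + b).
w = np.array([3.0, 4.0])
b = -5.0

def predict(x):
    return int(w @ x + b > 0)

x = np.array([3.0, 4.0])
# Move against w, the shortest path to the boundary for the positive class.
estimate = radius_by_bisection(predict, x, -w)
exact = abs(w @ x + b) / np.linalg.norm(w)  # closed-form margin for a linear model
print(estimate, exact)
```

A larger estimated radius would be read as higher confidence: inputs that sit close to the decision boundary are the ones a small perturbation can flip, and those are the ones the abstract proposes to flag as likely mistakes.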
