Onboard Out-of-Calibration Detection of Deep Learning Models using Conformal Prediction

Protim Bhattacharjee; Peter Jung

TL;DR

This work addresses trusted deployment of deep learning (DL) models in remote sensing by using conformal prediction (CP) to obtain prediction sets with the coverage guarantee $P(Y_{test}\in \mathcal{C}(X_{test})) \ge 1 - \epsilon$ under data exchangeability. The authors relate the uncertainty expressed by CP to the model's own uncertainty, measured by normalized softmax entropy, and propose the average CP set size as a practical onboard detector of out-of-calibration conditions under sensor noise. In experiments on EuroSAT with ResNet50, InceptionV3, DenseNet161, and MobileNetV2 under additive white Gaussian noise (AWGN), shot noise, and impulse noise, they show that uncertain models produce larger CP sets as noise grows, whereas overconfident models do not provide reliable detection. The work suggests deploying uncertain yet well-calibrated networks for onboard health monitoring and discusses extending CP with other uncertainty estimators and noise/intrinsic factors, with code to be released as a MAPIE-based implementation.
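As a rough illustration of the uncertainty measure named above, the following sketch computes a normalized softmax entropy per sample. The normalization by $\log K$ (with $K$ the number of classes) is an assumption made here so that the value lies in $[0, 1]$; the summary does not state the exact definition the authors use, and the function name is hypothetical.

```python
import numpy as np


def normalized_softmax_entropy(logits: np.ndarray) -> np.ndarray:
    """Entropy of softmax(logits) per row, divided by log(K).

    Assumption: normalizing by log(K) maps the entropy to [0, 1],
    where 1 means a uniform (maximally uncertain) softmax output.
    """
    logits = np.asarray(logits, dtype=float)
    # Subtract the row maximum for numerical stability before exponentiating.
    shifted = logits - logits.max(axis=1, keepdims=True)
    probs = np.exp(shifted)
    probs /= probs.sum(axis=1, keepdims=True)
    # Clip inside the log so that terms with probability 0 contribute 0.
    entropy = -(probs * np.log(np.clip(probs, 1e-12, None))).sum(axis=1)
    return entropy / np.log(logits.shape[1])


if __name__ == "__main__":
    # First row: uniform logits (high uncertainty).
    # Second row: one dominant logit (an overconfident prediction).
    example_logits = np.array([[0.0, 0.0, 0.0, 0.0],
                               [20.0, 0.0, 0.0, 0.0]])
    print(normalized_softmax_entropy(example_logits))
```

Averaging this quantity over a batch gives one scalar per noise level, which is the kind of curve the entropy figure in this summary describes.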

Abstract

The black-box nature of deep learning models complicates their usage in critical applications such as remote sensing. Conformal prediction is a method to ensure trust in such scenarios. Subject to data exchangeability, conformal prediction provides finite-sample coverage guarantees in the form of a prediction set that is guaranteed to contain the true class within a user-defined error rate. In this letter we show that conformal prediction algorithms are related to the uncertainty of the deep learning model and that this relation can be used to detect whether the deep learning model is out-of-calibration. Popular classification models such as ResNet50, DenseNet161, InceptionV3, and MobileNetV2 are applied to remote sensing datasets such as EuroSAT to demonstrate how, under noisy scenarios, the model outputs become untrustworthy. Furthermore, an out-of-calibration detection procedure relating the model uncertainty and the average size of the conformal prediction set is presented.
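The prediction-set construction and the average set size mentioned in the abstract can be sketched as follows. This is a minimal, non-randomized split-conformal variant of adaptive prediction sets (APS) written in plain NumPy; it is not the authors' MAPIE-based code, and all function names are hypothetical. It assumes softmax outputs and integer labels for a held-out calibration set.

```python
import math

import numpy as np


def aps_scores(probs, labels):
    """Softmax mass accumulated, in descending order, up to and including the true class."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels)
    order = np.argsort(-probs, axis=1)
    cumsum = np.cumsum(np.take_along_axis(probs, order, axis=1), axis=1)
    # Position of the true label within each row's descending ranking.
    rank = np.argmax(order == labels[:, None], axis=1)
    return cumsum[np.arange(len(labels)), rank]


def conformal_quantile(scores, epsilon):
    """The ceil((n + 1) * (1 - epsilon))-th smallest calibration score."""
    scores = np.sort(np.asarray(scores, dtype=float))
    n = len(scores)
    k = math.ceil((n + 1) * (1 - epsilon))
    # With too few calibration samples the set must contain every class.
    return float(scores[k - 1]) if k <= n else float("inf")


def aps_prediction_sets(probs, qhat):
    """Boolean mask of shape (n_samples, n_classes) marking the classes in each set."""
    probs = np.asarray(probs, dtype=float)
    order = np.argsort(-probs, axis=1)
    sorted_probs = np.take_along_axis(probs, order, axis=1)
    # Keep a class while the mass ranked ahead of it is below qhat,
    # so the class that crosses the threshold is included.
    mass_before = np.cumsum(sorted_probs, axis=1) - sorted_probs
    sets = np.zeros(probs.shape, dtype=bool)
    np.put_along_axis(sets, order, mass_before < qhat, axis=1)
    return sets


def average_set_size(sets):
    """Mean number of classes per prediction set."""
    return float(np.asarray(sets).sum(axis=1).mean())
```

In this sketch, `conformal_quantile(aps_scores(cal_probs, cal_labels), epsilon)` is computed once on clean calibration data, and `average_set_size(aps_prediction_sets(new_probs, qhat))` is then tracked on incoming batches. Comparing that value with the one observed on clean data is the detection idea the abstract describes; any concrete alarm threshold would be an additional assumption.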

Paper Structure (5 sections, 6 equations, 3 figures, 1 table)


Introduction
Conformal prediction
Conformal prediction and uncertainty
Results
Conclusion and future work

Figures (3)

Figure 1: Example of conformal prediction using the APS method with $\epsilon = 0.1$ and ResNet50 on EuroSAT. For each image, the prediction set is shown along with the softmax output probabilities. The sum of the softmax values in each prediction set exceeds $1 - \epsilon$. The true class is present in all the prediction sets. For the middle and right panels the point prediction would be incorrect; conformal prediction instead generates a set that contains the true class with probability at least $1 - \epsilon$.
Figure 2: Average normalized softmax entropy for different noise types and models. (a) AWGN, (b) shot noise, and (c) impulse noise. The y-axis of each panel is the normalized softmax entropy and the x-axis is the noise severity as defined in ImageNet-C.
Figure 3: Histograms of the largest softmax value for different models and noise types: (a) AWGN, (b) shot noise, and (c) impulse noise. In each subfigure, Top Panel: ResNet50, Middle Panel: InceptionV3, and Bottom Panel: MobileNetV2. Left Column: clean validation data, Middle Column: severity = 1, and Right Column: severity = 2. The red dashed line is $Q_{1-0.05}$, dash-dot is $Q_{1-0.5}$, and dotted is $Q_{1-0.2}$ for APS. The black dotted line represents the average value of the largest softmax.
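The three noise types named in the captions above can be simulated roughly as follows. This is a generic sketch for images scaled to $[0, 1]$; the parameters `sigma`, `photons`, and `fraction` are hypothetical stand-ins and are not the severity levels used in the experiments.

```python
import numpy as np


def add_awgn(image, sigma, rng):
    """Additive white Gaussian noise with standard deviation sigma."""
    noisy = image + rng.normal(0.0, sigma, size=image.shape)
    return np.clip(noisy, 0.0, 1.0)


def add_shot_noise(image, photons, rng):
    """Poisson (shot) noise; a smaller photon count gives stronger noise."""
    noisy = rng.poisson(image * photons) / photons
    return np.clip(noisy, 0.0, 1.0)


def add_impulse_noise(image, fraction, rng):
    """Salt-and-pepper noise on roughly `fraction` of the pixel values."""
    noisy = image.copy()
    u = rng.random(image.shape)
    noisy[u < fraction / 2] = 0.0        # pepper
    noisy[u > 1.0 - fraction / 2] = 1.0  # salt
    return noisy


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical 64x64 RGB patch with values in [0, 1].
    patch = rng.random((64, 64, 3))
    for name, noisy in [
        ("awgn", add_awgn(patch, sigma=0.1, rng=rng)),
        ("shot", add_shot_noise(patch, photons=30.0, rng=rng)),
        ("impulse", add_impulse_noise(patch, fraction=0.05, rng=rng)),
    ]:
        print(name, float(np.abs(noisy - patch).mean()))
```

Feeding such corrupted inputs through a classifier and recording the largest softmax value, or the softmax entropy, per sample yields the kind of statistics summarized in Figures 2 and 3.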
