
The Penalized Inverse Probability Measure for Conformal Classification

Paul Melki, Lionel Bombrun, Boubacar Diallo, Jérôme Dias, Jean-Pierre da Costa

TL;DR

The current work introduces the Penalized Inverse Probability (PIP) nonconformity score and its regularized version, RePIP, which allow the joint optimization of both efficiency and informativeness.

Abstract

The deployment of safe and trustworthy machine learning systems, and particularly complex black box neural networks, in real-world applications requires reliable and certified guarantees on their performance. The conformal prediction framework offers such formal guarantees by transforming any point predictor into a set predictor with valid, finite-sample guarantees on the coverage of the true label at a chosen level of confidence. Central to this methodology is the notion of the nonconformity score function, which assigns to each example a measure of "strangeness" in comparison with the previously seen observations. While the coverage guarantees are maintained regardless of the nonconformity measure, the point predictor and the dataset, previous research has shown that the performance of a conformal model, as measured by its efficiency (the average size of the predicted sets) and its informativeness (the proportion of prediction sets that are singletons), is influenced by the choice of the nonconformity score function. The current work introduces the Penalized Inverse Probability (PIP) nonconformity score and its regularized version, RePIP, which allow the joint optimization of both efficiency and informativeness. Through toy examples and empirical results on the task of crop and weed image classification in agricultural robotics, the current work shows how PIP-based conformal classifiers exhibit precisely the desired behavior in comparison with other nonconformity measures and strike a good balance between informativeness and efficiency.
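
The abstract describes the split conformal pipeline only in words. The following is a minimal sketch under stated assumptions: a held-out calibration set, softmax probabilities from an arbitrary point predictor, and the plain inverse-probability score $1 - \hat{p}_y(x)$ as a stand-in. It is not the PIP or RePIP score, whose penalty term is defined in the body of the paper, and the function names and toy probabilities are hypothetical. It shows how a calibrated threshold turns probabilities into prediction sets, and how efficiency (mean set size) and informativeness (share of singletons) are then measured.

```python
import math
from typing import List, Sequence, Set


def inverse_probability_score(probs: Sequence[float], label: int) -> float:
    # Baseline nonconformity score s(x, y) = 1 - p_hat_y(x).
    # Stand-in only: this is NOT the paper's PIP/RePIP score.
    return 1.0 - probs[label]


def calibrate(cal_probs: List[Sequence[float]], cal_labels: List[int], alpha: float) -> float:
    # Score every calibration example at its true label.
    scores = sorted(inverse_probability_score(p, y) for p, y in zip(cal_probs, cal_labels))
    n = len(scores)
    # Finite-sample corrected quantile: the ceil((n + 1)(1 - alpha))-th smallest score.
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf  # too few calibration points: every label is kept
    return scores[k - 1]


def predict_set(probs: Sequence[float], threshold: float) -> Set[int]:
    # Keep every candidate label whose score does not exceed the threshold.
    return {c for c in range(len(probs)) if inverse_probability_score(probs, c) <= threshold}


def efficiency(sets: List[Set[int]]) -> float:
    # Average prediction-set size (lower is better).
    return sum(len(s) for s in sets) / len(sets)


def informativeness(sets: List[Set[int]]) -> float:
    # Proportion of prediction sets that are singletons (higher is better).
    return sum(1 for s in sets if len(s) == 1) / len(sets)


def coverage(sets: List[Set[int]], labels: List[int]) -> float:
    # Fraction of examples whose true label lies in the predicted set.
    return sum(1 for s, y in zip(sets, labels) if y in s) / len(sets)


# Hypothetical 3-class toy data (made-up probabilities, for illustration only).
cal_probs = [
    [0.95, 0.03, 0.02], [0.05, 0.9, 0.05], [0.1, 0.05, 0.85], [0.8, 0.1, 0.1],
    [0.15, 0.75, 0.1], [0.2, 0.1, 0.7], [0.6, 0.3, 0.1], [0.4, 0.5, 0.1],
    [0.3, 0.4, 0.3], [0.2, 0.7, 0.1],
]
cal_labels = [0, 1, 2, 0, 1, 2, 0, 1, 2, 0]
test_probs = [[0.9, 0.06, 0.04], [0.5, 0.4, 0.1], [0.1, 0.2, 0.7], [0.34, 0.33, 0.33]]
test_labels = [0, 1, 2, 0]

threshold = calibrate(cal_probs, cal_labels, alpha=0.2)
sets = [predict_set(p, threshold) for p in test_probs]
print(sets)  # [{0}, {0, 1}, {2}, {0, 1, 2}]
print(efficiency(sets), informativeness(sets), coverage(sets, test_labels))  # 1.75 0.5 1.0
```

With $\alpha = 0.2$ and ten calibration points, the threshold is the 9th smallest calibration score (0.7), so every label with $\hat{p} \geq 0.3$ enters the set. The last test example shows the trade-off the paper targets: the set is valid but uninformative, since it contains all three classes.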

Paper Structure (11 sections, 7 equations, 4 figures, 1 table)

Figures (4)

  • Figure 1: Six different potential configurations of model outputs sorted in decreasing order of $\hat{p}$. Only the classes until reaching the class of interest $y$ are shown. Computed nonconformity scores for each case can be seen in Table 1.
  • Figure 2: Randomly chosen example images from 6 different classes. Common buckwheat and rye brome are weeds, while corn, pea and sunflower are cultivated species.
  • Figure 3: Efficiency and Informativeness for different values of the regularization hyperparameters. For each value of $\gamma$ and $\lambda$, 100 different splits of the calibration and test sets are considered for more reliable results.
  • Figure 4: Violin plots of experimental results on 1000 random splits of the WE3DS classification dataset (each point is a random split): (a) Empirical Coverage; (b) Efficiency (Mean Set Size); (c) Informativeness (Proportion of Predicted Singletons).
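
Figures 3 and 4 summarize each configuration over many random calibration/test splits (100 per hyperparameter value, 1000 for the WE3DS comparison). A minimal sketch of such an evaluation loop follows, assuming the classifier's probability outputs on a labelled pool are already available. Everything specific here is an assumption: the pool is synthetic (Dirichlet-distributed probability vectors with labels sampled from them), the score is the inverse-probability baseline rather than PIP or RePIP, and the half/half split, function names and seeds are hypothetical. Each split yields one (coverage, efficiency, informativeness) triple, which is the kind of per-split point a violin plot would display.

```python
import math
import random
from statistics import mean
from typing import List, Sequence, Tuple

Example = Tuple[List[float], int]  # (class probabilities, true label)


def score(probs: Sequence[float], label: int) -> float:
    # Stand-in nonconformity score 1 - p_hat_y (NOT the paper's PIP/RePIP score).
    return 1.0 - probs[label]


def one_split(pool: List[Example], n_cal: int, alpha: float,
              rng: random.Random) -> Tuple[float, float, float]:
    data = pool[:]  # copy so the caller's pool is not reordered
    rng.shuffle(data)
    cal, test = data[:n_cal], data[n_cal:]
    # Calibrate: finite-sample corrected (1 - alpha) quantile of calibration scores.
    cal_scores = sorted(score(p, y) for p, y in cal)
    k = math.ceil((n_cal + 1) * (1 - alpha))
    q = cal_scores[k - 1] if k <= n_cal else math.inf
    # Predict a set for each test example and compute the three metrics.
    sets = [[c for c in range(len(p)) if score(p, c) <= q] for p, _ in test]
    cov = mean(1.0 if y in s else 0.0 for s, (_, y) in zip(sets, test))
    eff = mean(len(s) for s in sets)
    info = mean(1.0 if len(s) == 1 else 0.0 for s in sets)
    return cov, eff, info


def repeated_splits(pool: List[Example], n_cal: int, alpha: float,
                    n_splits: int, seed: int = 0) -> List[Tuple[float, float, float]]:
    rng = random.Random(seed)
    return [one_split(pool, n_cal, alpha, rng) for _ in range(n_splits)]


def synthetic_pool(n: int, n_classes: int, rng: random.Random) -> List[Example]:
    # Hypothetical stand-in for classifier outputs: Dirichlet(1, ..., 1) probability
    # vectors (normalized Gamma draws), with labels drawn from those probabilities.
    pool = []
    for _ in range(n):
        g = [rng.gammavariate(1.0, 1.0) for _ in range(n_classes)]
        total = sum(g)
        probs = [x / total for x in g]
        label = rng.choices(range(n_classes), weights=probs)[0]
        pool.append((probs, label))
    return pool


pool = synthetic_pool(400, 4, random.Random(42))
results = repeated_splits(pool, n_cal=200, alpha=0.1, n_splits=200, seed=1)
covs, effs, infs = zip(*results)
print(f"coverage {mean(covs):.3f}  efficiency {mean(effs):.3f}  informativeness {mean(infs):.3f}")
```

Because a random split keeps calibration and test points exchangeable, the mean empirical coverage across splits should sit at or slightly above $1 - \alpha$, while efficiency and informativeness vary with the score function; this is why the figures report their spread over splits rather than a single run.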