Conformal Loss-Controlling Prediction

Di Wang, Ping Wang, Zhong Ji, Xiaojun Yang, Hongyue Li

TL;DR

This work addresses the need to control the value of a general loss L on each test object, not just the coverage of prediction sets. It introduces conformal loss-controlling prediction (CLCP), which selects a nesting parameter λ* so that P( L(Y_{n+1}, C_{λ*}(X_{n+1})) ≤ α ) ≥ 1 − δ under exchangeability. CLCP extends inductive conformal prediction (CP) from the miscoverage loss to general losses and, unlike conformal risk control, bounds the loss itself with high probability rather than its expected value. The guarantee is proved for finite samples, and CP is recovered as a special case of CLCP. Experiments on classification with a class-varying loss and on postprocessed weather forecasting confirm the guarantee empirically and show that the underlying model affects the informational efficiency of the prediction sets. Because the framework relies only on calibration data and nested prediction sets, it applies across domains such as medical imaging and numerical weather prediction, and better-designed underlying algorithms are the main avenue for improving informational efficiency.

Abstract

Conformal prediction is a learning framework that controls the coverage of prediction sets and can be built on any learning algorithm for point prediction. This work proposes a learning framework named conformal loss-controlling prediction, which extends conformal prediction to situations where the value of a loss function needs to be controlled. Unlike existing work on risk-controlling prediction sets and conformal risk control, which aims to control the expected value of a loss function, the proposed approach focuses on the loss for any test object, extending conformal prediction from the miscoverage loss to a general loss. The controlling guarantee is proved for finite samples under the assumption that the data are exchangeable, and the framework is tested empirically on classification with a class-varying loss and on statistical postprocessing of numerical weather forecasts, which are formulated as point-wise classification and point-wise regression problems. The theoretical analysis and experimental results confirm the effectiveness of the loss-controlling approach.
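
As a worked special case, the coverage guarantee of conformal prediction can be read off the loss-controlling statement by plugging in the miscoverage loss. The notation follows the TL;DR; the choice $\alpha = 0$ is an illustrative assumption (any $\alpha \in [0, 1)$ describes the same event, because the loss takes only the values 0 and 1).

$$
L(y, C) = \mathbb{1}\{y \notin C\}, \quad \alpha = 0
\;\Longrightarrow\;
P\big(L(Y_{n+1}, C_{\lambda^*}(X_{n+1})) \le 0\big)
= P\big(Y_{n+1} \in C_{\lambda^*}(X_{n+1})\big) \ge 1 - \delta .
$$

In this reading, $\delta$ plays the role of the miscoverage level in ordinary conformal prediction.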

Paper Structure (12 sections, 1 theorem, 40 equations, 8 figures, 1 table, 1 algorithm)

This paper contains 12 sections, 1 theorem, 40 equations, 8 figures, 1 table, 1 algorithm.

Key Result

Theorem 1

Suppose $\{(X_i, Y_i)\}_{i = 1}^{n+1}$ are $n+1$ data drawn exchangeably from $P_{XY}$ on $\mathcal{X} \times \mathcal{Y}$, $C_{\lambda}: \mathcal{X} \rightarrow \mathcal{Y}'$ is a set-valued function satisfying formula (1) with the parameter $\lambda$ taking values from a discrete set $\Lambda \subset \dots$ Then for any $\delta \in (\frac{1}{n+1},1)$, we have

$$P\big(L(Y_{n+1}, C_{\lambda^*}(X_{n+1})) \le \alpha\big) \ge 1 - \delta,$$

where $\lambda^*$ is defined as in formula (5).
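
Formula (5) is not reproduced on this page, so the following Python sketch shows only one plausible way to pick $\lambda^*$ from calibration data, not necessarily the authors' exact rule. It assumes that the loss is non-increasing in $\lambda$ (larger $\lambda$ gives a larger nested set) and selects the smallest grid value whose $\lceil (n+1)(1-\delta) \rceil$-th smallest calibration loss is at most $\alpha$. The function name `calibrate_lambda`, the interval-style toy loss, and the grid are all hypothetical.

```python
import math
import numpy as np

def calibrate_lambda(cal_losses, lambdas, alpha, delta):
    """Return the smallest lambda whose conformal loss quantile is <= alpha.

    cal_losses: shape (n, m); cal_losses[i, j] is the loss of calibration
        example i at lambdas[j]. Assumed non-increasing along j.
    lambdas: increasing 1-D array of m candidate values (the discrete set).
    """
    cal_losses = np.asarray(cal_losses, dtype=float)
    n = cal_losses.shape[0]
    if not (1.0 / (n + 1) < delta < 1.0):
        raise ValueError("delta must lie in (1/(n+1), 1)")
    # Rank of the finite-sample-corrected (1 - delta) quantile; the check
    # above guarantees k <= n, min() only guards against rounding error.
    k = min(n, math.ceil((n + 1) * (1 - delta)))
    # k-th smallest calibration loss, computed separately for each lambda.
    kth_smallest = np.sort(cal_losses, axis=0)[k - 1]
    feasible = np.flatnonzero(kth_smallest <= alpha)
    if feasible.size == 0:
        raise ValueError("no lambda in the grid reaches the target alpha")
    return lambdas[feasible[0]]

# Toy usage (assumed setup): sets are intervals of half-width lambda around a
# point prediction, and the loss is how far the label falls outside them.
residuals = np.arange(1.0, 20.0)               # |y_i - f(x_i)| for n = 19 points
lambdas = np.arange(0.0, 21.0)                 # candidate half-widths 0, 1, ..., 20
cal_losses = np.maximum(0.0, residuals[:, None] - lambdas[None, :])
lam_star = calibrate_lambda(cal_losses, lambdas, alpha=0.5, delta=0.25)
print(lam_star)
```

Under the monotonicity assumption, this rule amounts to taking a conformal quantile of the per-example thresholds at which each loss first drops to $\alpha$, which is why the requirement $\delta > \frac{1}{n+1}$ appears: it keeps the required rank within the $n$ calibration points.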

Figures (8)

  • Figure 1: Bar plots of the frequencies of the prediction losses being greater than $\alpha$ vs. $\delta = 0.05, 0.1, 0.15, 0.2$ on test data for classification with a class-varying loss. The first row corresponds to $\alpha = 0.1$ and the second row corresponds to $\alpha = 0.2$. Different columns represent different classifiers. All bars are near or below the preset $\delta$, which confirms the controlling guarantee of CLCP empirically.
  • Figure 2: Bar plots of the average sizes of prediction sets vs. $\delta = 0.05, 0.1, 0.15, 0.2$ on test data for classification with a class-varying loss. The first row corresponds to $\alpha = 0.1$ and the second row corresponds to $\alpha = 0.2$. Different columns represent different classifiers. The plots show how informative the prediction sets are: in general, a larger $\delta$ leads to a smaller average set size, and different classifiers differ in informational efficiency.
  • Figure 3: Bar plots of the frequencies of the prediction losses being greater than $\alpha$ vs. $\delta = 0.05, 0.1, 0.15, 0.2$ on test data for high-impact weather forecasting. The first row corresponds to HighTemp and the second row corresponds to LowTemp. Different columns represent different $\alpha$. All bars are near or below the preset $\delta$, which confirms the controlling guarantee of CLCP empirically.
  • Figure 4: Boxen plots of the prediction losses vs. $\delta = 0.05, 0.1, 0.15, 0.2$ on test data for high-impact weather forecasting. The first row corresponds to HighTemp and the second row corresponds to LowTemp. Different columns represent different $\alpha$. The loss distributions are properly controlled by $\alpha$ and $\delta$, which yields the empirical validity shown in Fig. 3.
  • Figure 5: Boxen plots for the distributions of normalized sizes of prediction sets vs. $\delta = 0.05, 0.1, 0.15, 0.2$ on test data for high-impact weather forecasting. The first row corresponds to HighTemp and the second row corresponds to LowTemp. Different columns represent different $\alpha$. U-Net performs better than nDNN, which indicates the importance of careful design of the underlying algorithm.
  • ...and 3 more figures
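
The validity check behind the frequency plots (the share of test losses above $\alpha$, compared with $\delta$) can be imitated on synthetic data. The sketch below is an illustration under stated assumptions, not the paper's experiment: the half-normal residuals, the interval-style loss, the grid, and the helper names are all hypothetical, and the calibration rule is the same assumed quantile rule rather than a confirmed copy of formula (5).

```python
import math
import numpy as np

rng = np.random.default_rng(0)
alpha = 0.1
lambdas = np.linspace(0.0, 5.0, 501)           # increasing grid of half-widths

def interval_loss(residuals, lam):
    # Amount by which |y - f(x)| exceeds the interval half-width lam.
    return np.maximum(0.0, residuals - lam)

def smallest_feasible_lambda(cal_residuals, delta):
    n = cal_residuals.size
    k = min(n, math.ceil((n + 1) * (1 - delta)))
    for lam in lambdas:                        # first hit is the smallest lambda
        kth_loss = np.sort(interval_loss(cal_residuals, lam))[k - 1]
        if kth_loss <= alpha:
            return lam
    raise ValueError("no lambda in the grid reaches the target alpha")

# Synthetic absolute residuals standing in for calibration and test data.
cal = np.abs(rng.normal(size=2000))
test = np.abs(rng.normal(size=20000))

violation_rate = {}
for delta in (0.05, 0.1, 0.15, 0.2):
    lam = smallest_feasible_lambda(cal, delta)
    # Frequency of test losses greater than alpha, to be compared with delta.
    violation_rate[delta] = float(np.mean(interval_loss(test, lam) > alpha))

for delta, rate in violation_rate.items():
    print(f"delta={delta:.2f}  frequency of loss > alpha: {rate:.3f}")
```

Because the guarantee holds on average over calibration and test draws, a single run can land slightly above $\delta$, which matches the captions' wording that the bars are "near or below" the preset level.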

Theorems & Definitions (3)

  • Definition 1
  • Theorem 1
  • proof