Conformal Risk Minimization with Variance Reduction

Sima Noorani, Orlando Romero, Nicolo Dal Fabbro, Hamed Hassani, George J. Pappas

TL;DR

This paper addresses conformal risk minimization (CRM): training models to produce efficient conformal prediction (CP) sets with guaranteed coverage. It analyzes ConfTr, shows that its gradient estimator suffers from non-vanishing variance caused by the way it estimates the gradient of the population quantile, and proposes VR-ConfTr, which decouples the estimation of τ(θ) from the gradient computation and uses a plug-in, variance-reduced estimator based on an ε-thresholded average of conformity-score gradients. The authors characterize the estimator's bias-variance trade-off, provide practical guidelines for tuning it (e.g., m-ranking), and prove its sample efficiency. Empirically, VR-ConfTr converges faster and yields consistently smaller CP sets across multiple benchmarks (MNIST, Fashion-MNIST, KMNIST, OrganAMNIST, CIFAR-10) with comparable accuracy, highlighting its potential to improve CP-based uncertainty quantification in real-world tasks.
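To make the plug-in estimator concrete, here is a minimal NumPy sketch, assuming the batch conformity scores $E_i(\theta)$ and their per-sample gradients are already available as arrays and that $\hat{\tau}$ has been estimated separately (the decoupling described above). The function name `quantile_grad_estimate`, the window half-width `eps`, and the empty-window fallback are illustrative assumptions, not the authors' implementation. Rather than using the gradient of the single order statistic that attains the quantile, it averages the gradients of all scores within ε of $\hat{\tau}$: a wider window averages more samples (lower variance) at the cost of more bias.

```python
import numpy as np

def quantile_grad_estimate(scores, score_grads, tau_hat, eps):
    """Hypothetical sketch of an epsilon-thresholded plug-in estimate of grad tau(theta).

    scores:      shape (n,), conformity scores E_i(theta) on a batch
    score_grads: shape (n, d), per-sample gradients dE_i/dtheta
    tau_hat:     quantile estimate computed separately from the gradient step
    eps:         half-width of the window around tau_hat
    """
    scores = np.asarray(scores, dtype=float)
    score_grads = np.asarray(score_grads, dtype=float)
    # Keep only the samples whose score lies within eps of the quantile estimate.
    mask = np.abs(scores - tau_hat) <= eps
    if not mask.any():
        # Assumption: if the window is empty, fall back to the single closest score.
        mask = np.zeros(scores.shape, dtype=bool)
        mask[np.argmin(np.abs(scores - tau_hat))] = True
    # Average the selected per-sample gradients.
    return score_grads[mask].mean(axis=0)
```

For example, with scores `[0.1, 0.4, 0.45, 0.9]`, $\hat{\tau} = 0.42$ and ε = 0.05, only the second and third samples fall inside the window, so the estimate is the mean of their two gradients.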

Abstract

Conformal prediction (CP) is a distribution-free framework for achieving probabilistic guarantees on black-box models. CP is generally applied to a model post-training. Recent research efforts, on the other hand, have focused on optimizing CP efficiency during training. We formalize this concept as the problem of conformal risk minimization (CRM). In this direction, conformal training (ConfTr) by Stutz et al. (2022) is a technique that seeks to minimize the expected prediction set size of a model by simulating CP in between training updates. Despite its potential, we identify a major source of sample inefficiency in ConfTr that leads to overly noisy gradient estimates, introducing training instability and limiting its practical use. To address this challenge, we propose variance-reduced conformal training (VR-ConfTr), a CRM method that incorporates a variance reduction technique into the gradient estimation of the ConfTr objective function. Through extensive experiments on various benchmark datasets, we demonstrate that VR-ConfTr consistently achieves faster convergence and smaller prediction sets than the baselines.
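The idea of simulating CP in between training updates can be sketched as follows: split a mini-batch into a calibration half and a prediction half, calibrate a threshold on the calibration half, and penalize a smoothed set size on the prediction half. This NumPy sketch covers only the forward computation for a threshold (THR) predictor, whose set is $\{k : E_k(x) \ge \tau\}$. The helper name `conftr_size_loss`, the sigmoid `temperature`, and the target size `kappa` are illustrative assumptions, and for simplicity it uses the hard empirical quantile $\hat{\tau}_n = E_{(\lceil\alpha n\rceil)}$ rather than any smoothed quantile. During training, the same computation would run inside an autodiff framework so that gradients flow through both τ and the scores.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def conftr_size_loss(cal_scores, cal_labels, pred_scores, alpha=0.1, temperature=0.1, kappa=1.0):
    """Hypothetical forward pass of a ConfTr-style smoothed size loss.

    cal_scores, pred_scores: (n, K) conformity scores, e.g. softmax probabilities
    cal_labels:              (n,) true labels of the calibration half
    Returns (loss, tau).
    """
    n = cal_scores.shape[0]
    # Conformity score of the true class for each calibration sample.
    true_scores = cal_scores[np.arange(n), cal_labels]
    # Hard empirical quantile: the ceil(alpha * n)-th smallest score (1-indexed).
    k = int(np.ceil(alpha * n))
    tau = np.sort(true_scores)[k - 1]
    # Smooth membership: sigmoid((E_k - tau) / T) approximates the indicator 1[E_k >= tau].
    soft_membership = sigmoid((pred_scores - tau) / temperature)
    soft_size = soft_membership.sum(axis=1)
    # Penalize only the part of each smoothed set size that exceeds kappa.
    return np.maximum(0.0, soft_size - kappa).mean(), tau
```

Because $\hat{\tau}_n$ is a single order statistic, the gradient that reaches θ through τ comes from one calibration score, which is roughly the quantity that VR-ConfTr replaces with its ε-thresholded average.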

Paper Structure

This paper contains 36 sections, 9 theorems, 90 equations, 19 figures, 5 tables, 1 algorithm.

Key Result

Proposition 3.1

$E_{(1)}(\theta), \ldots, E_{(n)}(\theta)$ are almost surely (a.s.) everywhere differentiable in $\theta$. In particular, the empirical quantile $\hat{\tau}_n(\theta) = E_{(\lceil\alpha n\rceil)}(\theta)$ is a.s. everywhere differentiable.
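A toy example makes Proposition 3.1 concrete. Assume, purely for illustration, linear scores $E_i(\theta) = a_i\theta + b_i$. Away from ties, which for generic $a_i, b_i$ occur only at isolated values of θ, the order statistic $E_{(k)}(\theta)$ coincides locally with a single score $E_{i^*}(\theta)$, so its derivative is $a_{i^*}$. The hypothetical helper below returns both $\hat{\tau}_n(\theta)$ and that derivative.

```python
import numpy as np

def empirical_quantile_and_grad(a, b, theta, alpha):
    """Hypothetical helper: tau_hat(theta) = E_(ceil(alpha n))(theta) for toy scores
    E_i(theta) = a_i * theta + b_i, together with its derivative in theta."""
    scores = a * theta + b
    n = scores.shape[0]
    k = int(np.ceil(alpha * n))
    # Index of the sample attaining the k-th smallest score (1-indexed k).
    i_star = np.argsort(scores)[k - 1]
    # Away from ties, tau_hat equals E_{i*} in a neighborhood of theta,
    # so its derivative is simply a_{i*}.
    return scores[i_star], a[i_star]
```

At a tie-free θ, a central finite difference of $\hat{\tau}_n$ agrees with the returned derivative. Note that only one sample's gradient enters $\nabla_\theta \hat{\tau}_n$.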

Figures (19)

  • Figure 1: In this figure, we illustrate the VR-ConfTr pipeline and position it with respect to a typical CP procedure.
  • Figure 2: Learning curves for MNIST, Fashion-MNIST, Kuzushiji-MNIST, and OrganAMNIST. Each row shows training loss (left) and test CP set sizes (right) for the corresponding dataset, evaluated using the THR conformal predictor.
  • Figure 3: Learning curves for CIFAR-10 illustrating the fine-tuning process of a linear layer on a pretrained ResNet20 model using ConfTr and VR-ConfTr. Test CP set sizes are evaluated using the THR conformal predictor, consistent with the other datasets.
  • Figure 4: Sample batch from the GMM distribution (left); bias and variance for the quantile gradient estimates, comparing ConfTr and VR-ConfTr on the GMM dataset (right).
  • Figure 5: Learning curves for MNIST, Fashion-MNIST, Kuzushiji-MNIST, and OrganAMNIST. For each dataset, we show the test loss in the top row and the test accuracy in the bottom row at the end of each epoch.
  • ...and 14 more figures

Theorems & Definitions (13)

  • Proposition 3.1
  • Proposition 3.2
  • Proposition 3.3
  • Theorem 3.4
  • Theorem 4.1
  • Proposition A.1
  • Proof
  • Lemma A.2
  • Proof
  • Proposition A.3
  • ...and 3 more