Conformal Thresholded Intervals for Efficient Regression

Rui Luo, Zhixin Zhou

TL;DR

The paper addresses uncertainty quantification in regression by constructing small, valid prediction sets with marginal coverage at level $1-\alpha$ without estimating the full conditional distribution. It introduces Conformal Thresholded Intervals (CTI), which form prediction sets by thresholding the interquantile intervals produced by multi-output quantile regression (e.g., quantile regression forests or networks), with the threshold chosen on a held-out calibration set. The approach leverages the inverse relationship between interval length and density to approximate the Neyman-Pearson optimal density-thresholded predictor, yielding efficiency gains. Empirical results on simulated and real datasets show that CTI consistently achieves smaller, valid prediction sets than state-of-the-art conformal methods, while remaining computationally efficient and easy to implement.

Abstract

This paper introduces Conformal Thresholded Intervals (CTI), a novel conformal regression method that aims to produce the smallest possible prediction set with guaranteed coverage. Unlike existing methods that rely on nested conformal frameworks and full conditional distribution estimation, CTI estimates the conditional probability density for a new response to fall into each interquantile interval using off-the-shelf multi-output quantile regression. By leveraging the inverse relationship between interval length and probability density, CTI constructs prediction sets by thresholding the estimated conditional interquantile intervals based on their length. The optimal threshold is determined using a calibration set to ensure marginal coverage, effectively balancing the trade-off between prediction set size and coverage. CTI's approach is computationally efficient and avoids the complexity of estimating the full conditional distribution. The method is theoretically grounded, with provable guarantees for marginal coverage and for achieving the smallest prediction set size given by the Neyman-Pearson lemma. Extensive experimental results demonstrate that CTI achieves superior performance compared to state-of-the-art conformal regression methods across various datasets, consistently producing smaller prediction sets while maintaining the desired coverage level. The proposed method offers a simple yet effective solution for reliable uncertainty quantification in regression tasks, making it an attractive choice for practitioners seeking accurate and efficient conformal prediction.
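
The thresholding and calibration steps described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, oracle Gaussian quantiles stand in for a fitted multi-output quantile regressor, the conformal score is taken to be the length of the interquantile interval that contains the response, and a response falling outside the quantile grid is assigned an infinite score.

```python
import math
from statistics import NormalDist

import numpy as np


def interval_scores(q, y):
    """Length of the interquantile interval containing each y (inf if outside the grid).

    q has shape (n, K + 1): estimated quantiles per sample, increasing along axis 1.
    """
    lengths = np.diff(q, axis=1)                  # shape (n, K)
    idx = (q <= y[:, None]).sum(axis=1) - 1       # interval index; -1 if y < q[:, 0]
    rows = np.flatnonzero((idx >= 0) & (idx < lengths.shape[1]))
    scores = np.full(len(y), np.inf)              # assumption: out-of-grid -> inf
    scores[rows] = lengths[rows, idx[rows]]
    return scores


def calibrate_threshold(scores, alpha):
    """Split-conformal threshold: the ceil((n + 1)(1 - alpha))-th smallest score."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    return np.inf if k > n else np.sort(scores)[k - 1]


def prediction_set(q_row, t):
    """Keep the intervals of one test point whose length is at most t."""
    keep = np.flatnonzero(np.diff(q_row) <= t)
    return [(q_row[k], q_row[k + 1]) for k in keep]


# Synthetic stand-in for a quantile regressor (assumption): heteroscedastic
# Gaussian noise with known quantiles at 50 evenly spaced levels.
rng = np.random.default_rng(0)
levels = np.linspace(0.01, 0.99, 50)
z = np.array([NormalDist().inv_cdf(p) for p in levels])


def sample(n):
    x = rng.uniform(0, 1, n)
    sigma = 0.1 + 0.4 * x
    y = x + sigma * rng.standard_normal(n)
    q = x[:, None] + sigma[:, None] * z[None, :]
    return q, y


alpha = 0.1
q_cal, y_cal = sample(2000)
q_test, y_test = sample(2000)

t = calibrate_threshold(interval_scores(q_cal, y_cal), alpha)
coverage = np.mean(interval_scores(q_test, y_test) <= t)
test_lengths = np.diff(q_test, axis=1)
mean_size = np.where(test_lengths <= t, test_lengths, 0.0).sum(axis=1).mean()
example_set = prediction_set(q_test[0], t)

print(f"threshold={t:.4f} coverage={coverage:.3f} mean size={mean_size:.3f}")
```

Because short interquantile intervals correspond to high estimated density, keeping only intervals with length at most `t` acts as a density threshold; the resulting set may be a union of disjoint intervals rather than a single interval.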

Paper Structure

This paper contains 19 sections, 6 theorems, 50 equations, 3 figures, 1 table, 1 algorithm.

Key Result

Lemma 1

Let $f$ and $g$ be two nonnegative measurable functions. Then the optimizer of the problem $\min_{C} \int_{C} g$ subject to $\int_{C} f \geq 1 - \alpha$ is given by $C = \{x: f(x)/g(x) \geq t'\}$ if there exists $t'$ such that $\int_{f/g \geq t'} f = 1 - \alpha$.
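
A small numerical check can make the lemma concrete. The sketch below assumes the standard Neyman-Pearson reading, in which the set $C$ minimizes $\int_C g$ subject to $\int_C f \geq 1 - \alpha$, and discretizes it on a grid with $g \equiv 1$ (Lebesgue measure), so the optimizer is a density level set. The bimodal mixture, the grid, and the function name are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def density_level_set(f, g, dx, alpha):
    """Discretized thresholding of f/g: add grid cells in decreasing order of
    f/g until the accumulated f-mass first reaches 1 - alpha."""
    order = np.argsort(-(f / g))
    mass = np.cumsum(f[order] * dx)
    n_keep = int(np.searchsorted(mass, 1 - alpha)) + 1
    chosen = np.zeros(f.shape, dtype=bool)
    chosen[order[:n_keep]] = True
    return chosen


def normal_pdf(x, mu, sd):
    return np.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))


# Illustrative bimodal density (assumption): equal mixture of N(-2, 0.5^2) and N(2, 0.5^2).
grid = np.linspace(-6.0, 6.0, 4801)
dx = grid[1] - grid[0]
f = 0.5 * normal_pdf(grid, -2.0, 0.5) + 0.5 * normal_pdf(grid, 2.0, 0.5)
g = np.ones_like(grid)  # Lebesgue measure, so the g-measure of C is its length
alpha = 0.1

chosen = density_level_set(f, g, dx, alpha)
level_size = (g[chosen] * dx).sum()
covered_mass = (f[chosen] * dx).sum()

# Baseline for comparison: the equal-tailed interval with the same nominal coverage.
cdf = np.cumsum(f * dx)
lo = grid[np.searchsorted(cdf, alpha / 2)]
hi = grid[np.searchsorted(cdf, 1 - alpha / 2)]
central_len = hi - lo

print(f"level set size={level_size:.3f} equal-tailed length={central_len:.3f}")
```

For a bimodal density the level set splits into two pieces around the modes and skips the low-density valley between them, which is why its total length is smaller than that of the single equal-tailed interval.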

Figures (3)

  • Figure 1: Quantile regression results for the synthetic data. (a) Quantile regression forest and (b) quantile regression neural network show the estimated and theoretical 5%, 50%, and 95% quantiles as well as the 90% intervals.
  • Figure 2: Prediction set sizes for the synthetic data at $\alpha=0.1$. (a) Theoretical prediction set for different conformal methods. The expected set sizes for CQR, CHR, and CTI are 0.376, 0.342, and 0.317, respectively. (b) Prediction set sizes as a function of $x$ using the estimated quantile functions (RF for random forests and NN for neural network). CTI achieves the smallest set size while maintaining guaranteed coverage.
  • Figure 3: Comparison of interval lengths across datasets: Each subfigure shows the distribution of interval lengths for a specific dataset. The blue histogram represents the intervals containing the actual responses, while the red histogram shows all intervals from the multi-output quantile regression model on the test set.

Theorems & Definitions (10)

  • Remark
  • Lemma 1: Neyman-Pearson
  • Theorem 1: Coverage Probability
  • Proposition 1: Threshold Consistency
  • Theorem 2: Prediction Set
  • Remark: Comparison with existing methods
  • Lemma 2
  • Proof
  • Lemma 3
  • Proof