Conformal Prediction via Regression-as-Classification

Etash Guha, Shlok Natarajan, Thomas Möllenhoff, Mohammad Emtiyaz Khan, Eugene Ndiaye

TL;DR

This work extends conformal prediction to challenging regression settings by reframing regression as classification through binning of the output into $K$ bins and applying CP for classification. A novel ordinal-aware loss with an entropy regularizer encourages probability mass on neighboring bins while avoiding overconfidence, yielding a linearly interpolated density $\bar{q}_\theta(y|x)$ used to form regression CP sets. The approach, named Regression-to-Classification CP (R2CCP), maintains finite-sample coverage and often produces shorter prediction intervals than diverse CP baselines, particularly in heteroscedastic and bimodal scenarios. Empirical results on synthetic and real datasets demonstrate robust interval efficiency and adaptability to complex label distributions, with practical code available via "r2ccp".

Abstract

Conformal prediction (CP) for regression can be challenging, especially when the output distribution is heteroscedastic, multimodal, or skewed. Some of the issues can be addressed by estimating a distribution over the output, but in reality, such approaches can be sensitive to estimation error and yield unstable intervals. Here, we circumvent the challenges by converting regression to a classification problem and then using CP for classification to obtain CP sets for regression. To preserve the ordering of the continuous-output space, we design a new loss function and make necessary modifications to the CP classification techniques. Empirical results on many benchmarks show that this simple approach gives surprisingly good results on many practical problems.
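
The pipeline summarized above (bin the output, score labels with a linearly interpolated density, calibrate a threshold with split CP) can be illustrated with a minimal sketch. This is not the authors' implementation and does not use the "r2ccp" package: the function names, the toy classifier `toy_probs`, and the choice of bin centers as interpolation knots are all assumptions made for illustration, and the ordinal-aware loss is not modeled here.

```python
import numpy as np

def interpolated_density(probs, bin_centers, y):
    """Linearly interpolate per-bin probabilities between bin centers at y."""
    return np.interp(y, bin_centers, probs)

def calibrate_threshold(cal_probs, cal_y, bin_centers, alpha):
    """Split-conformal threshold; the score is the interpolated density at the true label."""
    scores = np.array([
        interpolated_density(p, bin_centers, y)
        for p, y in zip(cal_probs, cal_y)
    ])
    # Low density means a label conforms poorly, so use the
    # floor(alpha * (n + 1))-th smallest calibration score as the cutoff.
    k = int(np.floor(alpha * (len(scores) + 1)))
    if k < 1:
        return -np.inf  # too few calibration points: the set covers everything
    return np.sort(scores)[k - 1]

def prediction_set(probs, bin_centers, threshold, grid):
    """Return the grid points whose interpolated density reaches the threshold."""
    return grid[interpolated_density(probs, bin_centers, grid) >= threshold]

def toy_probs(bin_centers):
    """Hypothetical stand-in for a trained classifier: Gaussian-shaped bin probabilities."""
    logits = -0.5 * bin_centers ** 2
    weights = np.exp(logits - logits.max())
    return weights / weights.sum()

def demo(seed=0, K=50, n_cal=500, n_test=2000, alpha=0.1):
    """Calibrate on synthetic N(0, 1) labels and report empirical test coverage."""
    rng = np.random.default_rng(seed)
    centers = np.linspace(-4.0, 4.0, K)
    probs = toy_probs(centers)  # same output for every x in this toy setup
    cal_y = rng.normal(size=n_cal)
    threshold = calibrate_threshold([probs] * n_cal, cal_y, centers, alpha)
    test_y = rng.normal(size=n_test)
    covered = interpolated_density(probs, centers, test_y) >= threshold
    grid = np.linspace(-4.0, 4.0, 801)
    return covered.mean(), prediction_set(probs, centers, threshold, grid)

if __name__ == "__main__":
    coverage, pset = demo()
    print(f"empirical coverage: {coverage:.3f}")
    print(f"prediction set spans [{pset.min():.2f}, {pset.max():.2f}]")
```

Because the set is defined by thresholding a density rather than by a single interval around a point estimate, it can split into several disjoint pieces when the predicted distribution is multimodal, which is the behavior the bimodal example in Figure 1 illustrates.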

Paper Structure (31 sections, 19 equations, 23 figures, 5 tables, 1 algorithm)

Figures (23)

  • Figure 1: We show two examples where the output distribution is heteroskedastic (left) and bimodal (right). In both cases, our method is able to change the interval (shaded gray region) adaptively as the input values $x$ are increased. Examples outside the gray regions (white dots) are deemed different from those inside it (black dots).
  • Figure 1: Interval length results over all datasets. Our method achieves the best length on $10$ of the $16$ datasets, while CQR is best on $5$, CHR on $3$, CB on $1$, and KDE on $3$.
  • Figure 2: The resulting density estimates with different loss functions. Removing the entropy term from our loss function, or using MLE as the error term, produces sharp density estimates. Moreover, adding entropy regularization to MLE does not smooth the density estimate but instead raises the entire distribution uniformly, which does not provide valuable information for CP.
  • Figure 2: Length results over all of the variant loss functions. Our loss function delivers the best results on $12$ datasets, showing that it often generates the best intervals. For datasets where our method does not deliver the best results, tuning the entropy weight $\tau$ and the smoothing term $p$ would likely have improved the results, but we do not do this for the sake of evaluation.
  • Figure 3: We present an ablation of how the number of bins affects the average length of the generated intervals.
  • ...and 18 more figures