Prediction Sets and Conformal Inference with Interval Outcomes

Weiguang Liu, Áureo de Paula, Elie Tamer

Abstract

Given data on a random variable \(Y\), a prediction set with miscoverage level \(α\in (0,1)\) is a set that contains a new draw of \(Y\) with probability \(1-α\). Among all prediction sets satisfying this coverage property, the oracle prediction set is the one with minimal volume. The oracle prediction set offers a complementary view of the distribution of \(Y\), beyond point summaries such as the mean and quantiles, and has attracted considerable interest recently. This paper develops methods for estimating such prediction sets conditional on observed covariates when \(Y\) is \textit{censored} or \textit{interval-valued}. We characterise the oracle prediction set under partial identification induced by interval censoring and propose consistent estimators for both oracle prediction intervals and more general oracle prediction sets consisting of multiple disjoint intervals. In addition, we apply conformal inference to construct finite-sample valid prediction sets for interval outcomes that remain consistent as the sample size grows, using a conformity score tailored to interval data. The proposed procedure accounts for irreducible prediction uncertainty due to the stochastic nature of outcomes, modelling uncertainty arising from partial identification, and sampling uncertainty that vanishes as sample size increases. We conduct Monte Carlo simulations and two empirical applications using UK job postings data and the US Current Population Survey. The results demonstrate the robustness and efficiency of the proposed methods.
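As a minimal illustration of the conformal step described above, the Python sketch below uses a generic split-conformal construction for interval data. It is not the paper's procedure: the conformity score (how far the observed interval $[Y^{L}, Y^{U}]$ protrudes from a fitted band), the constant first-stage fits `lo_hat` and `hi_hat`, and the toy data-generating process are all assumptions made for illustration. Because the score is at most the calibrated threshold only when the whole observed interval lies inside the set, the latent $Y$ is covered whenever the interval is, so the guarantee does not depend on where $Y$ falls within $[Y^{L}, Y^{U}]$; the price is that such a set is likely wider than one built with a score tailored to interval data.

```python
import math
import random


def conformal_interval_sketch(calib, lo_hat, hi_hat, alpha):
    """Split-conformal threshold q for interval-valued outcomes (illustrative sketch).

    calib: list of (x, y_lo, y_hi) calibration triples, not used for fitting.
    lo_hat, hi_hat: hypothetical first-stage fits mapping x to a band [lo_hat(x), hi_hat(x)].
    The prediction set is C(x) = [lo_hat(x) - q, hi_hat(x) + q].
    """
    # Assumed score: how far [y_lo, y_hi] sticks out of the fitted band.
    # score <= q  <=>  [y_lo, y_hi] is contained in C(x).
    scores = sorted(max(lo_hat(x) - yl, yu - hi_hat(x)) for x, yl, yu in calib)
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))  # finite-sample conformal rank
    if k > n:
        return math.inf  # too few calibration points: only the trivial set is valid
    return scores[k - 1]


def simulate_coverage(n_calib=2000, n_test=5000, alpha=0.1, seed=0):
    """Toy check of coverage; the DGP below is an assumption, not the paper's design."""
    rng = random.Random(seed)

    def draw():
        x = rng.random()                  # covariate, ignored by this toy design
        y = rng.gauss(0.0, 1.0)           # latent outcome Y, never observed
        yl = y - rng.expovariate(1.0)     # assumed: Y^L = Y - kappa^L, kappa^L ~ Exp(1)
        yu = y + rng.expovariate(1.0)     # assumed: Y^U = Y + kappa^U, kappa^U ~ Exp(1)
        return x, y, yl, yu

    # Hypothetical first-stage fits; in practice these would be estimated on a training split.
    lo_hat = lambda x: -1.0
    hi_hat = lambda x: 1.0

    calib = [(x, yl, yu) for x, _, yl, yu in (draw() for _ in range(n_calib))]
    q = conformal_interval_sketch(calib, lo_hat, hi_hat, alpha)

    cover_y = cover_int = 0
    for _ in range(n_test):
        x, y, yl, yu = draw()
        lo, hi = lo_hat(x) - q, hi_hat(x) + q
        cover_y += lo <= y <= hi               # coverage of the latent outcome
        cover_int += lo <= yl and yu <= hi     # coverage of the whole observed interval
    return q, cover_y / n_test, cover_int / n_test
```

In this toy design the covariate is ignored, so the set is the same for every $x$; replacing `lo_hat` and `hi_hat` with, for instance, conditional quantile fits on a separate training split would make it covariate-dependent.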

Paper Structure (17 sections, 13 theorems, 69 equations, 5 figures, 3 tables)

Key Result

Proposition 1

Suppose $Y$ is interval-valued. For any $\lambda <\infty$ and any $y\in [a,b]$, where $a<b$ and $(a,b) \in \mathbb{R}^{2}$ is in the support of $(Y^{L}, Y^{U})$, there exists some $P'\in \mathcal{P}_{I}$ such that $y\in L'(\lambda)$, where $L'(\lambda)$ is the upper level set for the density $p'(y)$ …
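Proposition 1 is stated in terms of upper level sets of a density. When the density $p$ of $Y$ is known, the minimal-volume set with coverage $1-\alpha$ is the upper level set $\{y : p(y) \ge \lambda\}$ for the largest $\lambda$ at which that set still has probability at least $1-\alpha$. The Python sketch below approximates this on a grid for a known density; `upper_level_set` and `mixture_pdf` are hypothetical helpers, and the grid approximation is an assumption of convenience. A bimodal example shows why an oracle set can consist of several disjoint intervals. Under interval censoring $p$ is only partially identified, which is the complication Proposition 1 addresses.

```python
import math


def normal_pdf(y, mu=0.0, sigma=1.0):
    """Density of N(mu, sigma^2) at y."""
    z = (y - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def mixture_pdf(y):
    """Hypothetical bimodal density: equal mixture of N(-2, 0.5^2) and N(2, 0.5^2)."""
    return 0.5 * normal_pdf(y, -2.0, 0.5) + 0.5 * normal_pdf(y, 2.0, 0.5)


def upper_level_set(density, lo, hi, alpha, n_grid=20001):
    """Grid approximation to the minimal-volume set with coverage 1 - alpha.

    Grid points are added in decreasing order of density until the accumulated
    mass reaches 1 - alpha; the density at the last point added approximates the
    threshold lambda. Returns (lambda, list of disjoint (left, right) intervals).
    """
    step = (hi - lo) / (n_grid - 1)
    ys = [lo + i * step for i in range(n_grid)]
    ps = [density(y) for y in ys]
    keep = [False] * n_grid
    mass, lam = 0.0, 0.0
    for i in sorted(range(n_grid), key=lambda j: ps[j], reverse=True):
        keep[i] = True
        mass += ps[i] * step  # Riemann approximation of the probability
        lam = ps[i]
        if mass >= 1.0 - alpha:
            break
    # Merge runs of consecutive kept grid points into intervals.
    intervals, start = [], None
    for i, k in enumerate(keep):
        if k and start is None:
            start = ys[i]
        elif not k and start is not None:
            intervals.append((start, ys[i - 1]))
            start = None
    if start is not None:
        intervals.append((start, ys[-1]))
    return lam, intervals
```

For the standard normal with $\alpha = 0.1$ the sketch recovers a single interval close to $[-1.645, 1.645]$, while for `mixture_pdf` it returns two intervals, one around each mode, which is shorter than any single interval with the same coverage.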

Figures (5)

  • Figure 1: Joint density of $Y^{L}$ and $Y^{U}$ when $Y\sim N(0,1)$ and $\kappa^{L},\kappa^{U} \sim \text{Exp}(1)$. The integral of the density over the triangular region equals $P(-0.75, 0.5;x)$.
  • Figure 2: Comparison of prediction sets in the simulation study. Rows 1 to 4 show the observed samples $(X_{i}, Y^{L}_{i}, Y_{i}^{U})$ together with, respectively, the conformal prediction set $\tilde{C}$, the local conformal prediction set $\tilde{C}_{\text{loc}}$, the conformal prediction set constructed with quadratic quantile regression $\tilde{C}_{q2}$, and the conformal prediction set constructed with cubic quantile regression $\tilde{C}_{q3}$. Row 5 shows the integrated volume of the prediction sets, and row 6 shows their coverage.
  • Figure 3: Job advert counts by category.
  • Figure 4: Local conformal prediction interval for annual salaries in the UK IT job market. The left panel shows the estimated lower bound, and the right panel shows the estimated upper bound, of the 90% local conformal prediction interval for annual salaries across geographic locations in the UK.
  • Figure 5: Proportion of individuals who reported income in a range by years of education.
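The partial-identification problem behind Figure 1 can be made concrete with a short Monte Carlo sketch in Python. We assume, as our reading of the caption rather than something stated in it, that $Y^{L} = Y - \kappa^{L}$ and $Y^{U} = Y + \kappa^{U}$ with $Y$, $\kappa^{L}$, $\kappa^{U}$ independent, and that the triangular region is $\{a \le y^{L} \le y^{U} \le b\}$, i.e. the event that the observed interval lies inside $[a, b]$. From the observed intervals alone, $P(a \le Y \le b)$ is only bounded: below by $P(a \le Y^{L}, Y^{U} \le b)$ and above by $P(Y^{L} \le b, Y^{U} \ge a)$. The function name `interval_probability_bounds` is hypothetical.

```python
import math
import random


def interval_probability_bounds(a, b, n=20000, seed=0):
    """Monte Carlo bounds on P(a <= Y <= b) computed from interval data (Y^L, Y^U).

    Assumed DGP (our reading of the Figure 1 caption, not taken from the paper):
    Y ~ N(0, 1), Y^L = Y - kappa^L, Y^U = Y + kappa^U, kappa^L, kappa^U ~ Exp(1).
    Returns (lower bound, infeasible frequency using latent Y, upper bound).
    """
    rng = random.Random(seed)
    lower = upper = truth = 0
    for _ in range(n):
        y = rng.gauss(0.0, 1.0)
        yl = y - rng.expovariate(1.0)
        yu = y + rng.expovariate(1.0)
        lower += a <= yl and yu <= b   # observed interval inside [a, b] (triangular region)
        upper += yl <= b and yu >= a   # observed interval intersects [a, b]
        truth += a <= y <= b           # infeasible in practice: uses the latent Y
    return lower / n, truth / n, upper / n


def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
```

With $a=-0.75$ and $b=0.5$ as in the caption, the simulated lower and upper frequencies bracket the infeasible frequency of $\{a \le Y \le b\}$, whose population value is $\Phi(0.5) - \Phi(-0.75) \approx 0.465$. The gap between the two bounds reflects the modelling uncertainty from partial identification mentioned in the abstract, and it does not shrink as $n$ grows.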

Theorems & Definitions (24)

  • Definition 1: Validity under Partial Identification
  • Definition 2: Oracle Prediction Set under Partial Identification
  • Proposition 1
  • Lemma 2: Theorem SIR-2.3 in Molinari (2020)
  • Lemma 3
  • Proposition 4: Feasible Estimation
  • Theorem 5
  • Example 1
  • Theorem 6
  • Remark 1
  • ...and 14 more