Beyond Data Splitting: Full-Data Conformal Prediction by Differential Privacy

Young Hyun Cho, Jordan Awan

TL;DR

The paper proposes a full-data privacy-preserving conformal prediction framework that avoids data splitting. The framework leverages the stability induced by differential privacy to control the gap between in-sample and out-of-sample conformal scores, and pairs this with a conservative private quantile routine designed to prevent under-coverage.

Abstract

Privacy protection and uncertainty quantification are increasingly important in data-driven decision making. Conformal prediction provides finite-sample marginal coverage, but existing private approaches often rely on data splitting, reducing the effective sample size. We propose a full-data privacy-preserving conformal prediction framework that avoids splitting. Our framework leverages stability induced by differential privacy to control the gap between in-sample and out-of-sample conformal scores, and pairs this with a conservative private quantile routine designed to prevent under-coverage. We show that a generic differential privacy guarantee yields a universal coverage floor, yet cannot generally recover the nominal $1-\alpha$ level. We then provide a refined, mechanism-specific stability analysis that yields asymptotic recovery of the nominal level. Experiments demonstrate sharper prediction sets than the split-based private baseline.

Paper Structure (50 sections, 12 theorems, 190 equations, 2 figures, 11 tables, 4 algorithms)

Key Result

Proposition 1

Let $\{S_i\}_{i=1}^{k}$ be exchangeable scores, and let $\hat{q}$ be the $\lceil(1-\alpha)k\rceil$-th order statistic of these scores. Then, for any $i \in \{1,\dots,k\}$, we have $\mathbb{P}(S_i \le \hat{q}) \ge 1 - \alpha.$
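The bound in Proposition 1 can be checked numerically. The following is a minimal, non-private sketch, not the paper's algorithm: it assumes i.i.d. uniform scores (one simple instance of exchangeability), and the helper names `order_statistic_quantile` and `empirical_coverage`, along with the values of `k`, `alpha`, `trials`, and `seed`, are hypothetical choices made for illustration.

```python
import math
import random


def order_statistic_quantile(scores, alpha):
    """Return the ceil((1 - alpha) * k)-th order statistic of `scores`."""
    k = len(scores)
    rank = math.ceil((1 - alpha) * k)  # 1-indexed rank, as in Proposition 1
    return sorted(scores)[rank - 1]    # shift to 0-indexed position


def empirical_coverage(k=25, alpha=0.1, trials=20000, seed=0):
    """Estimate P(S_1 <= q_hat) over repeated draws of k scores.

    Assumption: scores are i.i.d. Uniform(0, 1), which is exchangeable.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        scores = [rng.random() for _ in range(k)]
        q_hat = order_statistic_quantile(scores, alpha)
        hits += scores[0] <= q_hat  # by exchangeability, any fixed index works
    return hits / trials


coverage = empirical_coverage()
print(f"empirical coverage: {coverage:.3f} (nominal floor: 0.900)")
```

For continuous i.i.d. scores the coverage probability equals $\lceil(1-\alpha)k\rceil / k$, which is $23/25 = 0.92$ for these settings, so the estimate should sit slightly above the nominal $1-\alpha = 0.9$. Note that the threshold here is computed from the same scores it is compared against; this in-sample setting is what the full-data framework relates to out-of-sample scores through stability.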

Figures (2)

  • Figure 1: Conceptual illustration of the distributional shift. The top row represents the ideal "exchangeable" world where $\theta_{n+1}$ is trained on all data points including the test point. The bottom row represents the actual "non-exchangeable" world where $\theta_n$ is used; the test score $S_{n+1}^{(n)}$ (red) is an out-of-sample evaluation. DP acts as a stabiliser, bounding the distance between $\theta_{n+1}$ and $\theta_n$, thereby ensuring that the red box remains distributionally close to the blue box.
  • Figure S1: Trajectory stability vs. estimation error under synchronized coupling. Each panel corresponds to $\epsilon\in\{0.5,1,2\}$ (with $\delta=10^{-5}$), reporting the mean over $R=30$ runs with a shaded uncertainty band.

Theorems & Definitions (33)

  • Definition 1: $f$-DP (dong2022gaussian)
  • Definition 2: Exchangeability
  • Proposition 1: (vovk2005algorithmic)
  • Lemma 1: One-sided conservativeness of Algorithm \ref{alg:dp_binary_search}
  • Lemma 2: Privacy of Buffered Binary Search
  • Theorem 1: Overall Privacy Guarantee
  • Theorem 2
  • Remark 1: Proof sketch and insight
  • Corollary 1: Black-box $f$-DP floor for DP-SCP
  • Example 1: Regression example
  • ...and 23 more