
Private Prediction Sets

Anastasios N. Angelopoulos, Stephen Bates, Tijana Zrnic, Michael I. Jordan

TL;DR

The paper addresses the need for reliable uncertainty quantification alongside privacy in consequential decision systems. It integrates differential privacy into conformal prediction, producing private prediction sets with finite-sample coverage guarantees. It introduces a DP conformal prediction algorithm that calibrates the prediction-set threshold using a privatized quantile, computed over discretized bins with the exponential mechanism. The procedure is $\epsilon$-DP and, with $n$ calibration points, satisfies the coverage bound $1-\alpha \le \mathbb{P}\{Y\in \mathcal{C}(X)\} \le 1-\alpha + O((n\epsilon)^{-1})$. The authors provide a formal algorithm, theoretical guarantees, and an analysis of the privacy-utility tradeoff controlled by the number of bins and the privacy parameter. These are complemented by extensive experiments on CIFAR-10 and ImageNet and on a COVID-19 diagnosis task. They find that private calibration costs little in coverage and set size, while the dominant cost of privacy comes from private model training. This suggests that advances in private training will directly improve the final private prediction sets. Overall, the work enables private uncertainty quantification on top of any pre-trained model, with practical implications for privacy-preserving decision support systems.
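The privatized quantile mentioned above can be sketched as follows. This is a minimal sketch, not the authors' code. It assumes conformity scores lie in $[0, 1]$, uses $m$ evenly spaced candidate values as the bins, and applies the textbook exponential mechanism with a rank-based utility. The function name `private_quantile` is hypothetical. It also omits the paper's correction that raises the quantile level to compensate for the privacy noise (the adjusted quantile shown in Figure 2).

```python
import numpy as np

def private_quantile(scores, q, epsilon, m, rng=None):
    """Hypothetical sketch: epsilon-DP estimate of the q-quantile of scores in [0, 1],
    using the standard exponential mechanism over m evenly spaced candidate bins."""
    rng = np.random.default_rng() if rng is None else rng
    scores = np.asarray(scores, dtype=float)
    n = scores.size
    # Candidate outputs: m evenly spaced points in [0, 1] (assumed discretization).
    bins = np.linspace(0.0, 1.0, m)
    # Number of calibration scores at or below each candidate value.
    counts = np.searchsorted(np.sort(scores), bins, side="right")
    # Utility: minus the distance between each count and the target rank q * n.
    # Replacing one score changes each count by at most 1, so the sensitivity is 1.
    utility = -np.abs(counts - q * n)
    # Exponential mechanism: P(bin) proportional to exp(epsilon * utility / (2 * sensitivity)).
    logits = epsilon * utility / 2.0
    probs = np.exp(logits - logits.max())  # subtract the max for numerical stability
    probs /= probs.sum()
    return bins[rng.choice(m, p=probs)]

# Usage: 5000 uniform scores, target the 0.9 quantile with epsilon = 8 and 1000 bins.
scores = np.random.default_rng(0).uniform(size=5000)
print(private_quantile(scores, 0.9, epsilon=8.0, m=1000, rng=np.random.default_rng(1)))
```

A candidate's selection probability decays exponentially in how far its rank is from $qn$, at a rate set by $\epsilon$. Larger $n\epsilon$ therefore concentrates the output near the true quantile, which is the intuition behind the $O((n\epsilon)^{-1})$ term in the coverage bound.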

Abstract

In real-world settings involving consequential decision-making, the deployment of machine learning systems generally requires both reliable uncertainty quantification and protection of individuals' privacy. We present a framework that treats these two desiderata jointly. Our framework is based on conformal prediction, a methodology that augments predictive models to return prediction sets that provide uncertainty quantification -- they provably cover the true response with a user-specified probability, such as 90%. One might hope that when used with privately-trained models, conformal prediction would yield privacy guarantees for the resulting prediction sets; unfortunately, this is not the case. To remedy this key problem, we develop a method that takes any pre-trained predictive model and outputs differentially private prediction sets. Our method follows the general approach of split conformal prediction; we use holdout data to calibrate the size of the prediction sets but preserve privacy by using a privatized quantile subroutine. This subroutine compensates for the noise introduced to preserve privacy in order to guarantee correct coverage. We evaluate the method on large-scale computer vision datasets.
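For readers new to split conformal prediction, the non-private calibration step that the abstract builds on can be sketched as follows. This sketch assumes the model outputs softmax probabilities. It uses one minus the probability of the true class as the conformity score, which is one common choice and not necessarily the paper's. All function names are hypothetical. The private method replaces `split_conformal_threshold` with a privatized quantile at an adjusted level.

```python
import numpy as np

def conformal_scores(softmax, labels):
    # Assumed score: one minus the softmax probability assigned to the true class.
    return 1.0 - softmax[np.arange(len(labels)), labels]

def split_conformal_threshold(scores, alpha):
    # Non-private split conformal: the ceil((n + 1)(1 - alpha))-th smallest holdout score.
    n = len(scores)
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:
        return np.inf  # too few holdout points: every label must be included
    return np.sort(scores)[k - 1]

def prediction_sets(softmax, threshold):
    # Include every class whose score 1 - p_y does not exceed the calibrated threshold.
    return [np.flatnonzero(1.0 - row <= threshold) for row in softmax]

# Usage: calibrate on holdout softmax outputs, then form sets for new inputs.
probs = np.array([[0.7, 0.2, 0.1], [0.05, 0.15, 0.8]])
print(prediction_sets(probs, 0.86))
```

With exchangeable holdout and test data, this threshold yields coverage of at least $1-\alpha$. It reads the holdout scores directly, however, so releasing it is not differentially private. This is the gap the paper's privatized quantile closes.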

Paper Structure

This paper contains 18 sections, 7 theorems, 32 equations, 6 figures, and 5 algorithms.

Key Result

Theorem 1

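The prediction set function $\mathcal{C}(\cdot)$ returned by the paper's main algorithm is $\epsilon$-differentially private and, with $n$ calibration points, satisfies

$$1-\alpha \le \mathbb{P}\{Y \in \mathcal{C}(X)\} \le 1-\alpha + O\big((n\epsilon)^{-1}\big).$$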

Figures (6)

  • Figure 1: Examples of private conformal prediction sets on COVID-19 data. We show three examples of lung X-rays taken from the CoronaHack dataset (perez2020databiology), with their corresponding private prediction sets at $\alpha=10\%$ from a ResNet-18. All three patients had viral pneumonia (likely COVID-19). The classes in each prediction set appear in ranked order according to the model's softmax score. The center and right images would be misclassified if the predictor returned only the most likely class, but they are correctly covered by the private prediction sets. See the COVID-19 diagnosis experiment for details.
  • Figure 2: The private quantile $\tilde{q}$ as $n$ and $\epsilon$ grow. We show the adjusted quantile level defined in the main text as $n$ and $\epsilon$ increase, with values of $m$ and $\gamma$ chosen automatically as described in the appendix on experimental details. As the number of samples grows and the privacy constraint relaxes, the procedure chooses a less conservative quantile, eventually approaching the limiting value $1-\alpha$. The mild fluctuations in the curves are due to differing choices of $m^*$ and $\gamma$.
  • Figure 3: Coverage and set size with private/non-private models and private/non-private conformal prediction. We show histograms of coverage and set size for non-private and private models combined with non-private and private conformal prediction, at level $\alpha=0.1$ with $\epsilon=8$, $\delta=10^{-5}$, and $n=5000$.
  • Figure 4: Coverage and set size for different values of $m$. We show the performance on ImageNet of private conformal prediction using a non-private ResNet-152 as the base model, at $\alpha=0.1$ and $\epsilon=5$. Coverage is nearly constant over three orders of magnitude of the number of bins $m$. All histograms on the right-hand side overlap. See the experiment on the number of bins for details.
  • Figure 5: Coverage and set size for different values of $\epsilon$. We show the performance on ImageNet of private conformal prediction using a non-private ResNet-152 as the base model, with $\alpha=0.1$. Coverage improves slightly for liberal (large) $\epsilon$, but the cost of privacy is evidently very low. All histograms on the right-hand side overlap. See the experiment on varying $\epsilon$ for details.
  • ...and 1 more figure

Theorems & Definitions (12)

  • Theorem 1: Informal preview
  • Definition 1: Differential privacy (dwork2006calibrating)
  • Proposition 1: Privacy guarantee
  • Theorem 2: Coverage guarantee
  • Remark 1
  • Proof sketch
  • Theorem 3: Coverage upper bound
  • Corollary 1: Coverage upper bound, simplified form
  • Lemma 1
  • Proof
  • ...and 2 more
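Definition 1 is not reproduced on this page. For reference, the standard statement of $\epsilon$-differential privacy (Dwork et al., 2006) reads as follows. We assume here that neighboring datasets $D, D'$ differ in a single data point and that $\mathcal{A}$ denotes the randomized algorithm, in this setting the map from calibration data to $\mathcal{C}(\cdot)$:

$$\mathbb{P}\{\mathcal{A}(D) \in S\} \le e^{\epsilon}\,\mathbb{P}\{\mathcal{A}(D') \in S\} \quad \text{for all neighboring } D, D' \text{ and all measurable sets } S.$$

The $(\epsilon, \delta)$ relaxation adds $+\,\delta$ to the right-hand side. The $\delta=10^{-5}$ reported in Figure 3 presumably refers to this relaxation, as used for private model training.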