Private Prediction Sets
Anastasios N. Angelopoulos, Stephen Bates, Tijana Zrnic, Michael I. Jordan
TL;DR
Consequential decision systems need both reliable uncertainty quantification and privacy protection. The paper addresses both by integrating differential privacy (DP) into conformal prediction, producing private prediction sets with finite-sample coverage guarantees. Its DP conformal prediction algorithm calibrates the prediction-set threshold with a privatized quantile, computed over discretized bins via the exponential mechanism. The procedure is $\epsilon$-DP and satisfies the coverage bound $1-\alpha \le \mathbb{P}\{Y\in \mathcal{C}(X)\} \le 1-\alpha + O((n\epsilon)^{-1})$, where $n$ is the number of calibration points. The authors provide a formal algorithm, theoretical guarantees, and an analysis of the privacy-utility tradeoff governed by the number of bins and the privacy parameter $\epsilon$, complemented by experiments on CIFAR-10, ImageNet, and a COVID-19 diagnosis task. They find that privatizing the calibration step costs little in performance, while the dominant privacy cost comes from private model training, which suggests that advances in private training will directly improve the final private prediction sets. Overall, the method provides private uncertainty quantification for any pre-trained model, with practical implications for privacy-preserving decision support systems.
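To make the privatized-quantile step concrete, the following is a minimal sketch of a generic exponential-mechanism quantile over a fixed grid of bins. It is not the authors' exact algorithm: it omits the correction that compensates for the privacy noise to keep coverage at or above $1-\alpha$. The function name `private_quantile`, its parameters, and the assumption that scores lie in $[0, 1]$ are illustrative choices, not taken from the paper.

```python
import numpy as np

def private_quantile(scores, q, epsilon, num_bins=100, rng=None):
    """Sample an approximate q-quantile of `scores` with the exponential mechanism.

    Assumptions (hypothetical, for illustration): scores lie in [0, 1] and the
    candidate outputs are the edges of `num_bins` equal-width bins.
    """
    rng = np.random.default_rng() if rng is None else rng
    scores = np.clip(np.asarray(scores, dtype=float), 0.0, 1.0)
    n = scores.size
    edges = np.linspace(0.0, 1.0, num_bins + 1)

    # counts[j] = number of scores less than or equal to edges[j].
    counts = np.searchsorted(np.sort(scores), edges, side="right")

    # Utility: how close each edge's rank is to the target rank.
    # Replacing one score changes each count by at most 1, so sensitivity is 1.
    target = np.ceil(q * n)
    utility = -np.abs(counts - target)

    # Exponential mechanism: P(edge j) is proportional to exp(epsilon * u_j / 2).
    logits = 0.5 * epsilon * utility
    logits -= logits.max()  # subtract the max for numerical stability
    probs = np.exp(logits)
    probs /= probs.sum()
    return float(rng.choice(edges, p=probs))
```

A finer grid (larger `num_bins`) reduces discretization error but spreads probability mass over more candidates, and a smaller `epsilon` flattens the sampling distribution; these are the two knobs behind the privacy-utility tradeoff mentioned above.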
Abstract
In real-world settings involving consequential decision-making, the deployment of machine learning systems generally requires both reliable uncertainty quantification and protection of individuals' privacy. We present a framework that treats these two desiderata jointly. Our framework is based on conformal prediction, a methodology that augments predictive models to return prediction sets that provide uncertainty quantification -- they provably cover the true response with a user-specified probability, such as 90%. One might hope that when used with privately-trained models, conformal prediction would yield privacy guarantees for the resulting prediction sets; unfortunately, this is not the case. To remedy this key problem, we develop a method that takes any pre-trained predictive model and outputs differentially private prediction sets. Our method follows the general approach of split conformal prediction; we use holdout data to calibrate the size of the prediction sets but preserve privacy by using a privatized quantile subroutine. This subroutine compensates for the noise introduced to preserve privacy in order to guarantee correct coverage. We evaluate the method on large-scale computer vision datasets.
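For readers unfamiliar with split conformal prediction, the sketch below shows the standard non-private calibration step for a classifier. It assumes the model outputs class probabilities and uses the conformal score $1 - \hat{p}(y \mid x)$; the function names `calibrate_threshold` and `prediction_sets` are hypothetical. A private variant along the lines the abstract describes would replace the sort-based quantile with a privatized quantile subroutine.

```python
import numpy as np

def calibrate_threshold(cal_probs, cal_labels, alpha):
    """Return the split conformal threshold from holdout (calibration) data.

    cal_probs:  array of shape (n, num_classes) with predicted probabilities.
    cal_labels: integer array of shape (n,) with the true labels.
    """
    cal_probs = np.asarray(cal_probs, dtype=float)
    cal_labels = np.asarray(cal_labels)
    n = cal_labels.shape[0]

    # Conformal score: one minus the probability assigned to the true class.
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]

    # Finite-sample corrected rank: the ceil((n + 1)(1 - alpha))-th smallest score.
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:
        return 1.0  # too few calibration points; every class is included
    return float(np.sort(scores)[k - 1])

def prediction_sets(test_probs, threshold):
    """Boolean mask of shape (num_test, num_classes); True means class is in the set."""
    return (1.0 - np.asarray(test_probs, dtype=float)) <= threshold
```

With exchangeable calibration and test points, sets built this way contain the true label with probability at least $1-\alpha$; this non-private version provides no privacy guarantee for the calibration data.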
