Privacy-Preserving Conformal Prediction Under Local Differential Privacy

Coby Penso, Bar Mahpud, Jacob Goldberger, Or Sheffet

TL;DR

This work tackles privacy-aware uncertainty quantification by combining conformal prediction (CP) with local differential privacy (LDP). It introduces two complementary methods, LDP-CP-L, which perturbs labels, and LDP-CP-S, which perturbs conformity scores, and proves finite-sample coverage guarantees under noisy calibration data. Both methods rely on noise-aware calibration and quantile estimation to preserve validity despite the privacy-induced perturbations, and the paper gives practical guidance on when to use each. Experiments on medical-imaging datasets show that LDP-CP-L nearly matches non-private CP at moderate privacy levels and data sizes, while LDP-CP-S remains robust when the class set is large, especially when augmented by the shuffle model, which improves the privacy-utility trade-off.

Abstract

Conformal prediction (CP) provides sets of candidate classes with a guaranteed probability of containing the true class. However, it typically relies on a calibration set with clean labels. We address privacy-sensitive scenarios where the aggregator is untrusted and can only access a perturbed version of the true labels. We propose two complementary approaches under local differential privacy (LDP). In the first approach, users do not access the model but instead provide their input features and a perturbed label using a k-ary randomized response. In the second approach, which enforces stricter privacy constraints, users add noise to their conformity score by binary search response. This method requires access to the classification model but preserves both data and label privacy. Both approaches compute the conformal threshold directly from noisy data without accessing the true labels. We prove finite-sample coverage guarantees and demonstrate robust coverage even under severe randomization. This approach unifies strong local privacy with predictive uncertainty control, making it well-suited for sensitive applications such as medical imaging or large language model queries, regardless of whether users can (or are willing to) compute their own scores.
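To make the first approach concrete, here is a minimal sketch of the standard k-ary randomized response mechanism that the abstract names for perturbing labels. It uses the textbook form of the mechanism (keep the true label with probability $e^{\epsilon}/(e^{\epsilon}+k-1)$, otherwise report one of the other $k-1$ labels uniformly); the function names and the integer label encoding `0..k-1` are assumptions made for illustration, not the paper's code.

```python
import math
import random


def k_rr_keep_probability(epsilon: float, k: int) -> float:
    """Probability that k-ary randomized response reports the true label."""
    return math.exp(epsilon) / (math.exp(epsilon) + k - 1)


def k_randomized_response(label: int, k: int, epsilon: float,
                          rng: random.Random) -> int:
    """Perturb a label in {0, ..., k-1} with k-ary randomized response.

    Hypothetical helper: labels are assumed to be encoded as integers.
    """
    if not 0 <= label < k:
        raise ValueError("label must lie in the range 0..k-1")
    if rng.random() < k_rr_keep_probability(epsilon, k):
        return label
    # Otherwise draw uniformly from the k - 1 labels other than the true one.
    other = rng.randrange(k - 1)
    return other if other < label else other + 1


# Usage: each user perturbs their own label before sending it to the aggregator.
rng = random.Random(0)
noisy_labels = [k_randomized_response(3, k=10, epsilon=1.0, rng=rng)
                for _ in range(5)]
```

With a small $\epsilon$ and a large $k$ the true label is rarely reported, which is consistent with the summary's remark that the score-based method is the more robust choice when the class set is large.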

Paper Structure

This paper contains 15 sections, 3 theorems, 9 equations, 5 figures, 1 table, 3 algorithms.

Key Result

theorem 1

Assume a noisy calibration set of size $n$ with noise level $\beta$. Set $\Delta(n,\beta,\delta)=\sqrt{\frac{\log(4/\delta)}{2nh^2}}$, where $h=\frac{1-\beta}{1+\beta}$, and pick $q$ such that $\hat{F}^c(q) \ge 1-\alpha$. Then with probability at least $1-\delta$ (on the random calibr
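The correction term $\Delta(n,\beta,\delta)$ defined in the theorem statement can be evaluated directly. The sketch below assumes a natural logarithm and takes the noise level $\beta$ as a given input in $[0,1)$; how $\beta$ follows from the privacy parameter $\epsilon$ is not stated in this excerpt, so it is not modeled. The function name is hypothetical.

```python
import math


def correction_term(n: int, beta: float, delta: float) -> float:
    """Evaluate Delta(n, beta, delta) = sqrt(log(4/delta) / (2 n h^2)),
    with h = (1 - beta) / (1 + beta).

    Assumptions: natural logarithm, 0 <= beta < 1, 0 < delta < 1.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0.0 <= beta < 1.0:
        raise ValueError("beta must lie in [0, 1)")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie in (0, 1)")
    h = (1.0 - beta) / (1.0 + beta)
    return math.sqrt(math.log(4.0 / delta) / (2.0 * n * h * h))


# Usage: the term shrinks as n grows and grows as the noise level increases.
for beta in (0.0, 0.2, 0.5):
    print(beta, correction_term(n=5000, beta=beta, delta=0.1))
```

Since $h \to 0$ as $\beta \to 1$, the term grows without bound under heavy noise, while it decays at rate $1/\sqrt{n}$ in the calibration set size.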

Figures (5)

  • Figure 1: Local Differential Private Conformal Prediction on Labels (LDP-CP-L).
  • Figure 2: Local Differential Private Conformal Prediction on Scores (LDP-CP-S).
  • Figure 3: CP correction terms $\Delta_L,\Delta_S$ as a function of the privacy parameter $\epsilon$ across different dataset configurations of $n$ and $k$, without the shuffle model.
  • Figure 5: Comparison of $\Delta_L$ and $\Delta_S$ as a function of the number of classes $k$ and dataset size $n$, for $\epsilon=2,4,8$.
  • Figure 6: Size of the prediction set (left) and coverage (right) as a function of the privacy parameter $\epsilon$ (bottom x-axis) and the effective privacy $\epsilon^{\text{eff}}$ (top x-axis). Results are shown as mean $\pm$ std on TissueMNIST with the APS score.

Theorems & Definitions (6)

  • theorem 1
  • definition 1
  • theorem 2: LDP-CP-L
  • proof
  • theorem 3: LDP-CP-S
  • proof