Provably Reliable Conformal Prediction Sets in the Presence of Data Poisoning

Yan Scholten, Stephan Günnemann

TL;DR

This work tackles the fragility of conformal prediction under data poisoning by introducing Reliable Prediction Sets (RPS). RPS combines a smoothed, partitioned-training score that votes across $k_t$ classifiers with a calibration strategy that forms a majority prediction set from $k_c$ independent calibration partitions, yielding pointwise reliability certificates under worst-case training and calibration data modifications. The authors prove marginal coverage on clean data and provide explicit conditions for both coverage and size reliability under various poisoning scenarios, supported by extensive experiments on image classification benchmarks showing non-trivial robustness with manageable prediction-set sizes. The approach advances trustworthy uncertainty quantification in settings where data integrity cannot be guaranteed, with practical considerations on calibration data requirements, computational costs, and transferability to pretrained-model setups.
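
The voting score described above can be illustrated with a minimal sketch. The summary states only that the score is smoothed and votes across $k_t$ classifiers trained on distinct partitions; the softmax over vote counts, the `temperature` parameter, and the function name `smoothed_vote_scores` are assumptions made for illustration, not the authors' exact definition.

```python
import numpy as np

def smoothed_vote_scores(votes, num_classes, temperature=1.0):
    """Turn the hard votes of k_t classifiers into one smooth score per class.

    votes: integer array of shape (k_t,), the class predicted by each
           classifier (each assumed to be trained on its own partition).
    The softmax smoothing and `temperature` are illustrative assumptions.
    """
    counts = np.bincount(votes, minlength=num_classes).astype(float)
    logits = counts / temperature
    logits -= logits.max()          # subtract the max for numerical stability
    weights = np.exp(logits)
    return weights / weights.sum()  # scores are positive and sum to 1

# Five hypothetical classifiers voting over three classes.
votes = np.array([2, 2, 1, 2, 0])
scores = smoothed_vote_scores(votes, num_classes=3)
```

Because each training point falls into a single partition, modifying it can change at most one entry of `votes`, which in turn shifts the vote count of at most two classes by one. This bounded sensitivity is what makes a vote-based score a natural basis for a certificate.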

Abstract

Conformal prediction provides model-agnostic and distribution-free uncertainty quantification through prediction sets that are guaranteed to include the ground truth with any user-specified probability. Yet, conformal prediction is not reliable under poisoning attacks where adversaries manipulate both training and calibration data, which can significantly alter prediction sets in practice. As a solution, we propose reliable prediction sets (RPS): the first efficient method for constructing conformal prediction sets with provable reliability guarantees under poisoning. To ensure reliability under training poisoning, we introduce smoothed score functions that reliably aggregate predictions of classifiers trained on distinct partitions of the training data. To ensure reliability under calibration poisoning, we construct multiple prediction sets, each calibrated on distinct subsets of the calibration data. We then aggregate them into a majority prediction set, which includes a class only if it appears in a majority of the individual sets. Both proposed aggregations mitigate the influence of datapoints in the training and calibration data on the final prediction set. We experimentally validate our approach on image classification tasks, achieving strong reliability while maintaining utility and preserving coverage on clean data. Overall, our approach represents an important step towards more trustworthy uncertainty quantification in the presence of data poisoning.
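
The majority aggregation described above can be sketched as follows. The thresholds `taus` are taken as given, one per calibration partition, and each individual set uses the rule $\{y : s(x,y) \geq \tau_i\}$. Reading "majority" as strictly more than half of the sets, and the function name `majority_prediction_set`, are assumptions of this sketch.

```python
import numpy as np

def majority_prediction_set(scores, taus):
    """Keep a class only if it appears in a majority of the individual sets.

    scores: shape (num_classes,), the score s(x, y) for every class y.
    taus:   shape (k_c,), one threshold per calibration partition.
    """
    scores = np.asarray(scores, dtype=float)
    taus = np.asarray(taus, dtype=float)
    # membership[i, y] is True when class y lies in C_i(x) = {y : s(x, y) >= tau_i}
    membership = scores[None, :] >= taus[:, None]
    set_votes = membership.sum(axis=0)
    # Strict majority (an assumption of this sketch): more than half of the sets.
    return np.flatnonzero(set_votes > len(taus) / 2)

scores = [0.70, 0.25, 0.05]
taus = [0.10, 0.20, 0.30]
majority_set = majority_prediction_set(scores, taus)

# Corrupting a single threshold (one hypothetical poisoned partition)
# leaves the majority set unchanged in this example.
poisoned_set = majority_prediction_set(scores, [0.10, 0.20, 10.0])
```

In this toy example both calls return classes 0 and 1: a single calibration point sits in one partition, so it can move only one of the $k_c$ thresholds, and one dissenting set cannot overturn the other two.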

Paper Structure

This paper contains 20 sections, 3 theorems, 17 equations, 17 figures, 5 algorithms.

Key Result

Theorem 1

Given user-specified coverage probability $1-\alpha \in (0,1)$, test sample $(x_{n+1}, y_{n+1})\in\mathcal{D}_{test}$ exchangeable with $\mathcal{D}_{calib}$, and a score function $s$, we can construct the following prediction set which fulfills the following marginal coverage guarantee for $\tau=\textrm{Quant}(\alpha_n; S)$. Specifically, the threshold $\tau$ is chosen as the $\alpha
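
As a rough illustration of how such a threshold can be calibrated, the sketch below uses a standard split-conformal order-statistic rule for the convention that a class is included when $s(x,y) \geq \tau$. The theorem statement above is cut off before it specifies the quantile level, so the choice $k = \lfloor \alpha (n+1) \rfloor$ and the names `conformal_threshold` and `prediction_set` are assumptions of this sketch rather than the paper's definition.

```python
import math
import numpy as np

def conformal_threshold(calib_scores, alpha):
    """Return tau as the k-th smallest calibration score, k = floor(alpha * (n + 1)).

    calib_scores: the scores s(x_i, y_i) of the true labels on calibration data.
    With k = 0 no finite threshold is valid, so every class is included.
    """
    s = np.sort(np.asarray(calib_scores, dtype=float))
    n = len(s)
    k = math.floor(alpha * (n + 1))
    if k == 0:
        return -math.inf
    return float(s[k - 1])

def prediction_set(scores, tau):
    """Indices of all classes y with s(x, y) >= tau."""
    return np.flatnonzero(np.asarray(scores, dtype=float) >= tau)

# Nineteen hypothetical calibration scores: 0.05, 0.10, ..., 0.95.
calib_scores = np.arange(1, 20) / 20.0
tau = conformal_threshold(calib_scores, alpha=0.1)
test_set = prediction_set([0.60, 0.30, 0.08], tau)
```

Under exchangeability, the score of the true test label falls below the $k$-th smallest calibration score with probability at most $k/(n+1) \leq \alpha$, which is the usual argument for marginal coverage of at least $1-\alpha$ under this rule.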

Figures (17)

  • Figure 1: Conformal prediction (CP) is not reliable under poisoning (orange) of training and calibration data, undermining its practical utility in safety-critical applications. As a solution, we propose reliable prediction sets (RPS): a novel approach for constructing more reliable prediction sets. We (i) aggregate predictions of classifiers trained on distinct partitions, and (ii) merge multiple prediction sets $\mathcal{C}_i(x) = \{ y : s(x,y) \geq \tau_i \}$ calibrated on separate partitions into a majority prediction set that includes classes only if a majority of the prediction sets $\mathcal{C}_i$ agree. This way RPS reduces the influence of datapoints while preserving the coverage guarantee of conformal prediction on clean data.
  • Figure 2: Worst-case reliability guarantees across three scenarios: (a) poisoning of the calibration data, (b) poisoning of the training data, and (c) poisoning of both datasets. Our guarantees against coverage attacks are stronger when training data is poisoned, whereas for calibration attacks our method offers stronger guarantees for size reliability. Notably, even under strong adversarial conditions where both datasets can be poisoned we still provide non-trivial reliability guarantees.
  • Figure 3: Average set size and empirical coverage in all three experiment settings (a–c). Notably, our reliable prediction sets yield valid coverage guarantees without becoming too large.
  • Figure 4: (1): Average prediction set size of majority prediction sets across three different datasets. (2,3): Softmax ablation study for empirically justifying smoothing of the voting function ($\alpha=0.05$).
  • Figure 5: SVHN: Worst-case reliability guarantees across three scenarios: (a) poisoning of the calibration data, (b) poisoning of the training data, and (c) poisoning of both datasets.
  • ...and 12 more figures

Theorems & Definitions (12)

  • Theorem 1
  • Definition 1: Reliability
  • Lemma 1
  • proof
  • proof
  • proof
  • Proposition 1
  • proof
  • proof
  • proof
  • ...and 2 more