Sparse Activations as Conformal Predictors

Margarida M. Campos, João Calém, Sophia Sklaviadis, Mário A. T. Figueiredo, André F. T. Martins

TL;DR

Conformal prediction provides distribution-free uncertainty quantification through prediction sets. This paper forges a formal link between conformal prediction and sparse activation functions, specifically the $\gamma$-entmax family, by introducing non-conformity scores whose calibration corresponds to temperature scaling. At test time, the prediction sets coincide with the support (the labels with nonzero probability) of the temperature-scaled $\gamma$-entmax outputs, so they inherit the coverage guarantees of conformal prediction. Empirical results on vision and text benchmarks show competitive coverage, efficiency, and adaptiveness compared to standard softmax-based conformal predictors. This approach enables sparse, interpretable set predictions with theoretical guarantees and flexible calibration.

Abstract

Conformal prediction is a distribution-free framework for uncertainty quantification that replaces point predictions with sets, offering marginal coverage guarantees (i.e., ensuring that the prediction sets contain the true label with a specified probability, in expectation). In this paper, we uncover a novel connection between conformal prediction and sparse softmax-like transformations, such as sparsemax and $\gamma$-entmax (with $\gamma > 1$), which may assign nonzero probability only to a subset of labels. We introduce new non-conformity scores for classification that make the calibration process correspond to the widely used temperature scaling method. At test time, applying these sparse transformations with the calibrated temperature leads to a support set (i.e., the set of labels with nonzero probability) that automatically inherits the coverage guarantees of conformal prediction. Through experiments on computer vision and text classification benchmarks, we demonstrate that the proposed method achieves competitive results in terms of coverage, efficiency, and adaptiveness compared to standard non-conformity scores based on softmax.
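For reference, here is a minimal split-conformal sketch in Python (NumPy) of the procedure the abstract builds on. It assumes exchangeable calibration and test data. The score $1 - \mathrm{softmax}(\bm z)_y$ is only one illustrative softmax-based baseline, since the abstract does not name the exact baselines. All function names and the synthetic data are hypothetical.

```python
import math

import numpy as np


def softmax(logits):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)


def softmax_score(logits, labels):
    """Illustrative softmax-based score: 1 - p_y (small when the model favors y)."""
    probs = softmax(logits)
    return 1.0 - probs[np.arange(len(labels)), labels]


def conformal_quantile(cal_scores, alpha):
    """ceil((n+1)(1-alpha))-th smallest calibration score (+inf if that rank exceeds n)."""
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))
    return math.inf if k > n else float(np.sort(cal_scores)[k - 1])


def prediction_sets(logits, q_hat):
    """C(x) = {y : 1 - p_y <= q_hat} for each row of logits."""
    return [np.flatnonzero(1.0 - p <= q_hat).tolist() for p in softmax(logits)]


# Hypothetical synthetic, exchangeable data: the true class gets a +1 logit
# bump, so the classifier is informative but imperfect.
rng = np.random.default_rng(0)
n_classes = 10


def simulate(n):
    labels = rng.integers(n_classes, size=n)
    logits = rng.normal(size=(n, n_classes))
    logits[np.arange(n), labels] += 1.0
    return logits, labels


cal_logits, cal_labels = simulate(500)
q_hat = conformal_quantile(softmax_score(cal_logits, cal_labels), alpha=0.1)
test_logits, test_labels = simulate(2000)
sets = prediction_sets(test_logits, q_hat)
coverage = np.mean([y in s for s, y in zip(sets, test_labels)])
print(f"q_hat={q_hat:.3f}, coverage={coverage:.3f}, avg size={np.mean([len(s) for s in sets]):.2f}")
```

The threshold is the $\lceil (n+1)(1-\alpha) \rceil$-th smallest calibration score. This finite-sample correction gives marginal coverage of at least $1-\alpha$ under exchangeability, so the printed coverage should land near $0.9$. The logit bump and the sample sizes are arbitrary choices.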

Paper Structure

This paper contains 35 sections, 3 theorems, 14 equations, 9 figures, 6 tables, and 1 algorithm.

Key Result

Proposition 1

Let $C_\alpha: \mathcal{X}\rightarrow 2^\mathcal{Y}$ be a conformal predictor (as described in the background section on conformal prediction). Define a nonconformity score $s(x, y)$ on the sorted label scores, where $k(y)$ is the index of label $y$ in the sorted array $\bm{z}$, and let $\hat{q}$ be the $\lceil(n+1)(1-\alpha)\rceil/n$ empirical quantile of the set of calibration scores. Then, setting the sparsemax temperature as $\beta^{-1} := \ldots$
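The following sketch shows how a sparsemax-based score can reproduce this equivalence, under stated assumptions. Sort the scores in decreasing order, $z_{(1)} \ge \dots \ge z_{(K)}$. The label at rank $k$ has nonzero probability under $\mathsf{sparsemax}(\beta\bm z)$ iff $1 + k\beta z_{(k)} > \beta\sum_{j \le k} z_{(j)}$, i.e. iff $\sum_{j \le k}(z_{(j)} - z_{(k)}) < \beta^{-1}$.

Take that left-hand side as the score and set $\beta^{-1} = \hat q$. The sparsemax support then coincides with $\{y : s(x, y) \le \hat q\}$, except on the boundary $s = \hat q$. This is a reconstruction from the standard sparsemax support condition, not necessarily the exact expression in Proposition 1. Function names and data are hypothetical.

```python
import math

import numpy as np


def sparsemax(z):
    """Sparsemax of a 1-D vector: Euclidean projection of z onto the simplex."""
    z_sorted = np.sort(z)[::-1]
    ranks = np.arange(1, len(z) + 1)
    cumsum = np.cumsum(z_sorted)
    # Support size K: largest rank k with 1 + k * z_(k) > sum_{j<=k} z_(j).
    k_max = ranks[1 + ranks * z_sorted > cumsum][-1]
    tau = (cumsum[k_max - 1] - 1.0) / k_max
    return np.maximum(z - tau, 0.0)


def score(z, y):
    """Hypothetical score sum_{j<=k(y)} (z_(j) - z_(k(y))) with z sorted decreasingly.

    Only labels ranked above y contribute, so this equals sum_j [z_j - z_y]_+.
    """
    return float(np.sum(np.maximum(z - z[y], 0.0)))


def conformal_quantile(cal_scores, alpha):
    """ceil((n+1)(1-alpha))-th smallest calibration score (+inf if that rank exceeds n)."""
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))
    return math.inf if k > n else float(np.sort(cal_scores)[k - 1])


def sparsemax_set(z, q_hat):
    """Support of sparsemax(beta * z) with temperature beta^{-1} = q_hat."""
    if q_hat == 0.0:
        # Zero temperature: sparsemax collapses onto the top-scoring label(s).
        return np.flatnonzero(z == z.max()).tolist()
    return np.flatnonzero(sparsemax(z / q_hat) > 0).tolist()


# Hypothetical synthetic, exchangeable data for illustration only.
rng = np.random.default_rng(0)
n_classes = 10


def simulate(n):
    labels = rng.integers(n_classes, size=n)
    logits = rng.normal(size=(n, n_classes))
    logits[np.arange(n), labels] += 1.0
    return logits, labels


cal_logits, cal_labels = simulate(500)
cal_scores = np.array([score(z, y) for z, y in zip(cal_logits, cal_labels)])
q_hat = conformal_quantile(cal_scores, alpha=0.1)  # calibrated temperature

test_logits, test_labels = simulate(2000)
sets = [sparsemax_set(z, q_hat) for z in test_logits]
coverage = np.mean([y in s for s, y in zip(sets, test_labels)])
print(f"temperature={q_hat:.3f}, coverage={coverage:.3f}")
```

Calibration here is a single quantile computation. The resulting $\hat q$ is then used directly as the sparsemax temperature, so no per-label thresholding is needed at test time.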

Figures (9)

  • Figure 1: Conformal prediction meets temperature scaling: we derive new non-conformity scores $s(x, y)$ that make conformal prediction equivalent to $\gamma$-entmax temperature scaling.
  • Figure 2: Illustration of $\gamma$-entmax in the two-dimensional case, $\gamma\text{-}\mathsf{entmax}([t, 0])_1$.
  • Figure 3: Output of different $\gamma\text{-}\mathsf{entmax}$ transformations on label scores $\bm{z} = [1, -1, -0.2, 0.4, -0.5]$, as a function of the temperature parameter $\beta^{-1}$, where $[p_1,p_2,p_3,p_4,p_5] = \gamma\text{-}\mathsf{entmax}(\beta\bm{z})$.
  • Figure 4: Average prediction set size as a function of significance level $\alpha$.
  • Figure 5: Average set size as a function of $\alpha$ on the ImageNet dataset, for $\gamma\text{-}\mathsf{entmax}$ with varying $\gamma$.
  • ...and 4 more figures
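Figure 3 plots $\gamma$-entmax outputs for $\bm z = [1, -1, -0.2, 0.4, -0.5]$ across temperatures. The sketch below evaluates such outputs numerically. It assumes the usual entmax parametrization $p_i = [(\gamma-1) z_i - \tau]_+^{1/(\gamma-1)}$, with $\tau$ found by bisection. The function `gamma_entmax` is hypothetical and does not refer to any specific library.

```python
import numpy as np


def gamma_entmax(z, gamma, n_iter=60):
    """gamma-entmax (gamma > 1) of a 1-D score vector via bisection on tau.

    p_i = [(gamma - 1) * z_i - tau]_+ ** (1 / (gamma - 1)), with tau chosen so the
    entries sum to one; gamma = 2 recovers sparsemax.
    """
    if gamma <= 1.0:
        raise ValueError("gamma must be > 1 (gamma -> 1 is the softmax limit)")
    x = (gamma - 1.0) * np.asarray(z, dtype=float)
    power = 1.0 / (gamma - 1.0)
    # Total mass is >= 1 at tau = max(x) - 1 and 0 at tau = max(x).
    lo, hi = x.max() - 1.0, x.max()
    for _ in range(n_iter):
        tau = 0.5 * (lo + hi)
        if np.sum(np.maximum(x - tau, 0.0) ** power) >= 1.0:
            lo = tau
        else:
            hi = tau
    p = np.maximum(x - lo, 0.0) ** power
    return p / p.sum()  # renormalize away the residual bisection error


z = np.array([1.0, -1.0, -0.2, 0.4, -0.5])  # label scores from Figure 3
for gamma in (1.5, 2.0):
    for temperature in (0.5, 1.0, 2.0, 5.0):
        p = gamma_entmax(z / temperature, gamma)  # gamma-entmax(beta * z), beta = 1 / temperature
        print(f"gamma={gamma} T={temperature}: support={np.flatnonzero(p > 0).tolist()}")
```

For $\gamma = 2$ one can check by hand that the third label ($z_3 = -0.2$) enters the support once $\beta^{-1}$ exceeds $(1 + 0.2) + (0.4 + 0.2) = 1.8$. Raising the temperature therefore grows the support one label at a time.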

Theorems & Definitions (6)

  • Proposition 1
  • Proof
  • Proposition 2
  • Proof
  • Proposition 3
  • Proof