
Weighted Aggregation of Conformity Scores for Classification

Rui Luo, Zhixin Zhou

TL;DR

This paper extends conformal prediction for multiclass classification by aggregating multiple non-conformity score functions through optimal weight learning. By formulating weighted scores and exploring four data-splitting regimes (VFCP, EFCP, DLCP, DLCP+), it provides finite-sample validity guarantees and near-oracle efficiency, grounded in VC theory with a confirmed VC-dimension upper bound of $d+1$ for the relevant subgraph classes. Theoretical results show that, under reasonable assumptions, coverage remains at $1-\alpha$ while the expected prediction-set size approaches the oracle benchmark as data grow, with explicit bounds for each split strategy. Empirically, the approach yields consistently smaller, valid prediction sets compared to single-score baselines across CIFAR-10/100, and it demonstrates substantial gains when combining models, supporting the practical utility of score-function and model weighting in conformal prediction.

Abstract

Conformal prediction is a powerful framework for constructing prediction sets with valid coverage guarantees in multi-class classification. However, existing methods often rely on a single score function, which can limit their efficiency and informativeness. We propose a novel approach that combines multiple score functions to improve the performance of conformal predictors by identifying optimal weights that minimize prediction set size. Our theoretical analysis establishes a connection between the weighted score functions and subgraph classes of functions studied in Vapnik-Chervonenkis theory, providing a rigorous mathematical basis for understanding the effectiveness of the proposed method. Experiments demonstrate that our approach consistently outperforms single-score conformal predictors while maintaining valid coverage, offering a principled and data-driven way to enhance the efficiency and practicality of conformal prediction in classification tasks.
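
As a rough illustration of what "combining multiple score functions" can look like, the sketch below forms a weighted sum of two non-conformity scores computed from softmax probabilities. The two base scores (one minus the class probability, and a non-randomized cumulative-probability score) are common choices in the conformal literature and are assumed here for illustration; the function names and the weight vector `w` are hypothetical and not taken from the paper.

```python
import numpy as np

def thr_score(probs):
    """Score 1: one minus the predicted probability of each candidate label."""
    return 1.0 - probs

def aps_score(probs):
    """Score 2: total probability of all labels ranked at or above each candidate."""
    order = np.argsort(-probs, axis=1)                 # labels from most to least probable
    sorted_p = np.take_along_axis(probs, order, axis=1)
    cum = np.cumsum(sorted_p, axis=1)                  # cumulative mass down the ranking
    scores = np.empty_like(probs)
    np.put_along_axis(scores, order, cum, axis=1)      # map back to original label order
    return scores

def weighted_score(probs, w):
    """Weighted combination of the base scores; returns an (n, K) matrix."""
    stacked = np.stack([thr_score(probs), aps_score(probs)], axis=-1)  # (n, K, 2)
    return stacked @ w
```

For a single input with probabilities (0.7, 0.2, 0.1) and equal weights (0.5, 0.5), the base scores are (0.3, 0.8, 0.9) and (0.7, 0.9, 1.0), so the combined scores are (0.5, 0.85, 0.95). A smaller combined score means the label conforms better to the model's prediction.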

Paper Structure

This paper contains 37 sections, 10 theorems, 48 equations, 6 figures, 1 table, 4 algorithms.

Key Result

Lemma 1

Suppose the samples in $\mathcal{I}$ are i.i.d., then

Figures (6)

  • Figure 1: An illustration of the data-splitting framework, which partitions the data into $\mathcal{I}_1$, $\mathcal{I}_2$, $\mathcal{I}_3$, and $\mathcal{I}_\text{test}$. The paper's weight-selection algorithm presents the complete procedure. Briefly, $\mathcal{I}_1$ and $\mathcal{I}_2$ are used in Steps 1-2 to select the optimal weight $\widehat{\mathbf{w}}$, while $\mathcal{I}_3$ is used in Step 3 as the calibration set for predictions on $\mathcal{I}_\text{test}$. Four options are presented: VFCP, EFCP, DLCP, and DLCP+. Their coverage and size properties are discussed theoretically and empirically in the paper's theory and experiment sections.
  • Figure 2: Boxplot comparison of different score functions at a significance level of $\alpha=0.01$ on CIFAR-100. Our weighted combination method achieves the guaranteed coverage of 99% while maintaining the smallest prediction set size.
  • Figure 3: Comparison of size vs. coverage for various score functions and our proposed method across $\alpha$ values (0.01–0.05). Our weighted combination method (red) consistently outperforms the other baseline methods by achieving the desired coverage rate with smaller prediction set sizes.
  • Figure 4: Across various score functions, our weighted combination of models outperformed any individual model and achieved optimal size on the CIFAR-10 dataset across $\alpha$ values (0.01–0.05).
  • Figure 5: Across various score functions, our weighted combination of models outperformed any individual model and achieved optimal size on the CIFAR-100 dataset across $\alpha$ values (0.01–0.05).
  • ...and 1 more figure
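
The split-then-calibrate idea described in the Figure 1 caption can be sketched as follows. This is a minimal sketch under stated assumptions, not the paper's algorithm: it collapses $\mathcal{I}_1$ and $\mathcal{I}_2$ into a single selection set, searches a user-supplied grid of candidate weights, and keeps the calibration data disjoint from the selection data. It does not attempt to reproduce the differences between VFCP, EFCP, DLCP, and DLCP+. All function names are hypothetical, and `score_stack` is assumed to be a precomputed array of shape (n, K, d) holding d base scores for each of K labels.

```python
import numpy as np

def conformal_quantile(scores, alpha):
    """Return the ceil((n + 1) * (1 - alpha))-th smallest score, or +inf if n is too small."""
    scores = np.sort(np.asarray(scores, dtype=float))
    n = scores.size
    k = int(np.ceil((n + 1) * (1 - alpha)))
    return scores[k - 1] if k <= n else np.inf

def true_label_scores(score_matrix, labels):
    """Pick out each sample's score at its true label from an (n, K) score matrix."""
    return score_matrix[np.arange(len(labels)), labels]

def select_weight(score_stack, labels, alpha, weight_grid):
    """Grid search: return the candidate weight whose thresholded sets are smallest on average."""
    best_w, best_size = None, np.inf
    for w in weight_grid:
        combined = score_stack @ w                      # (n, K) weighted scores
        q = conformal_quantile(true_label_scores(combined, labels), alpha)
        avg_size = (combined <= q).sum(axis=1).mean()   # mean prediction-set size
        if avg_size < best_size:
            best_w, best_size = w, avg_size
    return best_w

def calibrate_and_predict(cal_stack, cal_labels, test_stack, w, alpha):
    """Calibrate the threshold on held-out data, then return boolean (n_test, K) prediction sets."""
    q = conformal_quantile(true_label_scores(cal_stack @ w, cal_labels), alpha)
    return (test_stack @ w) <= q
```

A typical call sequence under these assumptions would run `select_weight` on the selection split, then pass the chosen weight to `calibrate_and_predict` together with a separate calibration split. Because the weight is fixed before the calibration scores are computed, the usual split-conformal exchangeability argument applies to the final threshold.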

Theorems & Definitions (16)

  • Lemma 1
  • Remark
  • Theorem 1
  • Theorem 2
  • Theorem 3
  • Theorem 4
  • Proposition 1
  • proof
  • Lemma 2
  • proof
  • ...and 6 more