Max-Rank: Efficient Multiple Testing for Conformal Prediction

Alexander Timans, Christoph-Nikolas Straehle, Kaspar Sakmann, Christian A. Naesseth, Eric Nalisnick

TL;DR

This work tackles the challenge of multiple testing in conformal prediction (CP) by introducing max-rank, a rank-based, resampling-inspired correction that aggregates information across tests via an $\ell^{\infty}$-norm in rank space to control the family-wise error rate at level $\alpha$. The approach links to Westfall–Young corrections and copula-based formulations, providing a theoretical guarantee of FWER control and potentially tighter thresholds than Bonferroni under positive dependencies. Empirically, max-rank delivers valid CP coverage with narrower prediction intervals and faster runtimes than copula-based alternatives across multi-target regression and conformal object detection tasks. Overall, max-rank extends the CP toolkit for reliable uncertainty quantification in settings with many parallel tests by leveraging rank-order dependencies without imposing extra CP assumptions.
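To make the rank-space aggregation concrete, here is a minimal Python sketch of one reading of the max-rank idea. It is not the authors' reference implementation. The sketch assumes an $n \times m$ array of calibration nonconformity scores, with one column per parallel test. It ranks each column separately and takes the $\ell^{\infty}$-norm of each rank vector, which is its largest rank. It then picks a split-conformal-style order statistic of these max-ranks and maps that shared rank back to a per-test score threshold. Several details are our own assumptions: the function name `max_rank_thresholds`, the tie handling, and the $\lceil (1-\alpha)(n+1) \rceil$ finite-sample index.

```python
import numpy as np


def max_rank_thresholds(cal_scores: np.ndarray, alpha: float) -> np.ndarray:
    """Hypothetical sketch: per-test thresholds from one shared max-rank cutoff.

    cal_scores has shape (n, m): n calibration points, m parallel tests.
    Returns m score thresholds (our assumed reading of max-rank).
    """
    n, m = cal_scores.shape
    # Column-wise ranks in {1, ..., n}; ties broken by original order (assumption).
    ranks = cal_scores.argsort(axis=0, kind="stable").argsort(axis=0) + 1
    # l_inf norm in rank space: the worst (largest) rank of each calibration point.
    max_ranks = ranks.max(axis=1)
    # Split-conformal style finite-sample index (assumption).
    k = int(np.ceil((1.0 - alpha) * (n + 1)))
    if k > n:
        return np.full(m, np.inf)  # too few calibration points for this alpha
    r_star = np.sort(max_ranks)[k - 1]
    # Map the shared rank back onto each test's own score scale.
    return np.sort(cal_scores, axis=0)[r_star - 1]


# Usage on synthetic, positively correlated Gaussian scores (illustrative only).
rng = np.random.default_rng(0)
n, m, rho, alpha = 2000, 5, 0.8, 0.1
cov = rho * np.ones((m, m)) + (1.0 - rho) * np.eye(m)
cal = rng.multivariate_normal(np.zeros(m), cov, size=n)
test = rng.multivariate_normal(np.zeros(m), cov, size=20_000)

q = max_rank_thresholds(cal, alpha)
# Family-wise error: at least one test score exceeds its threshold.
fwer = float(np.mean((test > q).any(axis=1)))
print(f"empirical FWER: {fwer:.3f} (target {alpha})")
```

All $m$ tests share one rank cutoff. Under strong positive dependence, the max-ranks stay close to the individual ranks, so the cutoff stays close to the unadjusted one. Bonferroni, by contrast, always applies the fixed per-test level $\alpha/m$. With $m=1$, the sketch reduces to the usual split conformal quantile.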

Abstract

Multiple hypothesis testing (MHT) frequently arises in scientific inquiries, and concurrent testing of multiple hypotheses inflates the risk of Type-I errors or false positives, rendering MHT corrections essential. This paper addresses MHT in the context of conformal prediction, a flexible framework for predictive uncertainty quantification. Some conformal applications give rise to simultaneous testing, and positive dependencies among tests typically exist. We introduce $\texttt{max-rank}$, a novel correction that exploits these dependencies whilst efficiently controlling the family-wise error rate. Inspired by existing permutation-based corrections, $\texttt{max-rank}$ leverages rank order information to improve performance and integrates readily with any conformal procedure. We establish its theoretical and empirical advantages over the common Bonferroni correction and its compatibility with conformal prediction, highlighting the potential to strengthen predictive uncertainty estimates.
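The Type-I error inflation is easy to quantify in the simplest case. The short Python sketch below assumes $m$ independent tests, each run at level $\alpha$. This is an idealization, since the conformal setting above involves positively dependent tests. Under independence the family-wise error rate is $1-(1-\alpha)^m$, and the Bonferroni level $\alpha/m$ pulls it back below $\alpha$. The helper names are hypothetical.

```python
def fwer_independent(level: float, m: int) -> float:
    """FWER of m independent tests, each run at the given per-test level."""
    return 1.0 - (1.0 - level) ** m


def bonferroni_level(alpha: float, m: int) -> float:
    """Bonferroni per-test level; valid under any dependence by the union bound."""
    return alpha / m


alpha, m = 0.05, 10
print(f"uncorrected: {fwer_independent(alpha, m):.4f}")  # 0.4013
print(f"Bonferroni:  {fwer_independent(bonferroni_level(alpha, m), m):.4f}")  # 0.0489
```

With dependent tests the uncorrected FWER changes, but the Bonferroni union bound still guarantees at most $\alpha$. Under positive dependence, Bonferroni becomes conservative, and that gap is what a dependence-aware correction can exploit.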

Paper Structure (26 sections, 6 theorems, 27 equations, 5 figures, 3 tables, 4 algorithms)

Key Result

Proposition 1

The max-rank procedure (the paper's max-rank algorithm) provides a solution $q_{\max}$ to the constrained copula-based optimization problem, with FWER control at level $\alpha$.
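The derivation below shows why a maximum over ranks answers a copula-type constraint. It is written in our own notation. We assume the problem asks for the smallest common level $q$ on the diagonal of the empirical copula; the paper's exact formulation and finite-sample correction may differ. Let $r_{ik}$ be the column-wise ranks of the calibration scores, and let $u_{ik} = r_{ik}/n$ be the corresponding pseudo-observations. Then:

$$
\hat{C}_n(q,\dots,q) = \frac{1}{n}\sum_{i=1}^{n} \mathbb{1}\{u_{i1}\le q,\dots,u_{im}\le q\} = \frac{1}{n}\sum_{i=1}^{n} \mathbb{1}\Big\{\max_{k} u_{ik} \le q\Big\} = \frac{1}{n}\sum_{i=1}^{n} \mathbb{1}\big\{\lVert u_i \rVert_{\infty} \le q\big\},
$$

$$
q_{\max} = \min\big\{q : \hat{C}_n(q,\dots,q) \ge 1-\alpha\big\} = \text{the } \lceil (1-\alpha)\,n \rceil\text{-th smallest value of } \lVert u_i \rVert_{\infty}.
$$

Each test $k$ then takes as its threshold the calibration score of rank $n\,q_{\max}$ in its own column. A single quantile of the max-ranks therefore fixes all $m$ thresholds at once.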

Figures (5)

  • Figure 1: We examine global adjusted significance levels $\hat{\alpha}$ against the desired FWER control level $\alpha=0.05$ for varying correlations $\rho$ and number of tests $m$ (for fixed $n=100\,000$, shading denotes std. deviation across $100$ random trials). While Bonferroni becomes increasingly conservative as either correlation or test counts increase, max-rank provides stable and tight FWER control at target level.
  • Figure 2: We examine individual adjusted significance levels $\hat{\alpha}_k$ for $k =1,\dots, 5$ against the ideal Type-I error control level $\alpha_k=0.05$ for varying correlations $\rho$ (for fixed $n=100\,000$, shading denotes std. deviation across $100$ random trials). While Bonferroni enforces highly conservative Type-I error levels to ensure FWER control, max-rank can gradually exploit positive test dependencies to improve its performance in each dimension.
  • Figure 3: Empirical coverage (left) and mean prediction interval width (right) for different corrections ($\alpha=0.1$) on BDD100k. Results are averaged across objects from multiple classes (see the object detection experiment details in the appendix), while boxplots depict the distribution over $100$ random trials. Inference times in seconds per trial (left to right by method): $2.7$, $2.7$, $17.8$, $62.6$, $2.7$, and $2.95$. max-rank outperforms other corrections with notably lower runtimes than copula-based methods.
  • Figure 4: We examine global adjusted significance levels $\hat{\alpha}$ against the desired FWER control level $\alpha=0.05$ for varying correlations $\rho$, number of tests $m$ and sample size $n$ (shading denotes std. deviation across $10$ random trials). Empirical FWER control rates are directly related to the magnitudes of $m$ and $n$, suggesting that a larger number of parallel tests requires larger sample sizes. We also see that Bonferroni is consistently more conservative than max-rank across all combinations.
  • Figure 5: A visual example of the constructed conformal prediction intervals with max-rank on a test image for the class 'car' (and not 'truck'). True bounding boxes are in blue; two-sided prediction interval regions are shaded in orange.
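The Bonferroni trend reported in Figures 1 and 4 can be reproduced in a stripped-down setting. The Python sketch below is our simplification, not the paper's experiment. It uses oracle Gaussian thresholds instead of calibration quantiles and simulates $m$ equicorrelated one-sided null statistics with correlation $\rho$. It then counts how often at least one statistic exceeds the Bonferroni cutoff $\Phi^{-1}(1-\alpha/m)$. The function name and simulation sizes are assumptions.

```python
import numpy as np
from statistics import NormalDist


def bonferroni_fwer(rho: float, m: int, alpha: float,
                    n_sim: int = 200_000, seed: int = 0) -> float:
    """Empirical FWER of Bonferroni-adjusted one-sided Gaussian tests (oracle cutoffs)."""
    rng = np.random.default_rng(seed)
    # Equicorrelated null statistics: Z_k = sqrt(rho) * W + sqrt(1 - rho) * E_k.
    w = rng.standard_normal((n_sim, 1))
    e = rng.standard_normal((n_sim, m))
    z = np.sqrt(rho) * w + np.sqrt(1.0 - rho) * e
    # Each test rejects when Z_k exceeds the (1 - alpha/m) standard normal quantile.
    cutoff = NormalDist().inv_cdf(1.0 - alpha / m)
    # A family-wise error occurs when at least one null test rejects.
    return float(np.mean((z > cutoff).any(axis=1)))


for rho in (0.0, 0.5, 0.9, 0.99):
    print(f"rho={rho:<4} FWER={bonferroni_fwer(rho, m=10, alpha=0.05):.4f}")
```

At $\rho=0$, the empirical FWER sits near $1-(1-\alpha/m)^m \approx 0.049$ for $\alpha=0.05$ and $m=10$. As $\rho \to 1$, it shrinks toward $\alpha/m$, because the tests then reject together. That leftover slack is what a dependence-aware correction can reclaim.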

Theorems & Definitions (9)

  • Remark 1
  • Remark 2
  • Proposition 1
  • Proposition 2
  • Proposition 3
  • Definition 1: Positive lower orthant dependency (PLOD) (Nelsen, 2006, Def. 5.7.1)
  • Theorem 1: Distribution of ranks (Kuchibhotla, 2021, Thm. 2)
  • Theorem 2: Exchangeability under transformations (Kuchibhotla, 2021; Commenges, 2003)
  • Corollary 1 (Kuchibhotla, 2021, Cor. 1)
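For reference, here is Definition 1 in the standard form as we understand it from Nelsen (2006), followed by one short consequence that we add for illustration. This consequence is not necessarily how the paper uses PLOD. A random vector $S=(S_1,\dots,S_m)$ is PLOD if

$$
\mathbb{P}(S_1 \le s_1,\dots,S_m \le s_m) \;\ge\; \prod_{k=1}^{m} \mathbb{P}(S_k \le s_k) \quad \text{for all } (s_1,\dots,s_m) \in \mathbb{R}^m.
$$

Suppose the test scores are PLOD and each threshold satisfies $\mathbb{P}(S_k \le q_k) \ge 1-\alpha_k$. Then

$$
\mathrm{FWER} = \mathbb{P}(\exists k : S_k > q_k) = 1 - \mathbb{P}(S_1 \le q_1,\dots,S_m \le q_m) \;\le\; 1 - \prod_{k=1}^{m} (1-\alpha_k).
$$

So the equal Šidák levels $\alpha_k = 1-(1-\alpha)^{1/m}$ already control the FWER at $\alpha$, and these levels are never smaller than the Bonferroni level $\alpha/m$. Positive dependence is what leaves room to relax Bonferroni.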