Data-light Uncertainty Set Merging with Admissibility

Shenghao Qin, Jianliang He, Qi Kuang, Bowen Gang, Yin Xia

TL;DR

This work addresses merging multiple uncertainty sets when only the initial sets and their miscoverage controls are available, possibly with dependence. It introduces SAT, a data-light framework that converts set-merging into aggregating synthetic statistics (e-values or p-values) and then inverting a hypothesis test to form a merged set with finite-sample coverage guarantees. The key theoretical contributions include admissibility results for deterministic SAT (with general dependence) and valid aggregation schemes, plus practical implementations for infinite candidate spaces. Through extensive simulations and a real-data application to ImageNet_val, SAT demonstrates reliable coverage while delivering smaller merged sets compared to individual inputs, highlighting its potential for distributed, privacy-preserving, or algorithm-ensemble uncertainty merging. The method provides a principled, flexible toolkit for integrating diverse conformal- or confidence-based uncertainty sets under limited data access and varying dependencies.

Abstract

This article introduces a Synthetics, Aggregation, and Test inversion (SAT) approach for merging diverse and potentially dependent uncertainty sets into a single unified set. The procedure is data-light, relying only on the initial sets and their nominal levels, and it flexibly adapts to user-specified input sets with possibly varying coverage guarantees. SAT is motivated by the challenge of integrating uncertainty sets when only the initial sets and their control levels are available, for example when merging confidence sets from distributed sites under communication constraints or combining conformal prediction sets generated by different algorithms or data splits. To address this, SAT constructs and aggregates novel synthetic test statistics, and then derives the merged set through test inversion. Our method leverages the duality between set estimation and hypothesis testing, ensuring reliable coverage in dependent scenarios. A key theoretical contribution is a rigorous analysis of SAT's properties, including its admissibility in the context of deterministic set merging. Both theoretical analyses and empirical results confirm the method's finite-sample coverage validity and desirable set sizes.

Paper Structure

This paper contains 42 sections, 23 theorems, 121 equations, 6 figures, 1 table, 2 algorithms.

Key Result

Proposition 1

Suppose the miscoverage condition (eq:miscoverage) holds. Then we have $\mathbb{E}\{e_\ell(Y)\} \leq 1$ for all $\ell\in[L]$, where $e_\ell(\cdot)$ is the e-function defined in (eq:sye_pred).
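
As a hedged illustration of why a bound of this kind can hold, the short derivation below assumes two things that this summary does not state: that the miscoverage condition reads $\mathbb{P}(Y \notin \mathcal{C}_\ell) \leq \alpha_\ell$ for the $\ell$-th initial set $\mathcal{C}_\ell$ with nominal level $\alpha_\ell$, and that the e-function takes the indicator form $e_\ell(y) = \mathbb{1}\{y \notin \mathcal{C}_\ell\}/\alpha_\ell$. The symbols $\mathcal{C}_\ell$ and $\alpha_\ell$ are notation introduced here for illustration. Under these assumptions:

```latex
\mathbb{E}\{e_\ell(Y)\}
  = \frac{\mathbb{E}\big[\mathbb{1}\{Y \notin \mathcal{C}_\ell\}\big]}{\alpha_\ell}
  = \frac{\mathbb{P}(Y \notin \mathcal{C}_\ell)}{\alpha_\ell}
  \leq \frac{\alpha_\ell}{\alpha_\ell}
  = 1 .
```

The argument uses only the marginal miscoverage of each set, so it places no restriction on how the $L$ initial sets depend on one another.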

Figures (6)

  • Figure 1: A schematic illustration of the SAT procedure using synthetic e-values. For each candidate $y_i$, synthetic e-values are generated from the initial sets and are then aggregated. The final merged set, $\bar{\mathcal{C}}_{\alpha}$, is constructed by including only those candidates whose aggregated statistic passes a pre-defined significance threshold. In this illustration, the aggregated statistics for $y_1$ and $y_m$ pass the threshold (shaded green), while the one for $y_2$ does not (shaded red).
  • Figure 2: Coverage and size of the merged uncertainty sets for the normal mean estimation problem. Each individual set is constructed based on a two-sided $z$-test.
  • Figure 3: Coverage and size of the merged conformal prediction sets evaluated using different score functions and merging methods. The initial sets are constructed using a full conformal approach, with neural network, random forest, LASSO, and linear model selected as the score functions.
  • Figure 4: Coverage and size of the merged conformal prediction sets by different splits of the training and calibration data. The initial sets are constructed using a split conformal approach with LASSO selected as the score function.
  • Figure 5: Coverage and size of the merged prediction sets using different learning algorithms for dataset ImageNet_val.
  • ...and 1 more figure
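
The procedure described in the Figure 1 caption (generate synthetic e-values per candidate, aggregate, keep candidates that pass a threshold) can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the authors' implementation: the synthetic e-value is assumed to be the indicator form $\mathbb{1}\{y \notin \mathcal{C}_\ell\}/\alpha_\ell$, the aggregation is assumed to be a plain arithmetic mean, and the threshold $1/\alpha$ comes from Markov's inequality. The function names `synthetic_e_value` and `sat_merge`, the candidate grid, and the three example intervals are all hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

# An initial set is represented by a membership test and its nominal miscoverage level.
InitialSet = Tuple[Callable[[float], bool], float]


def synthetic_e_value(y: float, contains: Callable[[float], bool], alpha_l: float) -> float:
    """Assumed indicator form: 0 if y lies in the initial set, 1/alpha_l otherwise."""
    return 0.0 if contains(y) else 1.0 / alpha_l


def sat_merge(candidates: Sequence[float], sets: Sequence[InitialSet], alpha: float) -> List[float]:
    """Keep candidates whose averaged synthetic e-value stays below 1/alpha.

    The mean of e-values is again an e-value under arbitrary dependence,
    so Markov's inequality bounds the miscoverage of the kept set by alpha.
    """
    threshold = 1.0 / alpha
    merged = []
    for y in candidates:
        e_bar = sum(synthetic_e_value(y, contains, a) for contains, a in sets) / len(sets)
        if e_bar < threshold:  # test inversion: y is not rejected
            merged.append(y)
    return merged


# Hypothetical example: three intervals, each with nominal level 0.05.
sets = [
    (lambda y: 0.0 <= y <= 2.0, 0.05),
    (lambda y: 1.0 <= y <= 3.0, 0.05),
    (lambda y: 1.5 <= y <= 4.0, 0.05),
]
candidates = [i * 0.5 for i in range(9)]  # grid 0.0, 0.5, ..., 4.0
merged = sat_merge(candidates, sets, alpha=0.1)
print(merged)  # [1.0, 1.5, 2.0, 2.5, 3.0]
```

With these levels, a candidate is kept exactly when at most one of the three intervals excludes it, so the merged set is the region covered by at least two inputs. A finite grid stands in for the candidate space here; the summary notes that the paper treats infinite candidate spaces separately.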

Theorems & Definitions (51)

  • Proposition 1
  • Proposition 2
  • Proposition 3
  • Proposition 4
  • Theorem 1
  • Remark 1
  • Theorem 2
  • Remark 2
  • Theorem 3
  • Proposition 5
  • ...and 41 more