Classification under Nuisance Parameters and Generalized Label Shift in Likelihood-Free Inference

Luca Masserano, Alex Shen, Michele Doro, Tommaso Dorigo, Rafael Izbicki, Ann B. Lee

TL;DR

The paper addresses reliable classification under generalized label shift (GLS) in likelihood-free inference by introducing a nuisance-parameter-aware framework. It recasts classification as hypothesis testing and estimates rejection probabilities across the entire nuisance parameter space, enabling ROC-based calibration that is invariant to GLS. The authors construct nuisance-aware prediction sets (NAPS) with guaranteed $(1-\alpha)$ coverage conditional on both the class and the nuisance parameters, and further improve power by restricting the nuisance parameters to data-dependent $(1-\gamma)$ confidence sets. They validate the approach on synthetic data and two scientific simulators (single-cell RNA sequencing and atmospheric cosmic-ray showers), showing that NAPS provide valid uncertainty quantification under GLS while maintaining high power, with further gains when the nuisance parameters can be constrained from the data. The method offers a principled, domain-adaptive way to obtain reliable predictions in the simulator-based settings typical of scientific inference.

Abstract

An open scientific challenge is how to classify events with reliable measures of uncertainty, when we have a mechanistic model of the data-generating process but the distribution over both labels and latent nuisance parameters is different between train and target data. We refer to this type of distributional shift as generalized label shift (GLS). Direct classification using observed data $\mathbf{X}$ as covariates leads to biased predictions and invalid uncertainty estimates of labels $Y$. We overcome these biases by proposing a new method for robust uncertainty quantification that casts classification as a hypothesis testing problem under nuisance parameters. The key idea is to estimate the classifier's receiver operating characteristic (ROC) across the entire nuisance parameter space, which allows us to devise cutoffs that are invariant under GLS. Our method effectively endows a pre-trained classifier with domain adaptation capabilities and returns valid prediction sets while maintaining high power. We demonstrate its performance on two challenging scientific problems in biology and astroparticle physics with data from realistic mechanistic models.

Paper Structure (45 sections, 4 theorems, 41 equations, 19 figures, 1 table, 2 algorithms)

Key Result

Lemma 1

Under GLS, the rejection probability (Definition 1) of any test statistic $\lambda$ is the same under the train and target distributions once we condition on the label $y$ and the nuisance parameters $\boldsymbol{\nu}$.
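
The displayed equation of the lemma did not survive on this page. A hedged reconstruction follows, assuming that GLS means the conditional density $p(\mathbf{x} \mid y, \boldsymbol{\nu})$ is shared between train and target while the joint distribution of $(y, \boldsymbol{\nu})$ differs, and using the rejection-region form $\lambda(\mathbf{X}) \leq C$ that appears in the caption of Figure 3. It is a sketch of why the invariance holds, not the paper's exact statement or proof.

```latex
% Assumed formalization of GLS (not stated explicitly in this summary):
%   p_train(x | y, nu) = p_target(x | y, nu),  while  p_train(y, nu) != p_target(y, nu).
\begin{align*}
\mathbb{P}_{\text{train}}\left(\lambda(\mathbf{X}) \leq C \mid y, \boldsymbol{\nu}\right)
  &= \int \mathbb{1}\{\lambda(\mathbf{x}) \leq C\}\, p_{\text{train}}(\mathbf{x} \mid y, \boldsymbol{\nu})\, d\mathbf{x} \\
  &= \int \mathbb{1}\{\lambda(\mathbf{x}) \leq C\}\, p_{\text{target}}(\mathbf{x} \mid y, \boldsymbol{\nu})\, d\mathbf{x} \\
  &= \mathbb{P}_{\text{target}}\left(\lambda(\mathbf{X}) \leq C \mid y, \boldsymbol{\nu}\right).
\end{align*}
```

The middle step uses only the shared conditional density, so the shift in the distribution of $(y, \boldsymbol{\nu})$ never enters.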

Figures (19)

  • Figure 1: Synthetic Example. Left (no GLS): Standard prediction sets $R_\alpha({\mathbf{x}})$ (red) guarantee marginal coverage at the nominal level. Nuisance-aware prediction sets (NAPS $\gamma = 0$; blue) are also marginally valid, but the "universality" of conditional validity across the entire nuisance parameter space comes at the price of more conservative prediction sets and lower power. Right (with GLS): Standard prediction sets are no longer valid and undercover for all $\alpha$ levels (red curve is below the black bisector), while NAPS are still valid. Furthermore, we can increase power while maintaining validity (NAPS $\gamma > 0$; green) by constructing $(1-\gamma)$ confidence sets of the nuisance parameter $\nu$ and deriving less conservative cutoffs given an observation. Here $\gamma = \alpha \times 0.01$.
  • Figure 2: Coverage under different batch protocols $\nu$ for the RNA-Seq example. Each marker represents the proportion of samples in the test set whose true label was included in the constructed prediction sets. Nuisance-aware prediction sets (NAPS $\gamma=0$; blue) are valid regardless of the protocol, which is unknown at inference time. All other methods for prediction sets with marginal coverage (red), class-conditional coverage (pink), and conformal adaptive prediction sets (gold) undercover for at least two batch protocols.
  • Figure 3: Dependence of the ROC on the energy of the cosmic-ray shower. Left: Receiver operating characteristic evaluated according to our method at different energy values (shades of blue). By estimating the entire ROC, we can control FPR or TPR at specified confidence levels for all $\boldsymbol{\nu} \in \mathcal{N}$, which is not possible with the “marginal” ROC curve (red). Right: Diagnostic P-P plot evaluated at four bins over energy for nuisance-aware ROC (shades of blue) and ROC that ignores nuisances (shades of red). To check if $\mathbb{P}_{\text{target}} \left( \lambda({\mathbf{X}}) \leq C |y,\boldsymbol{\nu}\right)$ is well estimated, we plot PIT values against a $\mathcal{U}(0, 1)$ distribution (dashed bisector; see the appendix on ROC diagnostics for details). This is clearly not the case if one ignores nuisance parameters.
  • Figure 4: Constraining the cosmic-ray shower parameters. Top left: Illustration of the Southern Wide-field Gamma-ray Observatory (SWGO; Abreu et al., 2019; image credit: Richard White) array of detectors with an incoming gamma ray (red). Bottom left: Test statistic under $y_0 = 0$ (hadron) as a function of energy. At high energies, the class-conditional test statistics are well separated, implying that it is easier to distinguish gamma showers (red) from hadron showers (gold). Right: Confidence set for $\boldsymbol{\nu}$ at different $(1-\gamma)$ confidence levels obtained via the framework of Masserano et al. (2023). The true value of $\boldsymbol{\nu}$ is the black star.
  • Figure 5: Classification metrics within true and within predicted Gamma rays ($y=1$). Results are binned according to whether the shower energy is below (left) or above (right) the median value. Top panel: Nuisance-aware prediction sets (NAPS $\gamma=0$; blue) achieve high precision and low false discovery rates (FDR), especially at high confidence levels. In addition, by constraining the nuisance parameters $\boldsymbol{\nu} = (E, A, Z)$, we can increase performance (NAPS $\gamma>0$; green) with uniformly better results relative to the standard Bayes classifier (black dashed line). Bottom panel: Our set-valued classifier makes explicit its level of uncertainty on the label $y$ by returning ambiguous prediction sets (bottom row) for hard-to-classify ${\mathbf{x}}_{{\text{target}}}$. Even so, NAPS with $\gamma>0$ is able to achieve a higher number of true positives and lower number of false negatives relative to the Bayes classifier. Here $\gamma = \alpha \times 0.3$.
  • ...and 14 more figures
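
The captions of Figures 1 and 5 describe raising power by first building a $(1-\gamma)$ confidence set for the nuisance parameter and then taking the worst-case cutoff over that set only. The sketch below illustrates one way this can work, using a Bonferroni-style split that is an assumption of this example rather than the paper's stated construction: the test is run at level $\alpha - \gamma$ over the confidence set, so that the total error is at most $\gamma + (\alpha - \gamma) = \alpha$. The two-component Gaussian model, `sigma_aux`, `cutoffs`, and `rates` are all hypothetical.

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(1)
z = NormalDist().inv_cdf  # standard normal quantile function

alpha, gamma = 0.10, 0.01  # total level, and the share spent on the nu confidence set
sigma_aux = 0.1            # hypothetical noise level of an auxiliary measurement of nu

# Hypothetical model with nu in [0, 1]:
#   x_main | y, nu ~ Normal(y + nu, 1)      (test statistic: lambda(x) = x_main)
#   x_aux  | nu    ~ Normal(nu, sigma_aux)  (carries information about nu only)

def cutoffs(x_aux):
    """Cutoffs for testing H0: y = 0; reject when x_main exceeds the cutoff."""
    # gamma = 0: worst case of the (1 - alpha) null quantile over all nu in [0, 1].
    c_global = 1.0 + z(1 - alpha)
    # gamma > 0: upper end of a (1 - gamma) confidence interval for nu, clipped to [0, 1].
    nu_upper = np.clip(x_aux + z(1 - gamma / 2) * sigma_aux, 0.0, 1.0)
    # Worst case over the confidence set only, at the stricter level alpha - gamma.
    c_restricted = nu_upper + z(1 - (alpha - gamma))
    return c_global, c_restricted

def rates(nu, n=200_000):
    """Monte Carlo false positive rate and power at a fixed true nuisance value."""
    x_aux = rng.normal(nu, sigma_aux, size=n)
    c_global, c_restricted = cutoffs(x_aux)
    null = rng.normal(0.0 + nu, 1.0, size=n)  # x_main under y = 0
    alt = rng.normal(1.0 + nu, 1.0, size=n)   # x_main under y = 1
    return {
        "fpr_global": np.mean(null > c_global),
        "fpr_restricted": np.mean(null > c_restricted),
        "power_global": np.mean(alt > c_global),
        "power_restricted": np.mean(alt > c_restricted),
    }

for name, value in rates(nu=0.2).items():
    print(f"{name}: {value:.3f}")
```

In this toy model both cutoffs keep the false positive rate below $\alpha$, and the restricted cutoff rejects more often under $y = 1$ when the true nuisance value lies away from the worst case. Near the worst case the restricted cutoff can be slightly more conservative, because part of the error budget is spent on the confidence set.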

Theorems & Definitions (13)

  • Definition 1: Rejection probability
  • Definition 2: Nuisance-aware prediction set
  • Lemma 1: Invariance of the Rejection Probability to GLS
  • Definition 3: Confidence set for nuisance parameters
  • Theorem 1: Nuisance-aware cutoffs for FPR/TPR control
  • Theorem 2
  • Proof of Lemma 1
  • Proof of Theorem 1
  • Proof of Theorem 2
  • Lemma 2: Bayes classifier
  • ...and 3 more