
Credal Two-Sample Tests of Epistemic Uncertainty

Siu Lun Chau, Antonin Schrab, Arthur Gretton, Dino Sejdinovic, Krikamol Muandet

TL;DR

This work introduces credal two-sample tests for comparing epistemic uncertainty modelled as credal sets, extending classical two-sample testing to hypotheses about inclusion, equality, and intersection of credal sets. The authors develop a kernel-based, nonparametric framework with permutation-based inference and adaptive sample splitting to control Type I error while handling nuisance parameters. Central contributions include four credal hypotheses, the kernel credal discrepancy (KCD) as a unifying objective, and rigorous theoretical guarantees for specification, inclusion, equality, and plausibility tests. Empirical results on synthetic and semi-synthetic data demonstrate robust Type I error control and improved power relative to existing methods, highlighting practical relevance for robust uncertainty quantification and domain generalisation. The work also discusses interpretational and philosophical aspects of probabilities within credal testing, and outlines future directions such as independence testing among credal sets and broader applicability to nonparametric mixture testing.
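For reference, the classical setting that this work extends compares two precise distributions. A minimal sketch of a standard kernel two-sample permutation test is shown below; it assumes a Gaussian kernel with a fixed bandwidth and the biased (V-statistic) estimator of the squared maximum mean discrepancy (MMD), and the helper names `gaussian_gram` and `mmd_permutation_test` are hypothetical. It is not the credal test itself.

```python
import numpy as np

def gaussian_gram(a, b, bandwidth=1.0):
    """Gaussian kernel matrix with entries exp(-||a_i - b_j||^2 / (2 * bandwidth^2))."""
    sq = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2 * a @ b.T
    return np.exp(-np.maximum(sq, 0.0) / (2 * bandwidth**2))

def mmd_permutation_test(x, y, n_perms=200, bandwidth=1.0, seed=0):
    """Permutation test of H0: P_X = P_Y using the biased squared-MMD statistic."""
    rng = np.random.default_rng(seed)
    pooled = np.vstack([x, y])
    K = gaussian_gram(pooled, pooled, bandwidth)  # computed once, reused for every permutation
    n = len(x)

    def mmd2(idx):
        # Squared MMD between the first n indexed points and the rest.
        i, j = idx[:n], idx[n:]
        return K[np.ix_(i, i)].mean() + K[np.ix_(j, j)].mean() - 2 * K[np.ix_(i, j)].mean()

    observed = mmd2(np.arange(len(pooled)))
    null_stats = np.array([mmd2(rng.permutation(len(pooled))) for _ in range(n_perms)])
    # Counting the observed statistic among the permutations keeps the p-value valid at finite n.
    p_value = (1 + np.sum(null_stats >= observed)) / (1 + n_perms)
    return observed, p_value

rng = np.random.default_rng(1)
x = rng.normal(0.0, 1.0, size=(100, 2))
y = rng.normal(1.0, 1.0, size=(100, 2))  # mean shifted by 1 in each coordinate
stat, p = mmd_permutation_test(x, y)
print(f"MMD^2 = {stat:.4f}, p-value = {p:.4f}")
```

Computing the pooled Gram matrix once and re-indexing it per permutation avoids recomputing kernel evaluations, which dominates the cost of a naive implementation.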

Abstract

We introduce credal two-sample testing, a new hypothesis testing framework for comparing credal sets -- convex sets of probability measures where each element captures aleatoric uncertainty and the set itself represents epistemic uncertainty that arises from the modeller's partial ignorance. Compared to classical two-sample tests, which focus on comparing precise distributions, the proposed framework provides a broader and more versatile set of hypotheses. This approach enables the direct integration of epistemic uncertainty, effectively addressing the challenges arising from partial ignorance in hypothesis testing. By generalising two-sample tests to compare credal sets, our framework enables reasoning about equality, inclusion, intersection, and mutual exclusivity, each offering unique insights into the modeller's epistemic beliefs. As the first work on nonparametric hypothesis testing for comparing credal sets, we focus on finitely generated credal sets derived from i.i.d. samples from multiple distributions -- referred to as credal samples. We formalise these tests as two-sample tests with nuisance parameters and introduce the first permutation-based solution for this class of problems, significantly improving on existing methods. Our approach properly incorporates the modeller's epistemic uncertainty into hypothesis testing, leading to more robust and credible conclusions, and comes with kernel-based implementations for real-world applications.
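A finitely generated credal set is the convex hull of the distributions behind the credal samples, so each of its elements is a mixture $\sum_i \lambda_i P_i$ with $\boldsymbol{\lambda}$ in the probability simplex. Because kernel mean embeddings are linear in the mixture weights, the squared MMD between such a mixture and a sample from $Q$ is a quadratic function of $\boldsymbol{\lambda}$, and the distance from $Q$ to the credal set can be approximated by minimising it over the simplex. The sketch below does this with projected gradient descent. The Gaussian kernel, the solver, and the names `distance_to_credal_hull` and `project_to_simplex` are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def gaussian_gram(a, b, bandwidth=1.0):
    """Gaussian kernel matrix with entries exp(-||a_i - b_j||^2 / (2 * bandwidth^2))."""
    sq = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2 * a @ b.T
    return np.exp(-np.maximum(sq, 0.0) / (2 * bandwidth**2))

def project_to_simplex(v):
    """Euclidean projection of v onto the probability simplex (sort-based algorithm)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u - (css - 1) / np.arange(1, len(v) + 1) > 0)[0][-1]
    return np.maximum(v - (css[rho] - 1) / (rho + 1), 0.0)

def distance_to_credal_hull(credal_samples, y, bandwidth=1.0, n_iter=500):
    """Squared MMD between y and the closest mixture sum_i lam_i P_i, lam on the simplex.

    With mu_i the empirical mean embedding of credal sample i and nu that of y, the
    objective is lam^T A lam - 2 lam^T b + c, where A_ij = <mu_i, mu_j>,
    b_i = <mu_i, nu> and c = <nu, nu>.
    """
    A = np.array([[gaussian_gram(p, q, bandwidth).mean() for q in credal_samples]
                  for p in credal_samples])
    b = np.array([gaussian_gram(p, y, bandwidth).mean() for p in credal_samples])
    c = gaussian_gram(y, y, bandwidth).mean()
    lam = np.full(len(credal_samples), 1.0 / len(credal_samples))
    step = 1.0 / (2 * np.linalg.eigvalsh(A).max())  # 1 / Lipschitz constant of the gradient
    for _ in range(n_iter):
        lam = project_to_simplex(lam - step * 2 * (A @ lam - b))
    return max(lam @ A @ lam - 2 * lam @ b + c, 0.0), lam

rng = np.random.default_rng(0)
credal_samples = [rng.normal(-2.0, 1.0, size=(200, 1)), rng.normal(2.0, 1.0, size=(200, 1))]
y_inside = np.vstack([rng.normal(-2.0, 1.0, size=(100, 1)), rng.normal(2.0, 1.0, size=(100, 1))])
y_outside = rng.normal(6.0, 1.0, size=(200, 1))
d_in, lam_in = distance_to_credal_hull(credal_samples, y_inside)
d_out, lam_out = distance_to_credal_hull(credal_samples, y_outside)
print(d_in, lam_in)
print(d_out, lam_out)
```

Here `y_inside` is drawn from an equal mixture of the two generating distributions and `y_outside` from a distribution outside their hull, so one would expect `d_in` to be much smaller than `d_out`.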

Paper Structure

This paper contains 57 sections, 17 theorems, 92 equations, 16 figures, 1 table, 8 algorithms.

Key Result

Proposition 0

$\operatorname{Inc}({\mathcal{C}}_X,{\mathcal{C}}_Y) = 0$ if and only if ${\mathcal{C}}_X\subseteq {\mathcal{C}}_Y$, $\operatorname{Eq}({\mathcal{C}}_X, {\mathcal{C}}_Y) = 0$ if and only if ${\mathcal{C}}_X = {\mathcal{C}}_Y$, and $\operatorname{Int}({\mathcal{C}}_X, {\mathcal{C}}_Y) = 0$ if and onl
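The statement above is cut off in the source. As a hedged illustration of how such discrepancies can be built, the definitions below use the MMD of a characteristic kernel $k$; these are assumed definitions for intuition only, and the paper's exact construction may differ.

```latex
\begin{aligned}
\operatorname{Inc}(\mathcal{C}_X,\mathcal{C}_Y) &= \sup_{P\in\mathcal{C}_X}\,\inf_{Q\in\mathcal{C}_Y}\operatorname{MMD}_k(P,Q),\\
\operatorname{Eq}(\mathcal{C}_X,\mathcal{C}_Y) &= \max\bigl\{\operatorname{Inc}(\mathcal{C}_X,\mathcal{C}_Y),\ \operatorname{Inc}(\mathcal{C}_Y,\mathcal{C}_X)\bigr\},\\
\operatorname{Int}(\mathcal{C}_X,\mathcal{C}_Y) &= \inf_{P\in\mathcal{C}_X}\,\inf_{Q\in\mathcal{C}_Y}\operatorname{MMD}_k(P,Q).
\end{aligned}
```

With a characteristic kernel, $\operatorname{MMD}_k$ is a metric on probability measures, and finitely generated credal sets are compact, so under these assumed definitions $\operatorname{Inc}=0$ exactly when every element of $\mathcal{C}_X$ lies in $\mathcal{C}_Y$, $\operatorname{Eq}$ is the Hausdorff distance between the two sets and vanishes exactly when they coincide, and $\operatorname{Int}=0$ exactly when they share an element. For $\mathcal{C}_X=\operatorname{conv}\{P_1,\dots,P_\ell\}$, the distance from $P$ to the convex set $\mathcal{C}_Y$ is a convex function of the mixture weights, so the supremum in $\operatorname{Inc}$ is attained at one of the generators $P_i$.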

Figures (16)

  • Figure 1: Motivation: From comparing precise distributions to comparing rational epistemic beliefs.
  • Figure 2: Different comparisons between credal sets within a probability simplex with $2$ degrees of freedom.
  • Figure 3: We present the experimental results of our credal tests (labelled as CMMD) on synthetic data at a $0.05$ significance level (black dotted line). CMMD$(0)$ uses fixed sample splitting and fails to control Type I error, rendering it invalid. It is included in the power plot for completeness but should not be compared to other valid tests.
  • Figure 4: We visualise the impact of parameter estimation on the null statistic distribution when using a fixed splitting ratio, CMMD$(0)$, and an adaptive splitting ratio, CMMD$(0.33)$. With fixed sample splitting, the empirical null distribution is visibly shifted relative to the null distribution based on the oracle parameter. In contrast, adaptive sample splitting yields an empirical null distribution whose shape resembles the oracle version.
  • Figure 5: (Left) and (Middle): Distribution of the estimated parameters $\boldsymbol{\lambda}^e$ and $\boldsymbol{\eta}^e$. Because multiple pairs of weights satisfy the null hypothesis, our randomised optimisation procedure may identify a different pair of weights in each round of the repeated data sampling used to approximate the Type I error distribution in the experiments. (Right) The null statistic distribution for CMMD$(0)$ in the plausibility test is denoted "Simulated Statistics". The "Permuted Statistic" refers to the statistics generated through permutation during one specific round of the repeated experiment.
  • ...and 11 more figures
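The figure captions above refer to estimated weights $\boldsymbol{\lambda}^e$, $\boldsymbol{\eta}^e$ (nuisance parameters) and to fixed versus adaptive sample splitting. To show where such estimated weights enter a test, the sketch below uses a naive fixed split for a null of the form $H_0\colon Q \in \operatorname{conv}\{P_1,\dots,P_\ell\}$: weights are estimated on one half of the data, a sample from the estimated mixture is drawn from the other half, and a permutation MMD test is run on the held-out data. This is a hypothetical illustration (the names `estimate_weights` and `split_then_test`, the Gaussian kernel and the 50/50 split are all assumptions), not the paper's CMMD procedure. Figure 3 reports that the fixed-splitting variant CMMD$(0)$ fails to control Type I error, and the adaptive scheme is not reproduced here.

```python
import numpy as np

def gaussian_gram(a, b, bandwidth=1.0):
    """Gaussian kernel matrix with entries exp(-||a_i - b_j||^2 / (2 * bandwidth^2))."""
    sq = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2 * a @ b.T
    return np.exp(-np.maximum(sq, 0.0) / (2 * bandwidth**2))

def project_to_simplex(v):
    """Euclidean projection of v onto the probability simplex (sort-based algorithm)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u - (css - 1) / np.arange(1, len(v) + 1) > 0)[0][-1]
    return np.maximum(v - (css[rho] - 1) / (rho + 1), 0.0)

def estimate_weights(credal_samples, y, bandwidth=1.0, n_iter=500):
    """Nuisance estimate: simplex weights minimising squared MMD(sum_i lam_i P_i, y)."""
    A = np.array([[gaussian_gram(p, q, bandwidth).mean() for q in credal_samples]
                  for p in credal_samples])
    b = np.array([gaussian_gram(p, y, bandwidth).mean() for p in credal_samples])
    lam = np.full(len(credal_samples), 1.0 / len(credal_samples))
    step = 1.0 / (2 * np.linalg.eigvalsh(A).max())
    for _ in range(n_iter):
        lam = project_to_simplex(lam - step * 2 * (A @ lam - b))
    return lam / lam.sum()  # renormalise away floating-point drift

def split_then_test(credal_samples, y, split=0.5, n_perms=200, bandwidth=1.0, seed=0):
    """Naive fixed-split test; assumes each credal sample is at least as large as y."""
    rng = np.random.default_rng(seed)

    def split_sample(s):
        idx = rng.permutation(len(s))
        m = int(split * len(s))
        return s[idx[:m]], s[idx[m:]]

    parts = [split_sample(s) for s in credal_samples]
    y_est, y_test = split_sample(y)
    # 1) Estimate the nuisance weights on the estimation halves only.
    lam = estimate_weights([est for est, _ in parts], y_est, bandwidth)
    # 2) Draw a mixture sample from the held-out halves, without replacement per component.
    counts = rng.multinomial(len(y_test), lam)
    x_test = np.vstack([test[rng.choice(len(test), size=k, replace=False)]
                        for (_, test), k in zip(parts, counts)])
    # 3) Permutation MMD test between the mixture sample and the held-out y.
    pooled = np.vstack([x_test, y_test])
    K = gaussian_gram(pooled, pooled, bandwidth)
    n = len(x_test)

    def mmd2(idx):
        i, j = idx[:n], idx[n:]
        return K[np.ix_(i, i)].mean() + K[np.ix_(j, j)].mean() - 2 * K[np.ix_(i, j)].mean()

    observed = mmd2(np.arange(len(pooled)))
    null_stats = np.array([mmd2(rng.permutation(len(pooled))) for _ in range(n_perms)])
    p_value = (1 + np.sum(null_stats >= observed)) / (1 + n_perms)
    return lam, p_value

rng = np.random.default_rng(0)
credal_samples = [rng.normal(-2.0, 1.0, size=(200, 1)), rng.normal(2.0, 1.0, size=(200, 1))]
y_mix = np.vstack([rng.normal(-2.0, 1.0, size=(100, 1)), rng.normal(2.0, 1.0, size=(100, 1))])
y_far = rng.normal(6.0, 1.0, size=(200, 1))
lam_mix, p_mix = split_then_test(credal_samples, y_mix)
lam_far, p_far = split_then_test(credal_samples, y_far)
print(lam_mix, p_mix)
print(lam_far, p_far)
```

Because the weights are fitted on data that the test never sees, the held-out statistic is not reused for estimation; the captions indicate that this alone is not sufficient for Type I error control, which is what motivates the adaptive splitting studied in the paper.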

Theorems & Definitions (33)

  • Proposition 0
  • Definition 1: [SriGreFukLanetal10; sriperumbudur2011universality]
  • Proposition 1
  • Proposition 1
  • Theorem 1
  • Theorem 1
  • Proposition 1
  • proof
  • Proposition 1
  • proof
  • ...and 23 more