Credal Two-Sample Tests of Epistemic Uncertainty
Siu Lun Chau, Antonin Schrab, Arthur Gretton, Dino Sejdinovic, Krikamol Muandet
TL;DR
This work introduces credal two-sample tests for comparing epistemic uncertainty modelled as credal sets, extending classical two-sample testing to hypotheses about inclusion, equality, and intersection of credal sets. The authors develop a kernel-based, nonparametric framework that uses permutation-based inference and adaptive sample splitting to control Type I error in the presence of nuisance parameters. The main contributions are four credal hypotheses, the kernel credal discrepancy (KCD) as a unifying objective, and theoretical guarantees for the specification, inclusion, equality, and plausibility tests. Experiments on synthetic and semi-synthetic data show that the tests control Type I error and achieve higher power than existing methods, which makes them relevant to robust uncertainty quantification and domain generalisation. The work also discusses how probabilities should be interpreted within credal testing, and outlines future directions such as independence testing among credal sets and applications to nonparametric mixture testing.
Abstract
We introduce credal two-sample testing, a new hypothesis testing framework for comparing credal sets: convex sets of probability measures in which each element captures aleatoric uncertainty and the set itself represents epistemic uncertainty arising from the modeller's partial ignorance. Whereas classical two-sample tests compare precise distributions, the proposed framework supports a broader and more versatile set of hypotheses. It integrates epistemic uncertainty directly into the test, addressing the challenges that partial ignorance poses for hypothesis testing. By generalising two-sample tests to compare credal sets, our framework enables reasoning about equality, inclusion, intersection, and mutual exclusivity, each offering distinct insights into the modeller's epistemic beliefs. As the first work on nonparametric hypothesis testing for comparing credal sets, we focus on finitely generated credal sets derived from i.i.d. samples from multiple distributions, which we refer to as credal samples. We formalise these tests as two-sample tests with nuisance parameters and introduce the first permutation-based solution for this class of problems, significantly improving on existing methods. Our approach properly incorporates the modeller's epistemic uncertainty into hypothesis testing, leading to more robust and credible conclusions, with kernel-based implementations for real-world applications.
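For orientation, the classical setting that the abstract generalises can be sketched as a kernel two-sample test calibrated by permutations. The code below is not the credal test and not the authors' implementation: it compares two precise distributions using a biased maximum mean discrepancy (MMD) estimate, with no credal sets, nuisance parameters, or sample splitting. The function names, the Gaussian kernel, the fixed bandwidth, and the one-dimensional toy data are all assumptions made for illustration.

```python
import math
import random


def gaussian_kernel(a, b, bandwidth=1.0):
    """Gaussian kernel between two real numbers (illustrative choice)."""
    return math.exp(-((a - b) ** 2) / (2.0 * bandwidth ** 2))


def gram_matrix(points, bandwidth=1.0):
    """Kernel matrix of the pooled sample, computed once and reused."""
    return [[gaussian_kernel(p, q, bandwidth) for q in points] for p in points]


def mmd2_biased(gram, idx_x, idx_y):
    """Biased (V-statistic) estimate of squared MMD from the pooled Gram matrix."""
    def mean_block(rows, cols):
        total = sum(gram[i][j] for i in rows for j in cols)
        return total / (len(rows) * len(cols))

    return (mean_block(idx_x, idx_x)
            + mean_block(idx_y, idx_y)
            - 2.0 * mean_block(idx_x, idx_y))


def mmd_permutation_test(x, y, num_permutations=200, bandwidth=1.0, seed=0):
    """Return a permutation p-value for H0: x and y share one distribution."""
    rng = random.Random(seed)
    pooled = list(x) + list(y)
    n = len(x)
    gram = gram_matrix(pooled, bandwidth)
    indices = list(range(len(pooled)))
    observed = mmd2_biased(gram, indices[:n], indices[n:])
    exceed = 0
    for _ in range(num_permutations):
        # Reassign pooled points to the two groups at random.
        rng.shuffle(indices)
        if mmd2_biased(gram, indices[:n], indices[n:]) >= observed:
            exceed += 1
    # The +1 correction counts the observed statistic among the permutations.
    return (1 + exceed) / (1 + num_permutations)


# Toy usage: two Gaussian samples whose means differ by 3 standard deviations.
data_rng = random.Random(1)
x = [data_rng.gauss(0.0, 1.0) for _ in range(40)]
y_shifted = [data_rng.gauss(3.0, 1.0) for _ in range(40)]
p_shifted = mmd_permutation_test(x, y_shifted)
print("p-value (shifted sample):", p_shifted)
```

Rejecting when the p-value is at most the chosen level gives the usual permutation-test control of Type I error under exchangeability. The credal tests described in the abstract replace the single pair of distributions with sets of distributions, which is where the nuisance parameters enter.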
