E-Valuating Classifier Two-Sample Tests
Teodora Pandeva, Tim Bakker, Christian A. Naesseth, Patrick Forré
TL;DR
This work addresses robust two-sample testing in high-dimensional and sequential settings by introducing E-C2ST, a deep classifier-based test built on E-values, i.e., nonnegative statistics $E$ that satisfy $\mathbb{E}_P[E] \le 1$ for every distribution $P$ in the null hypothesis. By combining ideas from split likelihood ratio tests and predictive conditional independence testing, the authors derive both batch-wise and sequential E-processes that offer anytime-valid type I error control while improving power through multiple data splits. They establish theoretical guarantees for type I error control and consistency, and propose a practical, bounded E-variable construction with a tunable mixing parameter to stabilize performance. Empirically, E-C2ST outperforms standard p-value-based C2ST baselines across synthetic and real datasets (Blob, KDEF, MNIST) by using information from all batches rather than a single test split, at a tractable computational cost and with clear guidance on the effects of batch size and initialization. The approach thus provides a principled, scalable framework for sequential two-sample testing in complex data domains, with potential extensions to online learning and active data selection.
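For context, the standard argument for why the condition $\mathbb{E}_P[E] \le 1$ gives type I error control at level $\alpha$ is Markov's inequality, and its sequential counterpart for an E-process is Ville's inequality. The notation $(E_t)_{t \ge 1}$ for the E-process is introduced here for illustration and is not taken from the text above:

$$
P\left(E \ge \tfrac{1}{\alpha}\right) \le \alpha\,\mathbb{E}_P[E] \le \alpha,
\qquad
P\left(\exists\, t \ge 1 : E_t \ge \tfrac{1}{\alpha}\right) \le \alpha .
$$

Rejecting the null the first time the E-process reaches $1/\alpha$ therefore keeps the type I error below $\alpha$ no matter when the test is stopped.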
Abstract
We introduce a powerful deep classifier two-sample test for high-dimensional data based on E-values, called E-value Classifier Two-Sample Test (E-C2ST). Our test combines ideas from existing work on split likelihood ratio tests and predictive independence tests. The resulting E-values are suitable for anytime-valid sequential two-sample tests. This feature allows for more effective use of data in constructing test statistics. Through simulations and real data applications, we empirically demonstrate that E-C2ST achieves enhanced statistical power by partitioning datasets into multiple batches beyond the conventional two-split (training and testing) approach of standard classifier two-sample tests. This strategy increases the power of the test while keeping the type I error well below the desired significance level.
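The batch-wise procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: a 1-D logistic regression stands in for the deep classifier, the data are synthetic Gaussians, and all function names (`fit_logistic`, `batch_log_e_value`, `sequential_e_c2st`, `make_batches`) are hypothetical. Each batch's E-value is taken to be the likelihood ratio between the classifier's predictions and the label frequencies of the held-out batch, in the spirit of a split likelihood ratio test; the bounded construction with a mixing parameter is not reproduced here.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(xs, ys, steps=200, lr=0.1):
    # Stand-in for the deep classifier (an assumption of this sketch):
    # 1-D logistic regression fit by full-batch gradient descent.
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        gw = gb = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + b) - y
            gw += err * x
            gb += err
        w -= lr * gw / n
        b -= lr * gb / n
    return w, b


def batch_log_e_value(model, xs, ys, eps=1e-6):
    # log of prod_i p_model(y_i | x_i) / p_hat(y_i), where p_hat is the label
    # frequency in this held-out batch, i.e. the maximum-likelihood fit of
    # the null "labels are independent of features".
    w, b = model
    freq1 = sum(ys) / len(ys)
    log_e = 0.0
    for x, y in zip(xs, ys):
        p1 = min(max(sigmoid(w * x + b), eps), 1.0 - eps)
        p_model = p1 if y == 1 else 1.0 - p1
        p_null = freq1 if y == 1 else 1.0 - freq1
        log_e += math.log(p_model) - math.log(p_null)
    return log_e


def sequential_e_c2st(batches, alpha=0.05):
    # batches: list of (xs, ys). The first batch only trains the classifier;
    # every later batch is scored by a classifier that has not seen it.
    train_x, train_y = list(batches[0][0]), list(batches[0][1])
    log_e_process = 0.0
    threshold = math.log(1.0 / alpha)
    for t, (xs, ys) in enumerate(batches[1:], start=1):
        model = fit_logistic(train_x, train_y)
        log_e_process += batch_log_e_value(model, xs, ys)
        if log_e_process >= threshold:
            return True, t, log_e_process  # reject the null at batch t
        train_x += xs  # the scored batch joins the training data
        train_y += ys
    return False, len(batches) - 1, log_e_process


def make_batches(shift, n_batches=5, batch_size=50, seed=0):
    # Synthetic data: label y ~ Bernoulli(1/2); x ~ N(0, 1) if y == 0,
    # else N(shift, 1). shift == 0 corresponds to the null hypothesis.
    rng = random.Random(seed)
    batches = []
    for _ in range(n_batches):
        ys = [rng.randint(0, 1) for _ in range(batch_size)]
        xs = [rng.gauss(shift * y, 1.0) for y in ys]
        batches.append((xs, ys))
    return batches


rejected, stop_batch, log_e = sequential_e_c2st(make_batches(shift=3.0))
print(rejected, stop_batch, round(log_e, 2))
```

Working in log space avoids overflow when many per-sample ratios are multiplied. Because every batch after the first both contributes to the test statistic and then joins the training set, no data are reserved purely for training beyond the initial batch, which is the sense in which a multi-batch scheme uses data more effectively than a single train/test split.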
