Thinking in Groups: Permutation Tests Reveal Near-Out-of-Distribution
Yasith Jayawardana, Dineth Jayakody, Sampath Jayarathna, Dushan N. Wadduwage
TL;DR
This work tackles near-OoD detection in biomedical AI by exploiting within-specimen replication to form homogeneous groups. The authors define homogeneous-OoD (HOoD) and formulate OoD detection as a two-sample exchangeability test against each of $K$ reference subpopulations, using permutation-based multi-response permutation procedure (MRPP) statistics on latent responses $Z(x;\phi)$. The method outputs one $p$-value per subpopulation and declares a group in-distribution (InD) if $\max_k p_k \ge \alpha$, enabling interpretable, batch-wise OoD assessment without strong distributional assumptions. Empirical results on toy MNIST/CIFAR-10 splits and the AMRB bacteria dataset show that MRPP/LSP-based HOoD outperforms standard point-wise detectors and detects near-OoD groups robustly across architectures and datasets, suggesting practical potential for real-world biomedical deployment.
Abstract
Deep neural networks (DNNs) have the potential to power many biomedical workflows, but training them on truly representative, IID datasets is often infeasible. Most models instead rely on biased or incomplete data, making them prone to out-of-distribution (OoD) inputs that closely resemble in-distribution samples. Such near-OoD cases are harder to detect than the inputs in standard OoD benchmarks and can cause unreliable, even catastrophic, predictions. Biomedical assays, however, offer a unique opportunity: they often generate multiple correlated measurements per specimen through biological or technical replicates. Exploiting this insight, we introduce Homogeneous OoD (HOoD), a novel OoD detection framework for correlated data. HOoD projects groups of correlated measurements through a trained model and uses permutation-based hypothesis tests to compare them with known subpopulations. Each test yields an interpretable p-value, quantifying how well a group matches a subpopulation. By aggregating these p-values, HOoD reliably identifies OoD groups. In evaluations, HOoD consistently outperforms point-wise and ensemble-based OoD detectors, demonstrating its promise for robust real-world deployment.
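
The group-wise test described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the latent responses are already available as NumPy arrays, uses Euclidean distance with the common $n_g/N$ group weighting for the MRPP statistic, and estimates each p-value by Monte Carlo label permutation. The function names (`mrpp_delta`, `mrpp_pvalue`, `is_in_distribution`) and the defaults `n_perm=999` and `alpha=0.05` are hypothetical choices made for illustration.

```python
import numpy as np


def mrpp_delta(dist, labels):
    """MRPP statistic: size-weighted mean of within-group pairwise distances."""
    n = len(labels)
    delta = 0.0
    for g in np.unique(labels):
        idx = np.flatnonzero(labels == g)
        m = len(idx)  # each group must contain at least two points
        sub = dist[np.ix_(idx, idx)]
        # Average over the m*(m-1)/2 unordered pairs inside the group.
        within = sub[np.triu_indices(m, k=1)].mean()
        delta += (m / n) * within  # assumed weighting: group size / total size
    return delta


def mrpp_pvalue(query, reference, n_perm=999, seed=None):
    """Permutation p-value for exchangeability of a query group and a reference group."""
    rng = np.random.default_rng(seed)
    pooled = np.vstack([query, reference])
    labels = np.r_[np.zeros(len(query), dtype=int), np.ones(len(reference), dtype=int)]
    # Pairwise Euclidean distances between all pooled latent vectors.
    diff = pooled[:, None, :] - pooled[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    observed = mrpp_delta(dist, labels)
    # A small delta means the groups are tighter than random relabelings,
    # which is evidence against exchangeability.
    hits = 0
    for _ in range(n_perm):
        if mrpp_delta(dist, rng.permutation(labels)) <= observed + 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def is_in_distribution(query, references, alpha=0.05, n_perm=999, seed=None):
    """Declare the query group InD if max_k p_k >= alpha; also return all p_k."""
    p_values = [mrpp_pvalue(query, ref, n_perm=n_perm, seed=seed) for ref in references]
    return max(p_values) >= alpha, p_values
```

In this sketch, `query` would hold the latent responses of one specimen's replicate measurements and `references` would be a list of $K$ arrays, one per known subpopulation. Returning the full list of p-values alongside the decision keeps the per-subpopulation interpretability that the abstract emphasizes.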
