hyppo: A Multivariate Hypothesis Testing Python Package

Sambit Panda, Satish Palaniappan, Junhao Xiong, Eric W. Bridgeford, Ronak Mehta, Cencheng Shen, Joshua T. Vogelstein

TL;DR

hyppo addresses the need for a unified, high-power Python platform for multivariate hypothesis testing, covering independence, two-sample, and k-sample problems with a broad suite of state-of-the-art tests. The library offers a consistent API, parallelized permutation inference, and JIT-compiled test statistics, and its benchmarks show competitive performance and close agreement with R implementations. Key contributions include a wide test repertoire (distance- and kernel-based methods, time-series and conditional tests), a modular structure, and open-source availability with documentation. The work lets researchers run multivariate tests in Python with extensible, well-tested tooling and transparent performance comparisons, supporting rapid application and extension in scientific workflows. P-values are computed via permutation tests, which supports nonparametric inference across diverse data regimes.

Abstract

We introduce hyppo, a unified library for performing multivariate hypothesis testing, including independence, two-sample, and k-sample testing. While many multivariate independence tests have R packages available, the interfaces are inconsistent and most are not available in Python. hyppo includes many state-of-the-art multivariate testing procedures. The package is easy to use and flexible enough to enable future extensions. The documentation and all releases are available at https://hyppo.neurodata.io.
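
The permutation-based inference mentioned above can be illustrated with a minimal sketch. It does not use hyppo or reproduce its implementation: it is a standard-library-only toy that computes the biased (V-statistic) squared distance correlation for one-dimensional samples and estimates a p-value by permuting one sample. The function names (`centered_distances`, `inner`, `dcorr_permutation_test`) and the simulated data are assumptions made purely for illustration.

```python
import math
import random


def centered_distances(values):
    """Double-centered matrix of pairwise absolute differences (1-D samples)."""
    n = len(values)
    dist = [[abs(a - b) for b in values] for a in values]
    row_means = [sum(row) / n for row in dist]
    grand_mean = sum(row_means) / n
    # The distance matrix is symmetric, so column means equal row means.
    return [
        [dist[i][j] - row_means[i] - row_means[j] + grand_mean for j in range(n)]
        for i in range(n)
    ]


def inner(a, b, perm):
    """Sum of a[i][j] * b[perm[i]][perm[j]]: b is re-indexed by the permutation."""
    n = len(a)
    return sum(a[i][j] * b[perm[i]][perm[j]] for i in range(n) for j in range(n))


def dcorr_permutation_test(x, y, reps=200, seed=0):
    """Return (squared distance correlation, permutation p-value).

    Hypothetical helper for illustration only, not hyppo's API.
    """
    n = len(x)
    a, b = centered_distances(x), centered_distances(y)
    identity = list(range(n))
    # The normalizer does not change when y is permuted, so compute it once.
    denom = math.sqrt(inner(a, a, identity) * inner(b, b, identity))
    if denom == 0.0:
        return 0.0, 1.0  # a constant sample carries no dependence information
    observed = inner(a, b, identity) / denom

    rng = random.Random(seed)
    perm = identity[:]
    exceed = 0
    for _ in range(reps):
        rng.shuffle(perm)  # break the pairing between x and y
        if inner(a, b, perm) / denom >= observed:
            exceed += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return observed, (1 + exceed) / (1 + reps)


# Simulated nonlinear dependence: y is a noisy quadratic function of x.
data_rng = random.Random(42)
x = [data_rng.uniform(-1, 1) for _ in range(50)]
y = [xi ** 2 + data_rng.gauss(0, 0.05) for xi in x]

stat, pvalue = dcorr_permutation_test(x, y)
print(f"dcorr^2 = {stat:.3f}, p-value = {pvalue:.4f}")
```

Because each permutation is independent of the others, this loop is the kind of work that can be spread across workers, which is presumably what parallelized permutation inference refers to. The sketch keeps it serial for clarity.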

Paper Structure

This paper contains 9 sections, 1 figure, 1 table.

Figures (1)

  • Figure 1: Benchmarks of hyppo implementations against corresponding R implementations for tests in the independence testing module. Left: average wall times (over 3 repetitions) for the R implementations (Dcorr in energy, Mmd in kernlab, and Hhg in Hhg) alongside the hyppo implementations of Mgc, Hhg, Dcorr, Mmd, and Fast Dcorr. Right: test statistics for Dcorr, Mmd, and Hhg in hyppo plotted against their respective reference R implementations. The test statistics are nearly identical across implementations.