permApprox: a general framework for accurate permutation p-value approximation

Stefanie Peschel; Anne-Laure Boulesteix; Erika von Mutius; Christian L. Müller

TL;DR

This work introduces a method for accurate, zero-free p-value approximation in permutation testing, embedded in the permApprox workflow and R package. The method enforces a support constraint during parameter estimation to ensure valid extrapolation beyond the observed statistic, thereby strictly avoiding zero p-values.

Abstract

Permutation procedures are common practice in hypothesis testing when distributional assumptions about the test statistic are not met or are unknown. With only a few permutations, empirical p-values lie on a coarse grid and may even be zero when the observed test statistic exceeds all permuted values. Such zero p-values are statistically invalid and hinder multiple testing correction. Parametric tail modeling with the Generalized Pareto Distribution (GPD) has been proposed to address this issue, but existing implementations can again yield zero p-values when the estimated shape parameter is negative and the fitted distribution has a finite upper bound. We introduce a method for accurate and zero-free p-value approximation in permutation testing, embedded in the permApprox workflow and R package. Building on GPD tail modeling, the method enforces a support constraint during parameter estimation to ensure valid extrapolation beyond the observed statistic, thereby strictly avoiding zero p-values. The workflow further integrates robust parameter estimation, data-driven threshold selection, and principled handling of hybrid p-values that are discrete in the bulk and continuous in the extreme tail. Extensive simulations using two-sample t-tests and Wilcoxon rank-sum tests show that permApprox produces accurate, robust, and zero-free p-value approximations across a wide range of sample and effect sizes. Applications to single-cell RNA-seq and microbiome data demonstrate its practical utility: permApprox yields smooth and interpretable p-value distributions even with few permutations. By resolving the zero-p-value problem while preserving accuracy and computational efficiency, permApprox enables reliable permutation-based inference in high-dimensional and computationally intensive settings.
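The zero-p-value problem described in the abstract can be illustrated with a minimal Python sketch. It is not the permApprox implementation (which is an R package); the function names, the one-sided definition of the empirical p-value, the tail estimate as exceedance fraction times GPD survival, and the fixed scale and shape values are all assumptions chosen for illustration. No parameters are fitted here.

```python
import math


def empirical_pvalue(t_obs, perm_stats):
    """One-sided empirical p-value: share of permuted statistics >= t_obs."""
    return sum(1 for t in perm_stats if t >= t_obs) / len(perm_stats)


def gpd_survival(y, sigma, xi):
    """Survival function of a GPD (scale sigma > 0, shape xi) at excess y >= 0."""
    if xi == 0.0:
        return math.exp(-y / sigma)
    base = 1.0 + xi * y / sigma
    if base <= 0.0:
        # Only reachable for xi < 0: y is at or beyond the upper bound -sigma/xi.
        return 0.0
    return base ** (-1.0 / xi)


def gpd_tail_pvalue(t_obs, perm_stats, u, sigma, xi):
    """Tail estimate: fraction of exceedances over u times GPD survival of the excess."""
    n_exc = sum(1 for t in perm_stats if t > u)
    return n_exc / len(perm_stats) * gpd_survival(t_obs - u, sigma, xi)


# Illustrative data: B = 1000 permuted statistics on a grid, observed value beyond all.
perm_stats = [i / 100 for i in range(1000)]
t_obs, u = 12.0, 9.0

print(empirical_pvalue(t_obs, perm_stats))                  # no permuted value reaches t_obs
print(gpd_tail_pvalue(t_obs, perm_stats, u, 0.5, -0.5))     # support ends at excess 1.0 < 3.0
print(gpd_tail_pvalue(t_obs, perm_stats, u, 0.5, -0.1))     # support ends at excess 5.0 > 3.0
```

With a negative shape, the fitted tail has the finite upper bound -sigma/xi on the excess scale. Whenever the observed excess lies at or beyond that bound, the tail estimate collapses to zero, just as the empirical p-value does.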

Paper Structure (47 sections, 30 equations, 32 figures, 10 tables)

Introduction
Permutation p-value approximation
Permutation testing and empirical p-values
GPD-based p-value approximation
Parametrization note.
Proposed constrained GPD fitting
Data-adaptive selection of the safety margin epsilon
The permApprox workflow
Overview and scope of the workflow
Screening for tail approximation
GPD threshold selection strategies
GPD parameter estimation methods
Default permApprox configuration and implementation
Remarks on single and multiple testing
Simulation studies
...and 32 more sections

Figures (32)

Figure 1: GPD-based $p$-value approximation. Left panel: Distribution of the permuted test statistic $T^*$ with threshold $u$ defining the permutation excesses (orange), and observed test statistic $T_{\mathrm{obs}}$. Upper right: Standard (unconstrained) GPD tail approximation when the estimated shape parameter is negative ($\hat{\xi}_{(U)}<0$), resulting in a finite upper support boundary $\hat{s}_{(U)} = -\frac{\hat{\sigma}_{(U)}}{\hat{\xi}_{(U)}}$. If the observed excess $Y_{\mathrm{obs}}$ lies beyond this boundary, the tail approximation assigns zero probability mass, yielding a zero GPD-based $p$-value. Lower right: Proposed constrained GPD fit, enforcing that the upper support boundary $\hat{s}_{(C)}$ lies strictly above the evaluation point, with offset $\varepsilon$. This guarantees a strictly positive tail probability at $Y_{\mathrm{obs}}$.
Figure 2: Graphical representation of the permApprox workflow. Input: A vector with observed test statistics $T_{\mathrm{obs}}$ and a matrix with permuted test statistics $\mathcal{T} = \{T_b^{*(j)}\}$ for $B$ permutations and $m$ tests. Step 1: Empirical permutation $p$-values are computed and screened for small values relative to the chosen significance level $\alpha$ (i.e., $p_{\mathrm{emp}} < 2\alpha$), with selected tests flagged for GPD-based tail approximation. Step 2: For a selected test $j$, the tail approximation stage consists of three main steps: Step 2a: A threshold $u^{(j)}$ is chosen via the Anderson-Darling (AD) based test to define the tail region of the permutation distribution. Step 2b: The safety margin $\varepsilon^{(j)}$ is selected (with optional refinement in rare cases of machine underflow). Step 2c: A GPD is fitted to the permutation excesses under a support constraint ensuring valid evaluation beyond the observed excess. The fitted constrained GPD yields refined tail $p$-values $p_{\mathrm{cGPD}}^{(j)}$, which replace the corresponding empirical values. Output: Vector of hybrid $p$-values that are discrete in the bulk of the distribution and continuous in the extreme tail. The red flags mark our suggested methods as explained in the section "Default permApprox configuration and implementation".
Figure 3: Two-sample t-test with Gaussian data: Ratios of approximated to ground-truth $p$-values: $p_{\text{method}} / p_{t\text{-test}}$, stratified by sample size $n$. The horizontal dashed line at 1 indicates perfect agreement with the t-test reference. Effect size and number of permutations are fixed to $d=1$ and $B=1000$, respectively. Points indicate individual replicates (1000 per setting). Counts of exact zeros are shown below each group as "0s=x" in the corresponding color. For plotting only, zero $p$-values are mapped to a small constant floor so they appear at "0" on the $y$-axis. The y-axis is on a log$_{10}$ scale, while tick labels show the original ratios.
Figure 4: Wilcoxon rank-sum test with exponential data: Comparison of $p$-value approximations to the Wilcoxon reference. (a) Illustrative simulation scenario with 1000 tests across five effect sizes ($d \in \{0, 0.5, 1, 1.5, 2\}$) at $n=250$ samples per group. The x-axis shows the observed Mann-Whitney $U$ statistic, and points represent individual tests. The y-axis displays $p$-values on a log$_{10}$ scale, while tick labels show the original $p$-values. (b) Simulation study with 1000 independent replicates per sample size ($d=1$, $B=1000$). The x-axis indicates the per-group sample size, and the y-axis shows ratios of approximated to Wilcoxon $p$-values ($p_{\text{method}} / p_{\text{Wilcoxon}}$) on a log$_{10}$ scale (tick labels show the original ratios). The horizontal dashed line at 1 indicates agreement with the Wilcoxon reference. Points represent individual replicates (1000 per setting). In panel (b), counts of exact zero $p$-values produced by each method are shown below each group as "0s=x" in the corresponding color. For visualization only, zero $p$-values are mapped to a small positive constant so they appear at the lower plotting boundary.
Figure 5: Differential abundance analysis in the PASTURE cohort: effect of permutation budget and comparison to permApprox. Volcano plots show log$_2$ fold-changes (EBF vs. non-EBF) versus raw (unadjusted) permutation $p$-values for genus-level differential abundance testing with dacomp. Orange points indicate genera enriched in the EBF group and blue points indicate genera enriched in the non-EBF group. The horizontal dashed line marks the significance level $\alpha=0.01$, and points shown in gray are not significant at this level. (a) dacomp with $B=10^3$ permutations: empirical permutation $p$-values are bounded below by approximately $1/1000$, resulting in a pronounced floor in the tail. (b) dacomp with $B=10^6$ permutations and (c) dacomp with $B=10^7$ permutations: increasing the permutation budget progressively refines tail resolution. (d) dacomp with $B=10^3$ permutations followed by permApprox (default configuration), yielding refined tail $p$-values that largely reproduce the trends observed at $B=10^6$--$10^7$ at a fraction of the computational cost. For readability, only taxa that attain a raw $p$-value below $10^{-5}$ in at least one of the three right-most panels ((b)--(d)) are labeled. In panel (d), taxa whose permApprox $p$-values follow the trend observed from $B=10^6$ to $B=10^7$ are indicated by a black outline, whereas taxa that deviate from this trend are highlighted by a red outline. Panel headers report runtimes for the respective analyses.
...and 27 more figures
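
The support constraint in Figure 1 and the screening and replacement logic in Figure 2 can be sketched in Python as follows. This is a rough illustration under stated assumptions, not the package's code: `support_constraint_ok`, `hybrid_pvalues`, and the `tail_pvalue` callback are hypothetical names, and the callback stands in for Steps 2a to 2c (threshold selection, choice of the safety margin, constrained GPD fit), which are not reproduced here.

```python
def support_constraint_ok(sigma, xi, y_obs, eps):
    """Check that the GPD support extends strictly beyond y_obs + eps.

    For xi >= 0 the support is unbounded; for xi < 0 the upper support
    boundary on the excess scale is -sigma / xi.
    """
    if sigma <= 0:
        return False
    if xi >= 0:
        return True
    return -sigma / xi > y_obs + eps


def hybrid_pvalues(t_obs, perm_matrix, alpha, tail_pvalue):
    """Assemble hybrid p-values for m tests.

    t_obs[j] is the observed statistic of test j and perm_matrix[j] holds
    its B permuted statistics. tail_pvalue(t, perms) is a user-supplied
    (hypothetical) callable returning a refined tail p-value.
    """
    result = []
    for t, perms in zip(t_obs, perm_matrix):
        p_emp = sum(1 for s in perms if s >= t) / len(perms)
        if p_emp < 2 * alpha:
            # Screened as small: replace the empirical value by the tail estimate.
            result.append(tail_pvalue(t, perms))
        else:
            # Bulk of the distribution: keep the discrete empirical p-value.
            result.append(p_emp)
    return result
```

A candidate parameter pair with negative shape passes the check only if its support boundary clears the observed excess by more than the margin, which is what keeps the resulting tail probability strictly positive.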
