Nullstrap-DE: A General Framework for Calibrating FDR and Preserving Power in DE Methods, with Applications to DESeq2 and edgeR

Chenxin Jiang, Changhu Wang, Jingyi Jessica Li

Abstract

Differential expression (DE) analysis is a key task in RNA-seq studies, aiming to identify genes with expression differences across conditions. A central challenge is balancing false discovery rate (FDR) control with statistical power. Parametric methods such as DESeq2 and edgeR achieve high power by modeling gene-level counts using negative binomial distributions and applying empirical Bayes shrinkage. However, these methods may suffer from FDR inflation when model assumptions are mildly violated, especially in large-sample settings. In contrast, non-parametric tests like Wilcoxon offer more robust FDR control but often lack power and do not support covariate adjustment. We propose Nullstrap-DE, a general add-on framework that combines the strengths of both approaches. Designed to augment tools like DESeq2 and edgeR, Nullstrap-DE calibrates FDR while preserving power, without modifying the original method's implementation. It generates synthetic null data from a model fitted under the gene-specific null (no DE), applies the same test statistic to both observed and synthetic data, and derives a threshold that satisfies the target FDR level. We show theoretically that Nullstrap-DE asymptotically controls FDR while maintaining power consistency. Simulations confirm that it achieves reliable FDR control and high power across diverse settings, where DESeq2, edgeR, or Wilcoxon often show inflated FDR or low power. Applications to real datasets show that Nullstrap-DE enhances statistical rigor and identifies biologically meaningful genes.
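
To make the synthetic-null step concrete, here is a minimal sketch under simplifying assumptions: two conditions, no covariates, equal library depths (so no size factors), and a method-of-moments negative binomial (NB) fit. The paper itself fits an NB-GLM by maximum likelihood (see Corollary 1) and supports covariates, so this is not its implementation. All function names are hypothetical. Each gene is refit with one mean shared by all samples (the gene-specific null of no DE). Its counts are then redrawn from that fit while the sample labels stay unchanged, so the same DE statistic can be computed on both the observed and the synthetic data.

```python
import math
import random
from statistics import mean, variance
from typing import List, Sequence, Tuple


def sample_poisson(lam: float, rng: random.Random) -> int:
    """Draw one Poisson(lam) variate with Knuth's multiplication method.

    Large rates are split in half (a sum of independent Poisson variables is
    Poisson), which keeps exp(-lam) away from floating-point underflow.
    """
    if lam > 30.0:
        return sample_poisson(lam / 2.0, rng) + sample_poisson(lam / 2.0, rng)
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def sample_nb(mu: float, phi: float, rng: random.Random) -> int:
    """NB draw with mean mu and variance mu + phi * mu**2 (gamma-Poisson mixture)."""
    if mu <= 0.0:
        return 0
    if phi <= 0.0:
        return sample_poisson(mu, rng)  # no overdispersion: plain Poisson
    # Gamma(shape=1/phi, scale=mu*phi) has mean mu and variance phi * mu**2.
    rate = rng.gammavariate(1.0 / phi, mu * phi)
    return sample_poisson(rate, rng)


def fit_null_nb(counts: Sequence[int]) -> Tuple[float, float]:
    """Method-of-moments NB fit under the null: a single mean for all samples.

    Assumption: equal library sizes, so no size-factor normalization.
    Needs at least two samples for the sample variance.
    """
    mu = mean(counts)
    if mu == 0:
        return 0.0, 0.0
    phi = max((variance(counts) - mu) / mu**2, 0.0)  # clip underdispersion to 0
    return float(mu), phi


def synthetic_null(counts: Sequence[Sequence[int]], seed: int = 0) -> List[List[int]]:
    """Replace each gene's counts with draws from its null-fitted NB model."""
    rng = random.Random(seed)
    out = []
    for row in counts:  # one row per gene, one column per sample
        mu, phi = fit_null_nb(row)
        out.append([sample_nb(mu, phi, rng) for _ in row])
    return out
```

The gamma-Poisson mixture is used because it reproduces the NB mean-variance relation $\mathrm{Var} = \mu + \phi\mu^2$ with only the standard library.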

Paper Structure

This paper contains 26 sections, 8 theorems, 77 equations, 13 figures, and 1 algorithm.

Key Result

theorem 1

Assume that $\frac{\log p}{\sqrt{s}} \rightarrow 0$, where $s = \# \mathcal{S}_1$ denotes the number of ground-truth DE genes. Under Assumptions assump:estimation--assump:independence, for a target FDR level $q \in (0,1)$, with the test statistic threshold $\tau_q$ defined in eq:tau_q and the declar… where $c_1, c_2 > 0$ are constants.
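
Theorem 1 refers to a threshold $\tau_q$ whose formula (eq:tau_q) is not reproduced here. As a hedged illustration of how a threshold can be derived by comparing observed and synthetic-null statistics, the sketch below uses a simple plug-in estimate of the false discovery proportion (FDP). This estimate is an assumption in the spirit of knockoff-type procedures, not the paper's definition. Statistics are assumed to be oriented so that larger values mean stronger evidence of DE (for example, absolute Wald statistics), and the function names are hypothetical.

```python
import bisect
import math
from typing import List, Sequence


def count_at_least(sorted_vals: Sequence[float], t: float) -> int:
    """Number of entries >= t in an ascending list."""
    return len(sorted_vals) - bisect.bisect_left(sorted_vals, t)


def plugin_threshold(obs: Sequence[float], null: Sequence[float], q: float) -> float:
    """Smallest observed statistic t whose plug-in FDP estimate is <= q.

    Assumed estimate (not necessarily eq:tau_q):
        FDP(t) = #{null stats >= t} / max(1, #{observed stats >= t})
    Returns math.inf when no candidate qualifies, so nothing is declared DE.
    """
    obs_sorted, null_sorted = sorted(obs), sorted(null)
    for t in sorted(set(obs)):  # candidate thresholds, ascending
        fdp = count_at_least(null_sorted, t) / max(1, count_at_least(obs_sorted, t))
        if fdp <= q:
            return t
    return math.inf


def declare_de(obs: Sequence[float], null: Sequence[float], q: float) -> List[int]:
    """Indices of genes whose observed statistic reaches the threshold."""
    tau = plugin_threshold(obs, null, q)
    return [j for j, v in enumerate(obs) if v >= tau]
```

Scanning candidates in ascending order returns the smallest admissible threshold, which declares the most genes while keeping the estimated FDP at or below $q$.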

Figures (13)

  • Figure 1: Simulation results under two settings. (a, b) Results without covariates: empirical FDR (top) and power (bottom) across sample sizes ($n$) and FDR targets ($q$), with DE proportion $=0.1$. (c, d) Results with covariates, with DE proportion $=0.2$. Across all scenarios, Nullstrap-DESeq2 and Nullstrap-edgeR maintain effective FDR control while achieving high power, outperforming DESeq2, edgeR, and Wilcoxon rank-sum tests.
  • Figure 2: Empirical evaluation of FDR control using permuted negative-control datasets. (a, b) Average and distribution of the number of DE genes at FDR $=0.05$ across 1,000 permutations of a real RNA-seq dataset of classical and non-classical human monocytes. DESeq2 and edgeR produce many false positives, whereas Nullstrap-DESeq2 and Nullstrap-edgeR report almost no DE genes.
  • Figure 3: Comparison of DE methods on RNA-seq data from classical and non-classical human monocytes. (a) Venn diagram showing overlaps of DE genes (FDR $<$ 0.05) identified by DESeq2, edgeR, Wilcoxon, Nullstrap-DESeq2, and Nullstrap-edgeR. (b) MA plots comparing DE genes from Nullstrap-DESeq2 vs. DESeq2 (left) and Nullstrap-edgeR vs. edgeR (right); Nullstrap-DE methods prioritize DE genes with stronger fold changes and higher expression. (c) Ranks of four representative GO terms by enrichment $p$-value for each method, including two immune-related terms (left) and two general cellular functions (right). "Found" indicates that the GO term is enriched (hypergeometric test, BH-adjusted $p$-value $<0.05$), while "Not found" indicates that the GO term is not enriched. Nullstrap-DE methods prioritize immune-specific terms but deprioritize general cellular functions, highlighting their ability to enrich for biologically meaningful signals. (d) GO terms enriched among DE genes upregulated in non-classical (CD14lowCD16+) monocytes. GO term colors indicate their biological relevance to non-classical monocytes, with red, purple, and black denoting highly relevant, relevant, and not relevant, respectively. Nullstrap-edgeR predominantly recovers GO terms that are highly relevant or relevant to non-classical monocyte biology. (e) GO terms enriched among DE genes upregulated in classical (CD14+CD16–) monocytes. GO term colors follow the same scheme as in (d). Nullstrap-edgeR predominantly highlights functions specific to classical monocyte biology.
  • Figure 4: Comparison of DE methods on RNA-seq data from dexamethasone-treated airway smooth muscle cells. (a) Venn diagram of DE genes (FDR $<$ 0.05) identified by five methods. Nullstrap-DESeq2 and Nullstrap-edgeR yield fewer DE genes than DESeq2 and edgeR, while the Wilcoxon rank-sum test detects none. (b) MA plots comparing DE genes retained by Nullstrap-DESeq2 vs. DESeq2 (left) and Nullstrap-edgeR vs. edgeR (right); Nullstrap-DE methods prioritize genes with stronger fold changes and higher average expression. (c) Heatmaps of normalized expression for the Nullstrap-edgeR, edgeR, and edgeR-only DE gene sets. Nullstrap-edgeR DE genes show clearer separation between treated and untreated samples, with hierarchical clustering correctly recovering treatment groups. (d) KEGG pathway enrichment for DE genes upregulated under dexamethasone treatment. KEGG term colors indicate biological relevance to glucocorticoid signaling and airway smooth muscle function: red = highly relevant, purple = relevant, black = not relevant. Nullstrap-DE methods preferentially enrich for biologically meaningful pathways, while DESeq2, edgeR, and their respective "only" sets tend to identify more general or unrelated pathways.
  • Figure 5: Comparison of DE methods on pseudobulk scRNA-seq data from COVID-19 patient monocytes. (a) Venn diagram of DE genes (FDR $<$ 0.1) showing Nullstrap-DE methods yield fewer DE genes than their parent methods. (Note: Nullstrap-DESeq2 detects a small set of DE genes not found by DESeq2; these genes are automatically filtered out by DESeq2’s built-in independent filtering step prior to multiple testing correction, but are retained in Nullstrap-DESeq2 because they still have valid Wald statistics and nominal $p$-values.) (b) MA plots indicate that Nullstrap-DE prioritizes DE genes with stronger fold changes and higher expression. (c) GO enrichment of DE genes in monocytes from mild/moderate vs. severe/critical COVID-19 patients. Nullstrap-DESeq2 recovers immune processes specific to COVID-19 severity (e.g., antigen presentation via MHC class II, peptide–MHC assembly), whereas DESeq2 prioritizes broader categories such as cell adhesion, and DESeq2-only genes are enriched for less specific terms. (d) Bipartite gene–concept networks show Nullstrap-DESeq2 captures a coherent immune module (e.g., HLA-DRA, HLA-DPB1, HLA-DQA1), while DESeq2 yields a more diffuse network dominated by general adhesion-related genes (ITGA5, LAMC1, FN1, NID1).
  • ...and 8 more figures
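
Figure 2 evaluates FDR control on permuted negative controls. Below is a minimal sketch of that check. It assumes, without the source confirming it, that each permutation shuffles the condition labels across samples. Because the shuffled labels are unrelated to biology, every gene a method reports is counted as a false discovery. `call_de` is a hypothetical stand-in for any DE pipeline (DESeq2, edgeR, or a Nullstrap-DE wrapper) that returns the indices of genes called DE.

```python
import random
from typing import Callable, List, Sequence

# Hypothetical interface: takes (counts, labels), returns indices of genes called DE.
DECaller = Callable[[Sequence[Sequence[int]], Sequence[str]], List[int]]


def permutation_negative_control(
    counts: Sequence[Sequence[int]],
    labels: Sequence[str],
    call_de: DECaller,
    n_perm: int = 1000,
    seed: int = 0,
) -> List[int]:
    """Number of genes called DE on each label-permuted copy of the data.

    Shuffling keeps the group sizes but breaks the link between labels and
    biology, so every reported gene is treated as a false discovery.
    """
    rng = random.Random(seed)
    n_called = []
    for _ in range(n_perm):
        shuffled = list(labels)  # copy, so the caller's labels are untouched
        rng.shuffle(shuffled)
        n_called.append(len(call_de(counts, shuffled)))
    return n_called
```

Averaging the returned counts gives the mean number of false discoveries per permutation, the kind of quantity summarized in Figure 2. With few samples, a shuffle can reproduce much of the original grouping by chance, so a few permutations may still carry some real signal.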

Theorems & Definitions (19)

  • definition 1: Synthetic null data generation for DE analysis
  • theorem 1
  • corollary 1: Maximum likelihood estimation guarantees for NB-GLM
  • remark 1
  • remark 2
  • lemma 1: An upper-tail exponent inequality
  • proof
  • lemma 2: A lower-tail exponent inequality
  • proof
  • lemma 3: Multiplicative Chernoff bound
  • ...and 9 more
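
Lemma 3 is listed here only by name. For reference, a common textbook form of the multiplicative Chernoff bound follows; the paper's version may state it with different constants or conditions.

```latex
Let $X = \sum_{i=1}^{m} X_i$, where the $X_i \in \{0,1\}$ are independent, and let $\mu = \mathbb{E}[X]$. Then
\[
\Pr\bigl(X \ge (1+\delta)\mu\bigr) \le \exp\!\left(-\frac{\delta^2 \mu}{2+\delta}\right), \qquad \delta > 0,
\]
\[
\Pr\bigl(X \le (1-\delta)\mu\bigr) \le \exp\!\left(-\frac{\delta^2 \mu}{2}\right), \qquad 0 < \delta < 1.
\]
```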