Estimation of the complexity of a network under a Gaussian graphical model

Nabaneet Das, Thorsten Dickhaus

TL;DR

This work proposes an estimator that combines p-values from simultaneous edge-wise tests, conducted under false discovery rate control, with Storey's estimator of the proportion of true null hypotheses, and establishes weak dependence conditions on the precision matrix under which the empirical cumulative distribution function of the p-values converges to its population counterpart.

Abstract

The proportion of edges in a Gaussian graphical model (GGM) characterizes the complexity of its conditional dependence structure. Since edge presence corresponds to a nonzero entry of the precision matrix, estimation of this proportion can be formulated as a large-scale multiple testing problem. We propose an estimator that combines p-values from simultaneous edge-wise tests, conducted under false discovery rate control, with Storey's estimator of the proportion of true null hypotheses. We establish weak dependence conditions on the precision matrix under which the empirical cumulative distribution function of the p-values converges to its population counterpart. These conditions cover high-dimensional regimes, including those arising in genetic association studies. Under such dependence, we characterize the asymptotic bias of the Schweder–Spjøtvoll estimator, showing that it is upward biased for the proportion of true null hypotheses, so that the resulting estimate slightly underestimates the true edge proportion. Simulation studies across a variety of models confirm accurate recovery of graph complexity.
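To make the estimation step concrete, the following is a minimal sketch of a Storey-type estimate of the proportion of true nulls and the edge proportion derived from it. It assumes the edge-wise p-values are already available. The function names `storey_pi0` and `edge_proportion`, the threshold `lam = 0.5`, and the synthetic p-values are illustrative assumptions, not the authors' implementation or tuning.

```python
import numpy as np


def storey_pi0(pvalues, lam=0.5):
    """Storey-type estimate of the proportion of true null hypotheses.

    Uses the fraction of p-values above `lam`, rescaled by 1 / (1 - lam).
    The choice lam = 0.5 is an assumption made for this sketch.
    """
    p = np.asarray(pvalues, dtype=float)
    if p.size == 0:
        raise ValueError("pvalues must be non-empty")
    if not 0.0 <= lam < 1.0:
        raise ValueError("lam must lie in [0, 1)")
    pi0 = np.mean(p > lam) / (1.0 - lam)
    # The raw ratio can exceed 1, so clip it to a valid proportion.
    return min(1.0, float(pi0))


def edge_proportion(pvalues, lam=0.5):
    """Estimated proportion of edges, i.e. of false null hypotheses."""
    return 1.0 - storey_pi0(pvalues, lam)


# Synthetic illustration (not data from the paper): 80% uniform null
# p-values and 20% p-values concentrated near zero.
rng = np.random.default_rng(0)
null_p = rng.uniform(size=8000)
alt_p = rng.beta(0.05, 1.0, size=2000)
synthetic_p = np.concatenate([null_p, alt_p])
print(edge_proportion(synthetic_p))
```

Because some p-values from true edges also fall above `lam`, the estimate of the null proportion tends to be too large, which is consistent with the slight underestimation of the edge proportion described in the abstract.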

Paper Structure (7 sections, 10 theorems, 90 equations, 3 figures, 6 tables, 2 algorithms)

Key Result

Theorem 3.1

Consider the test statistics in TS and the corresponding two-sided p-values $(p_{ij})_{1 \le i < j \le k}$ as defined in p-values. Let $F_N(\cdot)$ denote the ECDF of the p-values, and consider the average CDF of the p-values, where $F_{ij}(x) = \Pr(p_{ij} \le x)$. Then, under condition (C1) and if $\log k = o(\sqrt{n})$, we have

Figures (3)

  • Figure 1: ECDF of p-values for a block-diagonal covariance structure with equicorrelation within blocks ($n = 200$, $k = 500$).
  • Figure 2: ECDF of p-values for an Erdős–Rényi random graph with fixed sparsity ($q = 0.2$, $n = 200$, $k = 500$).
  • Figure 3: ECDFs of p-values for the leukemia microarray data of Golub et al. (1999), shown separately for the ALL and AML mRNA sample groups.

Theorems & Definitions (10)

  • Theorem 3.1
  • Lemma 3.2
  • Corollary 3.2.1
  • Lemma 7.1
  • Lemma 7.2
  • Lemma 7.3
  • Corollary 7.3.1
  • Lemma 7.4
  • Lemma 7.5
  • Corollary 7.5.1