False Discovery Rate Control for Gaussian Graphical Models via Neighborhood Screening

Taulant Koka, Jasin Machkour, Michael Muma

TL;DR

We introduce a nodewise variable selection approach to graph learning that provably controls the false discovery rate of the selected edge set at a self-estimated level. A novel fusion method then combines the individual neighborhoods into an undirected graph estimate.

Abstract

Gaussian graphical models emerge in a wide range of fields. They model the statistical relationships between variables as a graph, where an edge between two variables indicates conditional dependence. Unfortunately, well-established estimators, such as the graphical lasso or neighborhood selection, are known to be susceptible to a high prevalence of false edge detections. False detections may encourage inaccurate or even incorrect scientific interpretations, with major implications in applications, such as biomedicine or healthcare. In this paper, we introduce a nodewise variable selection approach to graph learning and provably control the false discovery rate of the selected edge set at a self-estimated level. A novel fusion method of the individual neighborhoods outputs an undirected graph estimate. The proposed method is parameter-free and does not require tuning by the user. Benchmarks against competing false discovery rate controlling methods in numerical experiments considering different graph topologies show a significant gain in performance.
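The abstract does not describe how the individual neighborhoods are fused. Purely as a point of reference, the sketch below shows the classical AND and OR rules known from neighborhood selection, which turn per-node neighborhoods into an undirected edge set. It is not the paper's novel fusion method; the function name `fuse_neighborhoods` and the dictionary layout of the input are hypothetical choices made for illustration.

```python
from typing import Dict, FrozenSet, Set


def fuse_neighborhoods(
    neighborhoods: Dict[int, Set[int]], rule: str = "and"
) -> Set[FrozenSet[int]]:
    """Combine per-node neighborhoods into an undirected edge set.

    neighborhoods maps each node to the set of nodes selected as its
    neighbors by some nodewise variable selection step (assumed given).
    rule="and" keeps edge {i, j} only if i selected j AND j selected i.
    rule="or" keeps edge {i, j} if either node selected the other.
    """
    if rule not in ("and", "or"):
        raise ValueError("rule must be 'and' or 'or'")

    edges: Set[FrozenSet[int]] = set()
    for i, selected in neighborhoods.items():
        for j in selected:
            if i == j:
                continue  # ignore self-loops
            selected_back = i in neighborhoods.get(j, set())
            if rule == "or" or selected_back:
                edges.add(frozenset((i, j)))
    return edges


# Illustrative input: node 0 selects 1 and 2, node 1 selects 0, node 2 selects nothing.
neighborhoods = {0: {1, 2}, 1: {0}, 2: set()}
and_edges = fuse_neighborhoods(neighborhoods, rule="and")
or_edges = fuse_neighborhoods(neighborhoods, rule="or")
```

The AND rule is the more conservative of the two, since an edge survives only when both endpoints agree on it.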

Paper Structure (9 sections, 1 theorem, 2 equations, 2 figures, 2 algorithms)

This paper contains 9 sections, 1 theorem, 2 equations, 2 figures, 2 algorithms.

Key Result

Theorem 1

Let $\hat{\alpha}:=p/(R\vee1)$. Algorithm \ref{algo:unsymmetric} controls the $\mathrm{FDR}$ at the estimated level $\hat{\alpha}$, i.e., $\mathrm{FDR} = \mathbb{E}[\mathrm{FDP}] \leq \hat{\alpha}$.
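To make the quantities in the theorem concrete, the following sketch computes the false discovery proportion of an estimated edge set and averages it over repeated runs as a Monte Carlo estimate of the FDR. It assumes the standard definition $\mathrm{FDP} = V/(R\vee1)$, with $V$ the number of falsely selected edges and $R$ the total number of selected edges; this excerpt does not define these symbols explicitly. The helper names and the toy edge sets are hypothetical.

```python
from typing import FrozenSet, List, Set

Edge = FrozenSet[int]


def edge(i: int, j: int) -> Edge:
    """Undirected edge between nodes i and j."""
    return frozenset((i, j))


def false_discovery_proportion(selected: Set[Edge], true_edges: Set[Edge]) -> float:
    """FDP = V / max(R, 1), where V counts selected edges absent from the truth."""
    num_selected = len(selected)                # R
    num_false = len(selected - true_edges)      # V
    return num_false / max(num_selected, 1)     # R v 1 avoids division by zero


def empirical_fdr(runs: List[Set[Edge]], true_edges: Set[Edge]) -> float:
    """Average FDP over independent runs, approximating E[FDP]."""
    fdps = [false_discovery_proportion(sel, true_edges) for sel in runs]
    return sum(fdps) / len(fdps)


# Toy example with a known true graph and three hypothetical estimates.
true_edges = {edge(0, 1), edge(1, 2)}
runs = [
    {edge(0, 1), edge(1, 2)},   # no false edges, FDP = 0
    {edge(0, 1), edge(0, 3)},   # one false edge out of two, FDP = 0.5
    set(),                      # nothing selected, FDP = 0 by convention
]
fdr_estimate = empirical_fdr(runs, true_edges)
```

In a simulation study, `fdr_estimate` would be compared against the level at which control is claimed.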

Figures (2)

  • Figure 1: Comparison of the performance of Algorithms \ref{algo:unsymmetric} and \ref{algo:symmetric} on an ER graph with an edge probability of $10\%$ and partial correlations $|\rho_{ij}|\in[0.2,0.6]$. The sample size varies between $400$ and $1500$. Both methods show a similar performance, except for a slightly smaller achieved $\mathrm{FDR}$ of the undirected graph estimator.
  • Figure 2: Comparison of the performance of Algorithm \ref{algo:symmetric}, BH, KO and KO2 on an ER graph with an edge probability of $10\%$ (left), a sub-linear preferential attachment graph with growth constant $m=5$ and a power law exponent of $0.5$ (middle), and a small-world graph with $2D=10$ neighbors per node and a rewiring probability of $0.5$ (right). Compared to the competing methods, the proposed method shows a significant gain in performance in all experiments.
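For readers unfamiliar with the ER (Erdős–Rényi) topology used in both figures, the sketch below samples such a graph by including each possible edge independently with the stated probability of $10\%$. The number of nodes and the random seed are assumptions made for illustration, since the captions do not state the graph size, and the function name `sample_er_graph` is hypothetical.

```python
import random
from itertools import combinations
from typing import FrozenSet, Optional, Set


def sample_er_graph(
    num_nodes: int, edge_prob: float, seed: Optional[int] = None
) -> Set[FrozenSet[int]]:
    """Sample an undirected Erdős–Rényi graph as a set of edges.

    Every unordered pair of distinct nodes becomes an edge independently
    with probability edge_prob.
    """
    rng = random.Random(seed)
    return {
        frozenset(pair)
        for pair in combinations(range(num_nodes), 2)
        if rng.random() < edge_prob
    }


# num_nodes=50 and seed=0 are illustrative values, not taken from the paper.
num_nodes = 50
edges = sample_er_graph(num_nodes=num_nodes, edge_prob=0.10, seed=0)
density = len(edges) / (num_nodes * (num_nodes - 1) / 2)
```

The realized `density` fluctuates around the edge probability and approaches it as the number of nodes grows.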

Theorems & Definitions (2)

  • Theorem 1
  • proof