Statistical Inference on High Dimensional Gaussian Graphical Regression Models

Xuran Meng, Jingfei Zhang, Yi Li

TL;DR

This work addresses statistical inference in high-dimensional Gaussian graphical regression, where the precision matrix depends on covariates. It introduces the Segmentally Adjusted Graphical Regression Estimator (SAGE), which uses a projection-based debiasing step to yield asymptotically normal estimators and valid confidence intervals, while greatly reducing computational cost by solving a problem in $\mathbb{R}^n$ instead of the original $(p-1)(q+1)$-dimensional space. Theoretical results establish a per-node debiasing decomposition $\sqrt{n}(\widehat{\bm{\beta}}_j^{u}-\bm{\beta}_j) = \bm{\Delta}_j + w_j$ with controlled error and a normal limit, enabling hypothesis tests and confidence intervals for contrasts. Numerical experiments show improved bias and coverage over undebiased estimators, and a real-data analysis of glioblastoma gene expression identifies biologically meaningful SNP-modulated edges. Overall, the approach provides a scalable, statistically valid toolkit for inference in covariate-modulated graphical models, with potential applications in genomics and precision medicine.

Abstract

Gaussian graphical regressions have emerged as a powerful approach for regressing the precision matrix of a Gaussian graphical model on covariates, which, unlike traditional Gaussian graphical models, can help determine how graphs are modulated by high dimensional subject-level covariates, and recover both the population-level and subject-level graphs. To fit the model, a multi-task learning approach achieves lower error rates compared to node-wise regressions. However, due to the high complexity and dimensionality of the Gaussian graphical regression problem, the important task of statistical inference remains unexplored. We propose a class of debiased estimators based on multi-task learners for statistical inference in Gaussian graphical regressions. We show that debiasing can be performed quickly and separately for the multi-task learners. In a key debiasing step that estimates the inverse covariance matrix, we propose a novel projection technique that dramatically reduces computational costs in optimization to scale only with the sample size $n$. We show that our debiased estimators enjoy a fast convergence rate and asymptotically follow a normal distribution, enabling valid statistical inference such as constructing confidence intervals and performing hypothesis testing. Simulation studies confirm the practical utility of the proposed approach, and we further apply it to analyze gene co-expression graph data from a brain cancer study, revealing meaningful biological relationships.
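To make the $(p-1)(q+1)$ dimension count concrete, here is a minimal sketch of a node-wise design matrix in which node $j$ is regressed on the other $p-1$ nodes and on their products with each of the $q$ covariates. The interaction layout, the function name `build_design`, and the simulated inputs are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def build_design(X, U, j):
    """Hypothetical helper: stack the other nodes and their covariate interactions.

    X : (n, p) node observations, U : (n, q) subject-level covariates.
    Returns an (n, (p-1)*(q+1)) matrix for the regression of node j.
    """
    n, _ = X.shape
    X_rest = np.delete(X, j, axis=1)           # (n, p-1): all nodes except j
    U_aug = np.hstack([np.ones((n, 1)), U])    # (n, q+1): ones column gives the population-level block
    # Block h of the result is X_rest scaled row-wise by column h of U_aug.
    W = (U_aug[:, :, None] * X_rest[:, None, :]).reshape(n, -1)
    return W

# Simulated inputs, purely for illustration.
rng = np.random.default_rng(0)
n, p, q = 50, 6, 3
X = rng.standard_normal((n, p))
U = rng.standard_normal((n, q))
W = build_design(X, U, j=0)
print(W.shape)  # (50, 20), i.e. (p-1)*(q+1) columns
```

With $p$ and $q$ both large, the number of columns quickly exceeds $n$, which is why an optimization that scales only with the sample size $n$ is attractive.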

Paper Structure

This paper contains 8 sections, 3 theorems, 17 equations, 4 figures, 4 tables.

Key Result

Proposition 3.1

If $\widehat{{\bm{\theta}}}_{jl}$ is a solution of eq:debias_estimate_inverse, then $\widehat{\mathbf{m}}_{jl} = \mathbf{V}_j \widehat{{\bm{\theta}}}_{jl}$ is a solution of eq:debias_estimate_inverse_m. Conversely, if $\widehat{\mathbf{m}}_{jl}$ is the solution of eq:debias_estimate_inverse_m, then $

Figures (4)

  • Figure 1: Histograms of pre-debiased estimates (referred to as Pre) and SAGE estimates (referred to as Post) for ${\bm{\beta}}_{\text{ind}_1}$ with varying $p$ and $q$.
  • Figure 2: Asymptotic standard normal behavior of the standardized SAGE estimates, $\sqrt{n}(\mathbf{A}\widehat{\mathbf{M}}_j^\top\widehat{\bm{\Sigma}}_{\mathbf{W}_j}\widehat{\mathbf{M}}_j\mathbf{A}^\top)^{-1/2}(\mathbf{A}\widehat{\bm{\beta}}_1-\mathbf{A}{\bm{\beta}}_1)$, for $(p,q)=(120,20)$. The first three panels present QQ plots for Case I, Case II and Case III, respectively, illustrating the asymptotic normality. The fourth panel shows the joint distribution of two asymptotically independent standard normal variables under Case IV.
  • Figure 3: Population-level gene co-expression graph (left), shown with significance levels of 0.05 (middle) and 0.001 (right). Positive partial correlations are shown with red dashed lines, while negative partial correlations are indicated by black solid lines.
  • Figure 4: The first, second and third rows display the SNP effects for "rs10509346", "rs1347069" and "rs759950", respectively, at significance levels of 0.05 (middle) and 0.001 (right). Positive partial correlations are shown with red dashed lines, while negative partial correlations are indicated by black solid lines.
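
The standardized statistic in the Figure 2 caption suggests a Wald-type interval for a scalar contrast $\mathbf{a}^\top\bm{\beta}_j$: the estimate plus or minus a normal quantile times $\sqrt{\mathbf{a}^\top\widehat{\mathbf{M}}_j^\top\widehat{\bm{\Sigma}}_{\mathbf{W}_j}\widehat{\mathbf{M}}_j\mathbf{a}/n}$. The sketch below only illustrates that arithmetic. The function `contrast_ci` and the toy inputs are hypothetical placeholders, and nothing here computes the SAGE estimator itself.

```python
import numpy as np
from statistics import NormalDist

def contrast_ci(beta_hat, M_hat, Sigma_hat, a, n, level=0.95):
    """Hypothetical helper: normal-approximation interval for the contrast a' beta.

    beta_hat  : (d,) debiased estimate (assumed given)
    M_hat     : (d, d) estimated debiasing matrix (assumed given)
    Sigma_hat : (d, d) sample covariance of the design (assumed given)
    """
    est = float(a @ beta_hat)
    # Asymptotic variance of sqrt(n) * a'(beta_hat - beta), as in the caption.
    var = float(a @ M_hat.T @ Sigma_hat @ M_hat @ a)
    se = np.sqrt(var / n)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # two-sided normal quantile
    return est - z * se, est + z * se

# Toy placeholder inputs (identity matrices), not results from the paper.
d, n = 4, 100
beta_hat = np.array([0.5, -0.2, 0.0, 0.1])
a = np.array([1.0, 0.0, 0.0, 0.0])
lo, hi = contrast_ci(beta_hat, np.eye(d), np.eye(d), a, n)
print(round(lo, 3), round(hi, 3))  # 0.304 0.696
```

An interval that excludes zero corresponds to rejecting the null hypothesis that the contrast is zero at the matching significance level, which is presumably how the edges in Figures 3 and 4 are thresholded at 0.05 and 0.001.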

Theorems & Definitions (3)

  • Proposition 3.1
  • Theorem 3.5
  • Corollary 3.6