Estimating Gaussian graphical models of multi-study data with Multi-Study Factor Analysis

Katherine H. Shutta; Denise M. Scholtens; William L. Lowe; Raji Balasubramanian; Roberta De Vito

TL;DR

MSFA-X extends multi-study factor analysis (MSFA) to jointly estimate shared and study-specific Gaussian graphical models (GGMs), enabling latent-variable–driven network inference across conditions. It decomposes the covariance into shared and study-specific components, fits the model with an ECM-based procedure, and assesses edge significance with empirical p-values, giving a tuning-free way to uncover direct dependencies in multi-study data. Applied to metabolomics data from the Hyperglycemia and Adverse Pregnancy Outcomes (HAPO) Study, MSFA-X reveals distinct network patterns between women with and without gestational diabetes mellitus (GDM) and links latent factors to newborn adiposity. In simulations, MSFA-X recovers shared and study-specific GGMs and outperforms a graphical lasso benchmark; a public implementation is available. The approach advances differential network analysis by separating shared and condition-specific signals rather than treating networks as monolithic across studies.

Abstract

Network models are powerful tools for gaining new insights from complex biological data. Most lines of investigation in biology involve comparing datasets in the setting where the same predictors are measured across multiple studies or conditions (multi-study data). Consequently, the development of statistical tools for network modeling of multi-study data is a highly active area of research. Multi-study factor analysis (MSFA) is a method for estimation of latent variables (factors) in multi-study data. In this work, we generalize MSFA by adding the capacity to estimate Gaussian graphical models (GGMs). Our new tool, MSFA-X, is a framework for latent variable-based graphical modeling of shared and study-specific signals in multi-study data. We demonstrate through simulation that MSFA-X can recover shared and study-specific GGMs and outperforms a graphical lasso benchmark. We apply MSFA-X to analyze maternal response to an oral glucose tolerance test in targeted metabolomic profiles from the Hyperglycemia and Adverse Pregnancy Outcomes (HAPO) Study, identifying network-level differences in glucose metabolism between women with and without gestational diabetes mellitus.
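To make the link between a factor model and a GGM concrete, the following is a minimal sketch and not the authors' implementation. It assumes the standard MSFA covariance form $\Sigma_s = \Phi\Phi^\top + \Lambda_s\Lambda_s^\top + \Psi_s$, with shared loadings $\Phi$, study-specific loadings $\Lambda_s$, and diagonal noise $\Psi_s$, and shows only the generic step from a covariance matrix to partial correlations through its inverse. All variable names, dimensions, and parameter values are hypothetical, and the way MSFA-X itself separates shared from study-specific networks is not reproduced here.

```python
import numpy as np


def partial_correlations(cov):
    """Turn a covariance matrix into partial correlations via the precision matrix."""
    precision = np.linalg.inv(cov)
    scale = np.sqrt(np.diag(precision))
    pcor = -precision / np.outer(scale, scale)
    np.fill_diagonal(pcor, 1.0)
    return pcor


rng = np.random.default_rng(0)
p, k_shared, k_specific, n_studies = 12, 2, 1, 2  # illustrative sizes only

# Hypothetical parameters: in practice these would be estimated from data.
phi = rng.normal(size=(p, k_shared))  # shared loadings
lambdas = [rng.normal(size=(p, k_specific)) for _ in range(n_studies)]
psis = [np.diag(rng.uniform(0.5, 1.0, size=p)) for _ in range(n_studies)]

# Assumed decomposition: shared part + study-specific part + diagonal noise.
shared_part = phi @ phi.T
study_covs = [shared_part + lam @ lam.T + psi for lam, psi in zip(lambdas, psis)]

# One partial correlation matrix per study; nonzero off-diagonal entries
# correspond to edges in that study's Gaussian graphical model.
pcors = [partial_correlations(cov) for cov in study_covs]
```

In a Gaussian graphical model, a zero off-diagonal entry of the precision matrix means the two variables are conditionally independent given all the others, which is why the inverse covariance rather than the covariance itself defines the edges.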

Paper Structure (21 sections, 20 equations, 21 figures, 6 tables)

Introduction
The Hyperglycemia and Adverse Pregnancy Outcomes (HAPO) Study
Methods
Assessing edge significance with empirical p-values
Identifiability Considerations
Determining the number of factors
Simulation Studies
Application: Metabolomic Networks of Gestational Diabetes in the HAPO Study
Discussion
Acknowledgements
Funding
Supplement to Simulation Studies
Estimating the Number of Factors
Numerical Results
Visual Results
...and 6 more sections

Figures (21)

Figure 1: An illustration of the covariance decomposition implied by the MSFA-X model formulation for $S=2$ studies on $p=12$ predictors.
Figure 1: Matrix RV coefficient for the glasso benchmark, MSFA-X with estimated factor count, and MSFA-X with true factor count. Setting 3 is shown in a separate plot because its number of studies differs from that of the other settings.
Figure 2: A visual of the region of non-identifiability for one predictor in the case of three studies ($S=3$). $\gamma$: true, unknown shared noise for this predictor; $\hat{\psi}_s$: estimated overall noise; $\hat{\gamma}$: estimated shared noise; $\hat{\eta}_s$: estimated study-specific noise; $\Delta\gamma$: error in estimating $\gamma$.
Figure 2: Cosine similarity for the glasso benchmark, MSFA-X with estimated factor count, and MSFA-X with true factor count. Setting 3 is shown in a separate plot because its number of studies differs from that of the other settings.
Figure 3: Mean estimated GGM adjacency matrices across 100 simulated datasets for simulation setting 1 (baseline) and setting 4 (increased factor count).
...and 16 more figures
