Estimating Gaussian graphical models of multi-study data with Multi-Study Factor Analysis
Katherine H. Shutta, Denise M. Scholtens, William L. Lowe, Raji Balasubramanian, Roberta De Vito
TL;DR
MSFA-X extends multi-study factor analysis (MSFA) to jointly estimate shared and study-specific Gaussian graphical models (GGMs), enabling network inference driven by latent variables across conditions. It decomposes the covariance into shared and study-specific components, fits the model with an ECM-based (expectation conditional maximization) procedure, and assesses edge significance empirically, giving a tuning-free way to uncover direct dependencies in multi-study data. Applied to metabolomics data from the Hyperglycemia and Adverse Pregnancy Outcomes (HAPO) Study, MSFA-X reveals distinct network patterns in women with and without gestational diabetes mellitus (GDM) and links latent factors to newborn adiposity, showing how latent structure and network analysis can be combined in biomedical data. In simulations, MSFA-X recovers shared and study-specific GGMs and outperforms a graphical lasso benchmark; a public implementation is available. By separating shared from condition-specific signals rather than treating a network as monolithic across studies, the approach advances differential network analysis and has implications for understanding metabolic regulation across conditions.
Abstract
Network models are powerful tools for gaining new insights from complex biological data. Most lines of investigation in biology involve comparing datasets in the setting where the same predictors are measured across multiple studies or conditions (multi-study data). Consequently, the development of statistical tools for network modeling of multi-study data is a highly active area of research. Multi-study factor analysis (MSFA) is a method for estimation of latent variables (factors) in multi-study data. In this work, we generalize MSFA by adding the capacity to estimate Gaussian graphical models (GGMs). Our new tool, MSFA-X, is a framework for latent variable-based graphical modeling of shared and study-specific signals in multi-study data. We demonstrate through simulation that MSFA-X can recover shared and study-specific GGMs and outperforms a graphical lasso benchmark. We apply MSFA-X to analyze maternal response to an oral glucose tolerance test in targeted metabolomic profiles from the Hyperglycemia and Adverse Pregnancy Outcomes (HAPO) Study, identifying network-level differences in glucose metabolism between women with and without gestational diabetes mellitus.
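To make the link between factor models and GGMs concrete, the following minimal Python sketch builds one study's covariance from the standard MSFA decomposition, Sigma_s = Phi Phi^T + Lambda_s Lambda_s^T + Psi_s, and reads GGM edge weights off the precision matrix as partial correlations. The dimensions, the randomly drawn loadings, and the names `Phi`, `Lambda_s`, `Psi_s`, and `partial_correlations` are illustrative assumptions rather than part of the MSFA-X implementation, and the sketch does not reproduce how MSFA-X estimates the loadings, constructs the shared and study-specific graphs, or assesses edge significance.

```python
import numpy as np


def partial_correlations(cov):
    """Partial correlation matrix implied by a positive-definite covariance."""
    precision = np.linalg.inv(cov)
    d = np.sqrt(np.diag(precision))
    # Edge weight between variables i and j: -omega_ij / sqrt(omega_ii * omega_jj)
    pcor = -precision / np.outer(d, d)
    np.fill_diagonal(pcor, 1.0)
    return pcor


# Hypothetical sizes: 6 variables, 2 shared factors, 1 study-specific factor.
rng = np.random.default_rng(0)
p, k_shared, k_specific = 6, 2, 1

Phi = rng.normal(size=(p, k_shared))             # shared loadings (assumed values)
Lambda_s = rng.normal(size=(p, k_specific))      # study-specific loadings (assumed values)
Psi_s = np.diag(rng.uniform(0.5, 1.0, size=p))   # diagonal residual variances

Sigma_shared = Phi @ Phi.T                       # covariance explained by shared factors
Sigma_specific = Lambda_s @ Lambda_s.T           # covariance explained by study-specific factors
Sigma_s = Sigma_shared + Sigma_specific + Psi_s  # marginal covariance of study s

# Nonzero off-diagonal entries correspond to edges of the GGM for Sigma_s.
pcor_s = partial_correlations(Sigma_s)
```

Because `Psi_s` has strictly positive diagonal entries, `Sigma_s` is positive definite and its inverse exists; the low-rank pieces `Sigma_shared` and `Sigma_specific` are not invertible on their own, which is one reason a dedicated construction is needed for shared and study-specific networks.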
