Clusterpath Gaussian Graphical Modeling
D. J. W. Touw, A. Alfons, P. J. F. Groenen, I. Wilms
TL;DR
The paper tackles high estimation variability in Gaussian graphical models (GGMs) by introducing CGGM, a convex clusterpath-based estimator that jointly learns a clustering of the variables and the GGM parameters. By representing the precision matrix as Θ = U R U^T + A and penalizing inter-cluster differences with an aggregation term, CGGM induces a block-structured Θ whose inverse preserves the same block structure, while optionally incorporating a sparsity penalty. The authors develop a cyclic block coordinate descent algorithm with fusion steps and Newton updates, provide tuning and refitting procedures, and demonstrate strong performance in simulations across varied designs. They further show CGGM’s versatility through applications to finance, well-being indicators, and psychometrics, and discuss extending the framework to clustered covariance estimation. The work offers a practical, scalable approach to interpretable high-dimensional GGMs and provides the CGGMR software for implementation.
Abstract
Graphical models serve as effective tools for visualizing conditional dependencies between variables. However, as the number of variables grows, interpretation becomes increasingly difficult, and estimation uncertainty increases due to the large number of parameters relative to the number of observations. To address these challenges, we introduce the Clusterpath estimator of the Gaussian Graphical Model (CGGM), which encourages variable clustering in the graphical model in a data-driven way. Through the use of an aggregation penalty, we group variables together, which in turn results in a block-structured precision matrix whose block structure is preserved in the covariance matrix. The CGGM estimator is formulated as the solution to a convex optimization problem, making it easy to incorporate other popular penalization schemes, which we illustrate by combining an aggregation penalty with a sparsity penalty. We present a computationally efficient implementation of the CGGM estimator by using a cyclic block coordinate descent algorithm. In simulations, we show that CGGM not only matches but often outperforms other state-of-the-art methods for variable clustering in graphical models. We also demonstrate CGGM's practical advantages and versatility on a diverse collection of empirical applications.
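
The claim that the block structure of the precision matrix carries over to the covariance matrix can be checked numerically. The sketch below is only an illustration, not the authors' implementation: it builds a small precision matrix of the form Θ = U R U^T + A from the TL;DR, inverts it, and tests whether both matrices are constant on every cluster block. The cluster labels, the values in R and a, and the helper name block_structured are hypothetical. The sketch also assumes that A is diagonal with one value per cluster.

```python
import numpy as np

def block_structured(M, labels, tol=1e-10):
    """Return True if M is constant on every cluster block.

    Off-diagonal entries must agree within each pair of clusters (k, l),
    and diagonal entries must agree within each cluster.
    """
    labels = np.asarray(labels)
    off_diag = ~np.eye(len(labels), dtype=bool)
    for k in np.unique(labels):
        rows = labels == k
        diag_vals = np.diag(M)[rows]
        if diag_vals.max() - diag_vals.min() > tol:
            return False
        for l in np.unique(labels):
            cols = labels == l
            # Entries in block (k, l), excluding the main diagonal.
            vals = M[np.outer(rows, cols) & off_diag]
            if vals.size and vals.max() - vals.min() > tol:
                return False
    return True

# Hypothetical example: p = 5 variables in K = 2 clusters.
labels = np.array([0, 0, 1, 1, 1])
U = np.eye(2)[labels]            # p x K cluster membership matrix
R = np.array([[0.5, 0.2],
              [0.2, 0.3]])       # K x K cluster-level matrix (illustrative values)
a = np.array([2.0, 1.5])         # assumption: one diagonal value per cluster
A = np.diag(U @ a)

Theta = U @ R @ U.T + A          # block-structured precision matrix
Sigma = np.linalg.inv(Theta)     # implied covariance matrix

print(block_structured(Theta, labels))
print(block_structured(Sigma, labels))
```

Under these assumptions both checks should succeed, which is consistent with the Woodbury identity: when A is constant within clusters, the inverse of U R U^T + A can again be written as a cluster-level matrix expanded by U plus a cluster-constant diagonal.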
