Simultaneous Identification of Sparse Structures and Communities in Heterogeneous Graphical Models
Dapeng Shi, Tiandong Wang, Zhiliang Ying
TL;DR
The paper introduces a decomposition of the residual precision matrix in Gaussian graphical models into a sparse part $S$ and low-rank diagonal blocks $L$, which allows sparse edges and non-overlapped communities to be identified simultaneously. It proposes a three-stage estimation procedure (LS-based regression, adaptive-$\ell_1$ penalized estimation of $S$ and $L$, and K-means clustering on the latent-community rows), together with an ADMM algorithm for efficient computation and data-driven tuning. Theoretical contributions include local identifiability via tangent-space analysis, an adaptive irrepresentability condition that ensures model-selection consistency, and a clustering error bound for the final stage. Empirical results on synthetic and stock market data show that the method outperforms existing approaches in recovering community structure and edges, with applications in genetics, social sciences, neuroscience, and finance.
Abstract
Exploring and detecting community structures is of significant importance in genetics, social sciences, neuroscience, and finance. In graphical models especially, community detection can encourage the exploration of sets of variables with group-like properties. In this paper, within the framework of Gaussian graphical models, we introduce a novel decomposition of the underlying graphical structure into a sparse part and low-rank diagonal blocks (non-overlapped communities). We illustrate the significance of this decomposition through two modeling perspectives and propose a three-stage estimation procedure with a fast and efficient algorithm for the identification of the sparse structure and communities. On the theoretical front, we establish conditions for local identifiability and extend the traditional irrepresentability condition to an adaptive form by constructing an effective norm, which ensures the consistency of model selection for the adaptive $\ell_1$ penalized estimator in the second stage. We also provide a clustering error bound for the K-means procedure in the third stage. Extensive numerical experiments demonstrate the superiority of the proposed method over existing approaches in estimating graph structures. Finally, we apply our method to stock return data, revealing its capability to accurately identify non-overlapped community structures.
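
To make the role of the low-rank diagonal blocks concrete, the following is a minimal sketch of how communities could be read off such a matrix with K-means. It is an illustration under stated assumptions, not the paper's procedure: the helper names (`block_low_rank`, `kmeans`, `communities_from_low_rank`) are hypothetical, each block is assumed to be rank one, the matrix is noiseless rather than estimated, and the choice to cluster row-normalised leading eigenvectors is an assumption about what "clustering on the rows" might look like.

```python
import numpy as np

def block_low_rank(sizes, rng):
    """Hypothetical helper: block-diagonal matrix whose k-th block is u_k u_k^T."""
    p = sum(sizes)
    L = np.zeros((p, p))
    start = 0
    for m in sizes:
        u = rng.uniform(0.5, 1.0, size=m)  # positive entries, so no row of a block is zero
        L[start:start + m, start:start + m] = np.outer(u, u)
        start += m
    return L

def kmeans(X, k, n_iter=50):
    """Plain Lloyd's algorithm with deterministic farthest-point initialisation."""
    centers = [X[0]]
    for _ in range(1, k):
        # squared distance of every point to its nearest chosen centre
        d = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def communities_from_low_rank(L, k):
    """Assumed recipe: K-means on the row-normalised top-k eigenvectors of L."""
    vals, vecs = np.linalg.eigh(L)          # eigenvalues in ascending order
    U = vecs[:, np.argsort(vals)[-k:]]      # eigenvectors of the k largest eigenvalues
    norms = np.linalg.norm(U, axis=1, keepdims=True)
    U = U / np.maximum(norms, 1e-12)        # guard against division by zero
    return kmeans(U, k)

rng = np.random.default_rng(0)
sizes = [4, 3, 5]                           # three non-overlapped communities
L = block_low_rank(sizes, rng)
labels = communities_from_low_rank(L, k=len(sizes))
print(labels)
```

In this noiseless toy case, variables in the same block share an identical normalised row and rows from different blocks are orthogonal, so the clustering recovers the blocks exactly (up to a relabelling of the clusters). With an estimated matrix the rows are only approximately aligned, which is the situation the clustering error bound in the third stage addresses.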
