Community Detection with Heterogeneous Block Covariance Model
Xiang Li, Yunpeng Zhao, Qing Pan, Ning Hao
TL;DR
The paper addresses clustering of features by learning a covariance-based community structure that allows heterogeneity across features. It introduces the heterogeneous block covariance model (HBCM) and an efficient variational EM algorithm that adds a second latent layer to obtain closed-form updates and polynomial-time computation. Theoretical results establish identifiability and consistency of the estimated memberships under mild conditions, and extensive simulations show that HBCM outperforms spectral clustering and related methods in diverse settings, including misspecified covariances. The approach is demonstrated on single-cell RNA-seq (scRNA-seq) data from a mouse embryo and on stock prices, yielding improved clustering and clearer interpretation of block structures, with the number of communities selected by cross-validation. Overall, HBCM offers a principled, scalable framework for covariance-based feature clustering with strong theoretical guarantees and practical applicability to biological and financial datasets.
Abstract
Community detection is the task of clustering objects based on their pairwise relationships. Most model-based community detection methods, such as the stochastic block model and its variants, are designed for networks with binary (yes/no) edges. In many practical scenarios, however, edges carry continuous weights, spanning positive and negative values, that reflect varying levels of connectivity. To address this challenge, we introduce the heterogeneous block covariance model (HBCM), which defines a community structure within the covariance matrix, where edges have signed and continuous weights. Furthermore, it takes into account the heterogeneity of objects when forming connections with other objects within a community. A novel variational expectation-maximization algorithm is proposed to estimate the group memberships. The HBCM provides provably consistent estimates of memberships, and its promising performance is observed in numerical simulations under different setups. The model is applied to a single-cell RNA-seq dataset of a mouse embryo and a stock price dataset. Supplementary materials for this article are available online.
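In this setting the off-diagonal entries of a covariance matrix act as signed, continuous edge weights between features, and each feature also carries its own heterogeneity. As a minimal sketch, assuming the hypothetical parameterization Σ_ij = θ_i θ_j B_{z_i z_j} for i ≠ j (θ_i > 0 a feature-specific scale, z_i the community label, B a K×K signed block matrix; consult the paper for the exact HBCM definition), the NumPy code below builds such a covariance and clusters features with a generic spectral-clustering baseline of the kind the paper compares against. It is not the HBCM variational EM algorithm; the function names, the noise term, and the farthest-point k-means initialization are illustrative choices.

```python
import numpy as np


def heterogeneous_block_covariance(z, theta, B, noise_var):
    """Covariance with an assumed heterogeneous block structure.

    Hypothetical parameterization (illustrative, not taken from the paper):
    Sigma = L B L^T + diag(noise_var), with L[i, k] = theta[i] * 1{z[i] == k},
    so that Sigma[i, j] = theta[i] * theta[j] * B[z[i], z[j]] for i != j.
    Negative entries of B give negative (signed) edges between communities.
    """
    z = np.asarray(z)
    K = B.shape[0]
    M = np.eye(K)[z]                      # p x K one-hot membership matrix
    L = np.asarray(theta)[:, None] * M    # scale each row by its theta_i
    return L @ B @ L.T + np.diag(noise_var)


def farthest_point_kmeans(X, K, n_iter=100):
    """Lloyd's k-means with a deterministic farthest-point initialization."""
    centers = [X[0]]
    for _ in range(1, K):
        # squared distance from each row to its nearest chosen center
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new_centers = np.array([
            X[labels == k].mean(axis=0) if np.any(labels == k) else centers[k]
            for k in range(K)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def spectral_cluster_covariance(S, K):
    """Baseline spectral clustering of features from a covariance matrix S.

    Takes the top-K eigenvectors, normalizes each row to unit length
    (in the equal-noise case this removes the per-feature scale theta_i > 0),
    then runs k-means on the normalized rows.
    """
    _, vecs = np.linalg.eigh(S)           # eigenvalues in ascending order
    U = vecs[:, -K:]                      # eigenvectors of the K largest eigenvalues
    U = U / np.linalg.norm(U, axis=1, keepdims=True)
    return farthest_point_kmeans(U, K)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    p, n = 30, 2000
    z = np.repeat([0, 1, 2], p // 3)          # three communities of 10 features
    theta = rng.uniform(0.5, 1.5, size=p)     # per-feature heterogeneity
    B = np.array([[1.0, -0.4, 0.2],
                  [-0.4, 1.0, 0.3],
                  [0.2, 0.3, 1.0]])           # positive definite, signed entries
    Sigma = heterogeneous_block_covariance(z, theta, B, np.ones(p))
    X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
    labels = spectral_cluster_covariance(np.cov(X, rowvar=False), K=3)
    print(labels)
```

Row-normalizing the eigenvectors is the step aimed at heterogeneity: when the noise variances are equal, each row of the top-K eigenvectors is θ_i times a community-specific vector, so dividing by the row norm leaves a single point per community. With unequal noise or finite samples this holds only approximately, which is one reason a model-based estimator can do better than this baseline.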
