When Does Bottom-up Beat Top-down in Hierarchical Community Detection?
Maximilien Dreveton, Daichi Kuroda, Matthias Grossglauser, Patrick Thiran
TL;DR
The paper analyzes hierarchical community detection under Hierarchical Stochastic Block Models, contrasting bottom-up (agglomerative) and top-down (divisive) algorithms. It proves that a bottom-up algorithm based on average linkage recovers the latent tree in sparse regimes where $N\delta_N=\omega(1)$. It also proves that this algorithm attains exact recovery at intermediate depths of the hierarchy up to the information-theoretic threshold, whereas the existing guarantees for top-down methods require more restrictive conditions. The algorithm has two stages: Bethe-Hessian spectral clustering first identifies the primitive (finest-level) communities, and an edge-density-based linkage step then merges them. This procedure is shown to be robust to misclustering, including in bounded-degree settings via graph splitting. Numerical experiments on synthetic binary tree stochastic block models (BTSBMs) and real networks (e.g., high-school contact networks and power grids) corroborate the theoretical advantages; in particular, bottom-up trees exhibit fewer dendrogram inversions. Collectively, the results advance the understanding of hierarchical clustering for networks and expand the practical regime in which exact hierarchical recovery is feasible.
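The first stage of the two-stage algorithm can be illustrated with a minimal NumPy sketch. It assumes the standard Bethe-Hessian $H(r) = (r^2-1)I - rA + D$, where $A$ is the adjacency matrix and $D$ the diagonal degree matrix. It uses the heuristic radius $r = \sqrt{\sum_i d_i^2 / \sum_i d_i - 1}$ and clusters the rows of the eigenvectors attached to the $K$ smallest eigenvalues with a plain k-means. The function names, the radius, and the k-means initialization are illustrative assumptions, not necessarily the authors' choices.

```python
import numpy as np


def bethe_hessian(A, r=None):
    """Return H(r) = (r^2 - 1) I - r A + D for a symmetric adjacency matrix A."""
    A = np.asarray(A, dtype=float)
    d = A.sum(axis=1)  # node degrees
    if r is None:
        # Assumed heuristic radius: sqrt(sum d^2 / sum d - 1).
        r = np.sqrt((d ** 2).sum() / d.sum() - 1.0)
    n = A.shape[0]
    return (r ** 2 - 1.0) * np.eye(n) - r * A + np.diag(d)


def spectral_primitive_communities(A, K=None, n_iter=100):
    """Cluster nodes using eigenvectors of the K smallest Bethe-Hessian eigenvalues.

    If K is None, it is estimated as the number of negative eigenvalues.
    Returns (labels, number_of_negative_eigenvalues).
    """
    vals, vecs = np.linalg.eigh(bethe_hessian(A))  # eigenvalues in ascending order
    n_neg = int((vals < 0).sum())
    if K is None:
        K = max(1, n_neg)
    X = vecs[:, :K]  # spectral embedding, one row per node

    # Farthest-point initialization, then Lloyd (k-means) iterations.
    centers = [X[0]]
    for _ in range(1, K):
        dist = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        sq_dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = np.argmin(sq_dist, axis=1)
        new_centers = np.array([
            X[labels == k].mean(axis=0) if np.any(labels == k) else centers[k]
            for k in range(K)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, n_neg
```

The count of negative eigenvalues of $H(r)$ is commonly used to estimate the number of communities, which is why the sketch returns it alongside the labels. Passing `K` explicitly skips that estimate.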
Abstract
Hierarchical clustering of networks consists of finding a tree of communities, such that lower levels of the hierarchy reveal finer-grained community structures. There are two main classes of algorithms tackling this problem. Divisive (top-down) algorithms recursively partition the nodes into two communities, until a stopping rule indicates that no further split is needed. In contrast, agglomerative (bottom-up) algorithms first identify the communities at the lowest level of the hierarchy and then repeatedly merge them using a linkage method. In this article, we establish theoretical guarantees for the recovery of the hierarchical tree and community structure of a Hierarchical Stochastic Block Model by a bottom-up algorithm. We also establish that this bottom-up algorithm attains the information-theoretic threshold for exact recovery at intermediate levels of the hierarchy. Notably, these recovery conditions are less restrictive than those established for top-down algorithms. This shows that bottom-up algorithms extend the feasible region for achieving exact recovery at intermediate levels. Numerical experiments on both synthetic and real data sets confirm the superiority of bottom-up algorithms over top-down algorithms. We also observe that top-down algorithms can produce dendrograms with inversions. These findings contribute to a better understanding of hierarchical clustering techniques and their applications in network analysis.
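The agglomerative step can be sketched as follows, assuming the primitive communities have already been found (for instance by a spectral method, as in the TL;DR). The linkage score between two communities $C$ and $C'$ is taken to be their edge density $e(C, C')/(|C|\,|C'|)$. When two communities merge, the new edge counts are sums of the old ones, which makes this a size-weighted average linkage. The function name `bottom_up_merge` and its output format are hypothetical, chosen for illustration.

```python
import numpy as np


def bottom_up_merge(A, labels):
    """Merge primitive communities by highest inter-community edge density.

    A: symmetric adjacency matrix (n x n) without self-loops.
    labels: length-n array of primitive community ids 0..K-1 (all ids present).
    Returns a list of merges (a, b, density); the merged cluster gets the
    next unused id K, K+1, ...
    """
    A = np.asarray(A, dtype=float)
    labels = np.asarray(labels, dtype=int)
    K = int(labels.max()) + 1
    Z = np.eye(K)[labels]   # n x K one-hot membership matrix
    counts = Z.T @ A @ Z    # counts[a, b] = number of edges between blocks a != b
    size = {a: float(Z[:, a].sum()) for a in range(K)}
    # Edge counts between active clusters, keyed by (smaller id, larger id).
    edges = {(a, b): counts[a, b] for a in range(K) for b in range(a + 1, K)}
    active = set(range(K))
    merges, next_id = [], K

    while len(active) > 1:
        # Pick the pair of active clusters with the highest edge density.
        (a, b), dens = max(
            (((p, q), e / (size[p] * size[q])) for (p, q), e in edges.items()),
            key=lambda item: item[1],
        )
        merges.append((a, b, dens))
        new, next_id = next_id, next_id + 1
        size[new] = size[a] + size[b]
        del edges[(a, b)]
        active -= {a, b}
        for c in active:
            # Edges from c to the merged cluster = edges to a + edges to b.
            e_ac = edges.pop((min(a, c), max(a, c)))
            e_bc = edges.pop((min(b, c), max(b, c)))
            edges[(c, new)] = e_ac + e_bc  # c < new because new is the largest id
        active.add(new)
    return merges
```

Each merged density is a size-weighted average of two densities that are no larger than the one just merged, so the merge heights of this particular procedure are non-increasing and its dendrogram cannot contain inversions.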
