Adaptive cut reveals multiscale complexity in networks
Louis Boucherie, Yong-Yeol Ahn, Sune Lehmann
TL;DR
The paper addresses the limitation of single-level cuts in hierarchical clustering by introducing the adaptive cut, which cuts a dendrogram at multiple levels and optimizes the cut via Markov chain Monte Carlo with simulated annealing. It pairs this method with a new entropy-based balancedness metric, B, that predicts when multi-level cuts will outperform single-level cuts. Across synthetic and real networks, including an extension of Louvain that produces full dendrograms, the adaptive cut improves partition density and modularity, especially for unbalanced trees, and applies to a broad range of clustering tasks. The work provides code, formal definitions, and proofs, offering a robust, adaptable tool for multiscale clustering in networks and beyond.
Abstract
Hierarchical clustering and community detection are important problems in machine learning and complex network analysis. A common approach to identifying clusters is to simply cut dendrograms at some threshold. However, single-level cuts are often suboptimal in terms of capturing underlying structure in the data, especially when the dendrogram is unbalanced. In this paper, we present the adaptive cut, a novel method that leverages the hierarchical structure of dendrograms by employing multi-level cuts to overcome the limitations of single-level approaches. The adaptive cut optimizes an objective function using Markov chain Monte Carlo with simulated annealing, resulting in better partitions. We demonstrate the effectiveness of the adaptive cut through applications to link clustering and modularity optimization, but note that the method is applicable to any clustering task that relies on a dendrogram and an objective function. Beyond the adaptive cut, we introduce the balancedness score, an information-theoretic metric that quantifies how balanced a dendrogram is. Balancedness predicts the potential benefits of using multi-level cuts. For the community detection examples, we evaluate our method on more than 200 real-world networks and multiple synthetic datasets, demonstrating significant improvements in partition density and modularity over traditional single-cut approaches. In addition, we show the generality of the adaptive cut by applying it across various hierarchical clustering techniques and objective functions. Our results indicate that the adaptive cut provides a robust and versatile tool for improving clustering outcomes.
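
To make the idea of a multi-level cut concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes a binary dendrogram stored as nested pairs, represents a cut as a set of subtrees that together cover every leaf exactly once, and searches over cuts with a simplified simulated-annealing loop whose moves split one subtree or merge two siblings. The toy graph, the tree, the cooling schedule, and all function names (`adaptive_cut`, `neighbors`, `modularity`, and so on) are illustrative assumptions, and the proposal step is a plain random choice rather than the sampler described in the paper.

```python
import math
import random

# Hypothetical toy graph: a 4-clique and two triangles joined by bridges.
EDGES = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3),
         (4, 5), (4, 6), (5, 6), (7, 8), (7, 9), (8, 9),
         (3, 4), (6, 7)]

# Hypothetical dendrogram: leaves are ints, internal nodes are (left, right).
TREE = (((0, 1), (2, 3)), ((4, (5, 6)), (7, (8, 9))))


def subtree(root, path):
    """Follow a path of '0' (left) and '1' (right) steps from the root."""
    node = root
    for step in path:
        node = node[int(step)]
    return node


def leaves(node):
    """Return the leaf labels below a node."""
    if not isinstance(node, tuple):
        return [node]
    return leaves(node[0]) + leaves(node[1])


def partition(root, cut):
    """Turn a cut (a set of node paths) into a list of clusters."""
    return [sorted(leaves(subtree(root, p))) for p in sorted(cut)]


def neighbors(root, cut):
    """All cuts reachable by splitting one node or merging two siblings."""
    moves = []
    for p in sorted(cut):
        if isinstance(subtree(root, p), tuple):  # split an internal node
            moves.append((cut - {p}) | {p + "0", p + "1"})
        if p.endswith("0") and p[:-1] + "1" in cut:  # merge a sibling pair
            moves.append((cut - {p, p[:-1] + "1"}) | {p[:-1]})
    return moves


def modularity(edges, parts):
    """Newman modularity of a partition of an unweighted graph."""
    m = len(edges)
    label = {v: i for i, part in enumerate(parts) for v in part}
    inside = [0] * len(parts)
    degree = [0] * len(parts)
    for u, v in edges:
        degree[label[u]] += 1
        degree[label[v]] += 1
        if label[u] == label[v]:
            inside[label[u]] += 1
    return sum(inside[i] / m - (degree[i] / (2 * m)) ** 2
               for i in range(len(parts)))


def adaptive_cut(root, objective, steps=2000, t_start=1.0, t_end=1e-3, seed=0):
    """Simplified annealing over multi-level cuts; returns the best cut seen."""
    rng = random.Random(seed)
    cut = frozenset({""})  # start from the root: one cluster with all leaves
    score = objective(partition(root, cut))
    best_cut, best_score = cut, score
    for i in range(steps):
        # Geometric cooling from t_start to t_end (an assumed schedule).
        temp = t_start * (t_end / t_start) ** (i / max(steps - 1, 1))
        options = neighbors(root, cut)
        if not options:  # only happens when the tree is a single leaf
            break
        cand = rng.choice(options)
        cand_score = objective(partition(root, cand))
        delta = cand_score - score
        # Always accept improvements; accept worse cuts with prob. exp(delta/T).
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            cut, score = cand, cand_score
            if score > best_score:
                best_cut, best_score = cut, score
    return partition(root, best_cut), best_score


parts, score = adaptive_cut(TREE, lambda p: modularity(EDGES, p))
print(parts, round(score, 3))
```

Because the cut is a set of subtrees rather than a single height threshold, different branches of the tree can be cut at different depths, which is the property the abstract attributes to multi-level cuts. Any other objective that scores a partition, such as partition density for link clustering, could be passed in place of the modularity function.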
