Optimal Parallel Algorithms for Dendrogram Computation and Single-Linkage Clustering
Laxman Dhulipala, Xiaojun Dong, Kishen N Gowda, Yan Gu
TL;DR
This work tackles the problem of computing the single-linkage dendrogram (SLD) of an edge-weighted tree in parallel. Using deterministic, output-sensitive approaches, it improves on the $\Omega(n\log n)$ work required by all prior algorithms. It introduces a merge-based divide-and-conquer framework built on SLD-Merge, and instantiates it via a theoretically optimal tree-contraction method (SLD-TreeContraction) plus two practical algorithms, ParUF and RC-Tree Tracing (RCTT). The main theoretical results are $O(n\log h)$ work with polylogarithmic ($O(\log^2 n\log^2 h)$) depth for the tree-contraction approach, and $O(n\log h)$ work with $O(h\log n)$ depth for a bottom-up algorithm inspired by the nearest-neighbor chain method, where $h$ is the height of the output SLD. Experiments show up to 150x speedups over a highly efficient sequential Union-Find baseline on billion-scale trees. The practical impact is significant: deterministic, scalable dendrogram construction on massive trees enables faster hierarchical clustering and downstream analytics in domains such as biology, image analysis, and networks.
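Why the $O(n\log h)$ bound is called output-sensitive: the SLD is a binary tree whose $n$ leaves are the vertices of $T$ and whose $n-1$ internal nodes correspond to its edges, so its height is bounded on both sides. The bound therefore never exceeds the $O(n\log n)$ cost of sorting-based methods, and is much smaller when the dendrogram is shallow:

$$\lceil \log_2 n \rceil \;\le\; h \;\le\; n-1 \quad\Longrightarrow\quad O(n\log h) \subseteq O(n\log n), \qquad h = O(\log n) \;\Rightarrow\; O(n\log h) = O(n\log\log n).$$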
Abstract
Computing a Single-Linkage Dendrogram (SLD) is a key step in the classic single-linkage hierarchical clustering algorithm. Given an input edge-weighted tree $T$, the SLD of $T$ is a binary dendrogram that summarizes the $n-1$ clusterings obtained by contracting the edges of $T$ in order of weight. Existing algorithms for computing the SLD all require $\Omega(n\log n)$ work, where $n = |T|$. Furthermore, to the best of our knowledge, no prior work provides a parallel algorithm obtaining non-trivial speedup for this problem. In this paper, we design faster parallel algorithms for computing SLDs both in theory and in practice based on new structural results about SLDs. In particular, we obtain a deterministic output-sensitive parallel algorithm based on parallel tree contraction that requires $O(n \log h)$ work and $O(\log^2 n \log^2 h)$ depth, where $h$ is the height of the output SLD. We also give a deterministic bottom-up algorithm for the problem inspired by the nearest-neighbor chain algorithm for hierarchical agglomerative clustering, and show that it achieves $O(n\log h)$ work and $O(h \log n)$ depth. Our results are based on a novel divide-and-conquer framework for building SLDs, inspired by divide-and-conquer algorithms for Cartesian trees. Our new algorithms can quickly compute the SLD on billion-scale trees, and obtain up to 150x speedup over the highly efficient Union-Find algorithm typically used to compute SLDs in practice.
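For reference, the sequential Union-Find approach that the abstract uses as a baseline can be sketched as follows. This is a minimal Kruskal-style sketch, not the authors' implementation. It sorts the tree edges by weight, then processes them in order. Each edge creates one internal dendrogram node whose two children are the nodes currently representing the clusters that the edge joins. The function names `sld_union_find` and `dendrogram_height`, the parent-array output format (leaves $0..n-1$, internal node $n+k$ for the $k$-th edge processed), and breaking weight ties by edge index are all illustrative assumptions.

```python
from typing import List, Tuple

Edge = Tuple[int, int, float]  # (u, v, weight); assumed to form a tree on 0..n-1


def sld_union_find(n: int, edges: List[Edge]) -> List[int]:
    """Sequential Kruskal-style SLD sketch (hypothetical helper).

    Returns a parent array over 2n-1 dendrogram nodes: leaves 0..n-1 are the
    tree's vertices, node n+k is created for the k-th edge in sorted order,
    and the root has parent -1.
    """
    assert len(edges) == max(n - 1, 0), "input must be a tree on n vertices"
    uf_parent = list(range(n))       # union-find forest over vertices
    uf_size = [1] * n                # cluster sizes, for union by size
    cluster_node = list(range(n))    # dendrogram node for each union-find root
    dendro_parent = [-1] * (2 * n - 1)

    def find(x: int) -> int:
        # Path halving: point every other node on the path at its grandparent.
        while uf_parent[x] != x:
            uf_parent[x] = uf_parent[uf_parent[x]]
            x = uf_parent[x]
        return x

    # Assumption: weight ties are broken by edge index so the output is deterministic.
    # This sort is the O(n log n) step.
    order = sorted(range(len(edges)), key=lambda i: (edges[i][2], i))
    for k, i in enumerate(order):
        u, v, _ = edges[i]
        ru, rv = find(u), find(v)
        node = n + k
        # The new internal node merges the two clusters joined by this edge.
        dendro_parent[cluster_node[ru]] = node
        dendro_parent[cluster_node[rv]] = node
        if uf_size[ru] < uf_size[rv]:
            ru, rv = rv, ru
        uf_parent[rv] = ru
        uf_size[ru] += uf_size[rv]
        cluster_node[ru] = node
    return dendro_parent


def dendrogram_height(dendro_parent: List[int]) -> int:
    """Height h of the SLD: number of edges on the longest root-to-leaf path."""
    height = [0] * len(dendro_parent)
    # Every child has a smaller index than its parent, so one ascending pass suffices.
    for node, p in enumerate(dendro_parent):
        if p != -1:
            height[p] = max(height[p], height[node] + 1)
    return max(height, default=0)


if __name__ == "__main__":
    increasing = [(0, 1, 1.0), (1, 2, 2.0), (2, 3, 3.0)]
    balanced = [(0, 1, 1.0), (1, 2, 3.0), (2, 3, 2.0)]
    print(sld_union_find(4, increasing), dendrogram_height(sld_union_find(4, increasing)))
    print(sld_union_find(4, balanced), dendrogram_height(sld_union_find(4, balanced)))
```

On the path 0-1-2-3, increasing weights 1, 2, 3 yield a caterpillar-shaped SLD of height $3 = n-1$. Weights 1, 3, 2 yield a balanced SLD of height $2 = \log_2 n$. In this sketch the initial sort costs $O(n\log n)$ regardless of $h$, which is the cost the output-sensitive algorithms avoid.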
