Optimal Parallel Algorithms for Dendrogram Computation and Single-Linkage Clustering

Laxman Dhulipala; Xiaojun Dong; Kishen N Gowda; Yan Gu

TL;DR

This work addresses parallel computation of the single-linkage dendrogram (SLD) of an edge-weighted tree, a problem for which all existing algorithms require $\Omega(n \log n)$ work. It introduces a merge-based divide-and-conquer framework built on SLD-Merge and instantiates it with a theoretically optimal tree-contraction method (SLD-TreeContraction) plus two practical algorithms, ParUF and RC-Tree Tracing (RCTT). The two main theoretical results are a deterministic, output-sensitive algorithm with $O(n \log h)$ work and $O(\log^2 n \log^2 h)$ depth, where $h$ is the height of the output SLD, and a deterministic bottom-up algorithm with $O(n \log h)$ work and $O(h \log n)$ depth. Experiments on billion-scale trees show up to 150x speedup over the sequential Union-Find algorithm typically used in practice. Deterministic, scalable dendrogram construction on massive trees enables faster hierarchical clustering and downstream analytics in domains like biology, image analysis, and networks.

Abstract

Computing a Single-Linkage Dendrogram (SLD) is a key step in the classic single-linkage hierarchical clustering algorithm. Given an input edge-weighted tree $T$, the SLD of $T$ is a binary dendrogram that summarizes the $n-1$ clusterings obtained by contracting the edges of $T$ in order of weight. Existing algorithms for computing the SLD all require $\Omega(n \log n)$ work where $n = |T|$. Furthermore, to the best of our knowledge no prior work provides a parallel algorithm obtaining non-trivial speedup for this problem. In this paper, we design faster parallel algorithms for computing SLDs both in theory and in practice based on new structural results about SLDs. In particular, we obtain a deterministic output-sensitive parallel algorithm based on parallel tree contraction that requires $O(n \log h)$ work and $O(\log^2 n \log^2 h)$ depth, where $h$ is the height of the output SLD. We also give a deterministic bottom-up algorithm for the problem inspired by the nearest-neighbor chain algorithm for hierarchical agglomerative clustering, and show that it achieves $O(n \log h)$ work and $O(h \log n)$ depth. Our results are based on a novel divide-and-conquer framework for building SLDs, inspired by divide-and-conquer algorithms for Cartesian trees. Our new algorithms can quickly compute the SLD on billion-scale trees, and obtain up to 150x speedup over the highly-efficient Union-Find algorithm typically used to compute SLDs in practice.
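To make the definition concrete, the following is a minimal Python sketch of the sequential sort-plus-Union-Find approach that the abstract names as the usual baseline. It is not the paper's implementation. The function name `single_linkage_dendrogram`, the node numbering (vertices are `0..n-1`, the edge with input index `i` becomes dendrogram node `n + i`), the `-1` marker for the root, and the convention that height counts edges on the longest leaf-to-root path are all assumptions made for illustration. Ties in weight are broken by input order here.

```python
def single_linkage_dendrogram(n, edges):
    """Sequential SLD of a tree via sorting + union-find (illustrative sketch).

    n     -- number of vertices, labelled 0..n-1
    edges -- list of (u, v, weight) tuples forming a tree on those vertices
    Returns (parent, h): parent[x] is the dendrogram parent of node x
    (-1 for the root), and h is the height of the dendrogram.
    """
    m = len(edges)
    uf = list(range(n))            # union-find parent pointers over vertices

    def find(x):
        # Path halving: point each visited vertex at its grandparent.
        while uf[x] != x:
            uf[x] = uf[uf[x]]
            x = uf[x]
        return x

    top = list(range(n))           # dendrogram node representing each cluster
    parent = [-1] * (n + m)
    height = [0] * (n + m)

    # Contract edges in increasing order of weight (stable sort for ties).
    for i in sorted(range(m), key=lambda j: edges[j][2]):
        u, v, _ = edges[i]
        ru, rv = find(u), find(v)
        node = n + i               # assumed id for the node of edge i
        a, b = top[ru], top[rv]
        parent[a] = parent[b] = node
        height[node] = 1 + max(height[a], height[b])
        uf[ru] = rv                # merge the two clusters
        top[rv] = node

    return parent, max(height)


# Path 0-1-2-3 with weights 3, 1, 2: edge (1,2) merges first, then (2,3), then (0,1).
parent, h = single_linkage_dendrogram(4, [(0, 1, 3.0), (1, 2, 1.0), (2, 3, 2.0)])
print(parent, h)  # [4, 5, 5, 6, -1, 6, 4] 3
```

The sort is what costs $\Theta(n \log n)$ work in this sketch regardless of the output; the algorithms described in the abstract instead achieve $O(n \log h)$ work, which is smaller whenever the dendrogram is shallow.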

Paper Structure (18 sections, 12 theorems, 2 equations, 8 figures, 1 table, 6 algorithms)

Introduction
Preliminaries
Parallel Tree Contraction
Meldable Heaps
Single-Linkage Clustering
Merge-based Algorithms
Merging Dendrograms
Optimal Algorithm via Tree Contraction
A Sub-optimal Tree-Contraction Algorithm
Optimizing the Merge Step
Practical Algorithms
Activation-Based Algorithm ($\mathsf{ParUF}$)
RC-Tree Tracing Algorithm ($\mathsf{RCTT}$)
Experimental Evaluation
Algorithm Performance
...and 3 more sections

Key Result

Lemma 1

Let $D$ be the output SLD of the tree $G(V,E)$. For the edge $e=(u,v)\in E$, let $D(e)$ denote the subtree rooted at node $e$, and let $D^u(e)$ denote the subtree rooted at the child of node $e$ that contains the vertex $u$ as a leaf. Similarly, we define $D^v(e)$. Then, $D^u(e) = \mathcal{I}^u(e)$

Figures (8)

Figure 1: Example of single-linkage clustering on the input tree shown in the top panel. The bottom left panel shows a typical visualization of the dendrogram based on the "height" of each edge, and the bottom right panel shows the structure of the output SLD.
Figure 2: An example illustrating SLD-Merge. The tree is split at node $e$ into two trees (the left and right sides of the dashed line) which share no edges, and only share the vertex $e$. The SLD-Merge routine merges the two spines formed by the lowest-rank edge incident to $e$ in both trees.
Figure 3: Adjacent Superiors and Inferiors (see Definition 1).
Figure 4: An example illustrating the two-step rake (see the paper's rake algorithm). Here, we perform $\mathsf{rake}(e,c)$, which rakes $c$ into $e$.
Figure 5: A full example of SLD-TreeContraction: the first column represents the rakes/compresses performed in that round; the second column displays the clustering obtained by tree contraction (as well as a compact representation); the third column displays the (non-empty) heaps maintained at each cluster; and the fourth column represents the (non-empty) SLDs of each cluster.
...and 3 more figures

Theorems & Definitions (15)

Definition 1: Adjacent Superior and Inferior
Lemma 1
Lemma 2
Lemma 3
Theorem 1
Lemma 4
Lemma 5
Claim 2
Claim 3
Lemma 6
...and 5 more
