PANDORA: A Parallel Dendrogram Construction Algorithm for Single Linkage Clustering on GPU

Piyush Sao, Andrey Prokopenko, Damien Lebrun-Grandié

TL;DR

Pandora removes the sequential dendrogram-construction bottleneck in HDBSCAN* and runs efficiently on both GPUs and multicore CPUs.

Abstract

This paper presents Pandora, a novel parallel algorithm for efficiently constructing dendrograms for single-linkage hierarchical clustering, including Hdbscan*. Traditional methods for constructing a dendrogram from a minimum spanning tree (MST), such as agglomerative or divisive techniques, often fail to parallelize efficiently, especially on the skewed dendrograms common in real-world data. Pandora addresses these challenges through a recursive tree contraction method, which simplifies the tree for initial dendrogram construction and then progressively reconstructs the complete dendrogram. This process makes Pandora asymptotically work-optimal, independent of dendrogram skewness. All steps in Pandora are fully parallel and suitable for massively threaded accelerators such as GPUs. Our implementation is written in Kokkos, providing support for both CPUs and multi-vendor GPUs (e.g., Nvidia, AMD). The multithreaded version of Pandora is 2.2$\times$ faster than the current best multithreaded implementation, while the GPU implementation achieves speedups of 6-20$\times$ on an AMD GPU and 10-37$\times$ on an Nvidia GPU over multithreaded Pandora. These advancements lead to up to a 6-fold speedup for Hdbscan* on GPUs over the current best implementation, which offloads only MST construction to the GPU and performs dendrogram construction with multithreading.

Paper Structure (45 sections, 5 theorems, 6 equations, 17 figures, 2 tables, 3 algorithms)

Key Result

Theorem 1

Let $e_i$ and $e_j$ be any two edges in a tree $T$. Then $\textsc{Lcda}(e_i, e_j)$ is the heaviest edge in the path $\text{Path}(e_i, e_j)$. In other words, if the edges of $T$ are indexed in descending order of weight, then $\textsc{Lcda}(e_i, e_j)$ has the smallest numerical index among all edges in $\text{Path}(e_i, e_j)$. Formally, $\textsc{Lcda}(e_i, e_j) = e_k$, where $k = \min \{\, l : e_l \in \text{Path}(e_i, e_j) \,\}$.
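The statement of Theorem 1 can be checked numerically on a small tree. The sketch below is an illustration only, not the paper's implementation: it builds the edge dendrogram with a plain sequential union-find pass, then compares the lowest common dendrogram ancestor of every edge pair with the smallest index on the connecting tree path. It assumes distinct weights, that $\text{Path}(e_i, e_j)$ includes both $e_i$ and $e_j$, and that an edge counts as its own ancestor. All function names and the example tree are hypothetical.

```python
from itertools import combinations


def build_edge_dendrogram(n_vertices, edges):
    """edges: (u, v, w) tuples sorted by descending weight (index 0 is heaviest).
    Returns parent[e], the index of the parent edge of e (-1 for the root)."""
    uf = list(range(n_vertices))
    top = [-1] * n_vertices  # top[r]: last edge merged into the component rooted at r
    parent = [-1] * len(edges)

    def find(x):
        while uf[x] != x:
            uf[x] = uf[uf[x]]  # path halving
            x = uf[x]
        return x

    for idx in range(len(edges) - 1, -1, -1):  # lightest edge first
        u, v, _ = edges[idx]
        ru, rv = find(u), find(v)
        for r in (ru, rv):
            if top[r] != -1:
                parent[top[r]] = idx
        uf[ru] = rv
        top[rv] = idx
    return parent


def lcda(parent, i, j):
    """Lowest common ancestor of edges i and j in the dendrogram."""
    ancestors = set()
    while i != -1:
        ancestors.add(i)
        i = parent[i]
    while j not in ancestors:
        j = parent[j]
    return j


def path_edge_indices(adj, s, t):
    """Edge indices on the unique tree path from vertex s to vertex t."""
    stack = [(s, -1, [])]
    while stack:
        node, prev, used = stack.pop()
        if node == t:
            return used
        for nxt, idx in adj[node]:
            if nxt != prev:
                stack.append((nxt, node, used + [idx]))
    return []


def tree_path(adj, edges, i, j):
    """Edges on the path from e_i to e_j, endpoints included: the longest of
    the four vertex paths between an endpoint of e_i and an endpoint of e_j."""
    candidates = [path_edge_indices(adj, s, t)
                  for s in edges[i][:2] for t in edges[j][:2]]
    return max(candidates, key=len)


def theorem_holds(n_vertices, edges):
    adj = {v: [] for v in range(n_vertices)}
    for idx, (u, v, _) in enumerate(edges):
        adj[u].append((v, idx))
        adj[v].append((u, idx))
    parent = build_edge_dendrogram(n_vertices, edges)
    return all(lcda(parent, i, j) == min(tree_path(adj, edges, i, j))
               for i, j in combinations(range(len(edges)), 2))


# Hypothetical 6-vertex tree, edges already sorted by descending weight.
edges = [(0, 1, 5.0), (1, 3, 4.0), (3, 5, 3.0), (3, 4, 2.0), (1, 2, 1.0)]
parent = build_edge_dendrogram(6, edges)
print(parent)
print(theorem_holds(6, edges))
```

Because edge indices follow descending weight, taking `min` over the path indices is the same as picking the heaviest edge on the path.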

Figures (17)

  • Figure 1: Time taken to construct the Hdbscan* components (Euclidean minimum spanning tree (MST) and dendrogram) on an AMD EPYC 7A53 CPU and an AMD MI250X GPU for the Hacc37M dataset.
  • Figure 2: A high-level visualization of the Pandora algorithm. The original MST (top left) is contracted (bottom left). Using the dendrogram corresponding to the contraction (bottom right), the original dendrogram is recovered by edge reinsertion from the same level. The dendrograms are shown using the edge numbering of the original MST; dendrogram leaf nodes, corresponding to the data points, are omitted.
  • Figure 3: An example of a highly skewed dendrogram constructed from a 40-point sample taken from a 3D Gaussian distribution using the Hdbscan* mutual reachability distance with $minPts = 2$.
  • Figure 4: Steps in top-down (divisive) dendrogram construction. The dendrogram starts with the heaviest edge in the MST as the root; removing it divides the tree into two connected components. In each subsequent step, the heaviest edge in each component is identified, and its parent is the heaviest edge from the previous step. This process is repeated recursively for each component of the tree.
  • Figure 5: Pandora leverages dendrogram chains to construct dendrograms efficiently. This dendrogram can be divided into three chains: top, bottom-left, and bottom-right.
  • ...and 12 more figures
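
The top-down construction described in the Figure 4 caption can be sketched in a few lines. This is a naive sequential illustration of the divisive baseline, not Pandora itself: it assumes distinct edge weights, rescans each component to find its heaviest edge, and can therefore take quadratic time when the dendrogram is highly skewed. The function names and the example tree are hypothetical.

```python
def reachable_edges(edges, allowed, start):
    """Indices of edges in `allowed` reachable from vertex `start`."""
    adj = {}
    for e in allowed:
        u, v, _ = edges[e]
        adj.setdefault(u, []).append((v, e))
        adj.setdefault(v, []).append((u, e))
    seen_vertices = {start}
    seen_edges = set()
    stack = [start]
    while stack:
        x = stack.pop()
        for y, e in adj.get(x, []):
            if e not in seen_edges:
                seen_edges.add(e)
                if y not in seen_vertices:
                    seen_vertices.add(y)
                    stack.append(y)
    return seen_edges


def top_down_dendrogram(edges):
    """edges: (u, v, w) tuples forming a tree.
    Returns parent[e], the index of the parent edge of e (-1 for the root)."""
    parent = [-1] * len(edges)
    # Each work item: (edge indices of one connected component, parent edge index).
    work = [(list(range(len(edges))), -1)]
    while work:
        comp, par = work.pop()
        if not comp:
            continue
        heaviest = max(comp, key=lambda e: edges[e][2])
        parent[heaviest] = par
        rest = [e for e in comp if e != heaviest]
        # Removing the heaviest edge (u, v) splits the component into the part
        # still connected to u and the part still connected to v.
        u_side = reachable_edges(edges, rest, edges[heaviest][0])
        v_side = [e for e in rest if e not in u_side]
        work.append((sorted(u_side), heaviest))
        work.append((v_side, heaviest))
    return parent


# Hypothetical 6-vertex tree with distinct weights.
edges = [(0, 1, 5.0), (1, 3, 4.0), (3, 5, 3.0), (3, 4, 2.0), (1, 2, 1.0)]
print(top_down_dendrogram(edges))
```

Each component's heaviest edge becomes a child of the edge whose removal created that component, which matches the parent relationship described in the caption.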

Theorems & Definitions (13)

  • Definition 1: Path
  • Definition 2: Ancestors of an Edge
  • Definition 3: Lowest Common Dendrogram Ancestor
  • Theorem 1
  • proof
  • Corollary 1.1
  • proof
  • Theorem 2
  • proof
  • Definition 4: Dendrogram Lineage Preserving Tree Contraction
  • ...and 3 more