Ultrametric Cluster Hierarchies: I Want 'em All!

Andrew Draganov, Pascal Weber, Rasmus Skibdahl Melanchton Jørgensen, Anna Beer, Claudia Plant, Ira Assent

TL;DR

This paper proves that, for any reasonable hierarchy, one can optimally solve any center-based clustering objective over it (such as $k$-means), and shows that one can quickly access a plethora of new, equally meaningful hierarchies.

Abstract

Hierarchical clustering is a powerful tool for exploratory data analysis, organizing data into a tree of clusterings from which a partition can be chosen. This paper generalizes these ideas by proving that, for any reasonable hierarchy, one can optimally solve any center-based clustering objective over it (such as $k$-means). Moreover, these solutions can be found exceedingly quickly and are themselves necessarily hierarchical. Thus, given a cluster tree, we show that one can quickly access a plethora of new, equally meaningful hierarchies. Just as in standard hierarchical clustering, one can then choose any desired partition from these new hierarchies. We conclude by verifying the utility of our proposed techniques across datasets, hierarchies, and partitioning schemes.

Paper Structure

This paper contains 57 sections, 25 theorems, 19 equations, 8 figures, 4 tables, and 10 algorithms.

Key Result

Theorem 3.2

Let $(L, d')$ be a finite relaxed ultrametric space. Then there exists an LCA-tree $T$ with LCA-distance $d$ and a bijection $f:L \leftrightarrow \text{leaves}(T)$ such that, for all $\ell_i, \ell_j \in L$, $d'(\ell_i, \ell_j) = d \left( f(\ell_i) \lor f(\ell_j) \right)$.
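To make the statement concrete, the following is a minimal sketch of an LCA-distance on a toy tree. It reads the theorem's notation as "$d$ evaluated at the lowest common ancestor of the two leaves". The tree, the node names, the stored values, and the helper functions are all hypothetical and chosen only for illustration. The definitions of "relaxed ultrametric" and "LCA-tree" are not reproduced in this summary, so the sketch assumes that node values do not decrease toward the root, which is enough for the strong triangle inequality checked at the end.

```python
# Hypothetical toy tree: leaves a, b, c, d; internal nodes u, v; root r.
parent = {"a": "u", "b": "u", "c": "v", "d": "v", "u": "r", "v": "r", "r": None}

# Assumed node values, non-decreasing from the leaves toward the root.
value = {"a": 0.0, "b": 0.0, "c": 0.0, "d": 0.0, "u": 1.0, "v": 2.0, "r": 5.0}

leaves = ["a", "b", "c", "d"]


def ancestors(node):
    """Return the path from `node` up to the root, `node` included."""
    path = []
    while node is not None:
        path.append(node)
        node = parent[node]
    return path


def lca(x, y):
    """Return the lowest common ancestor of x and y."""
    seen = set(ancestors(x))
    for node in ancestors(y):
        if node in seen:
            return node
    raise ValueError("nodes are not in the same tree")


def lca_distance(x, y):
    """Distance between two leaves: the value stored at their LCA."""
    return value[lca(x, y)]


def satisfies_strong_triangle(points):
    """Check d(x, z) <= max(d(x, y), d(y, z)) for all triples."""
    return all(
        lca_distance(x, z) <= max(lca_distance(x, y), lca_distance(y, z))
        for x in points
        for y in points
        for z in points
    )


if __name__ == "__main__":
    for x in leaves:
        print(x, [lca_distance(x, y) for y in leaves])
    print("strong triangle inequality holds:", satisfies_strong_triangle(leaves))
```

The theorem goes in the other direction: starting from the distances $d'$ alone, it guarantees that some tree of this kind reproduces them exactly.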

Figures (8)

  • Figure 1: Overview of our proposed SHiP clustering framework in which we (1) fit an ultrametric, (2) choose a center-based hierarchy on the ultrametric, and (3) extract a partition from the hierarchy.
  • Figure 2: Clusterings on the 2-moons dataset with varying densities.
  • Figure 3: Visualizations of the ultrametrics and hierarchy/partition combinations on the Boxes dataset.
  • Figure 4: A visualization of how leaves get assigned to centers in the LCA-tree.
  • Figure 5: Left: an example hierarchy $\mathcal{H} = \{\mathcal{P}_1, \mathcal{P}_2, \ldots \}$; $\mathcal{P}' \not\in \mathcal{H}$ is then an example partition which does not belong to $\mathcal{H}$. Right: $\mathcal{P}''$ is not a partition since $\ell_2$ and $\ell_3$ each belong to both clusters $C_2$ and $C_5$.
  • ...and 3 more figures

Theorems & Definitions (51)

  • Definition 3.0
  • Definition 3.1
  • Theorem 3.2
  • Corollary 3.2
  • Definition 4.0
  • Theorem 4.1
  • Corollary 5.0
  • Definition 6.1: Mutual reachability [dbscan]
  • Definition 6.2: dc-dist [beer2023connecting]
  • Proposition 6.2
  • ...and 41 more