An Axiomatic Definition of Hierarchical Clustering

Ery Arias-Castro, Elizabeth Coda

TL;DR

This work tackles the problem of defining a principled population-level hierarchical clustering by proposing an axiomatic framework for piecewise constant densities with connected support, and then extending the construction to continuous densities via uniform approximation. The authors prove the existence of a unique finest axiom cluster tree $\mathcal{C}^*_f$ that satisfies three natural axioms, and they show how this tree relates to Hartigan's cluster tree $\mathcal{H}_f$ under mild regularity, yielding convergence in the merge distortion metric $d_M$. A key contribution is identifying conditions (notably internal connectedness) under which $\mathcal{C}^*_f$ coincides with $\mathcal{H}_f$, while providing counterexamples otherwise; for densities with disconnected support the framework yields a Hartigan forest and suggests post-processing via single-linkage if a coarser grouping is desired. The work also discusses practical implications for algorithms, robustness, and high-dimensional clustering, and sketches extensions to flat clustering and piecewise continuous densities. Overall, the paper provides a rigorous, population-level foundation for hierarchical clustering, clarifying when familiar trees arise and guiding principled algorithmic design in density-based settings.

Abstract

In this paper, we take an axiomatic approach to defining a population hierarchical clustering for piecewise constant densities, and in a similar manner to Lebesgue integration, extend this definition to more general densities. When the density satisfies some mild conditions, e.g., when it has connected support, is continuous, and vanishes only at infinity, or when the connected components of the density satisfy these conditions, our axiomatic definition results in Hartigan's definition of cluster tree.
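To make the object in the abstract concrete, here is a minimal Python sketch of a Hartigan-style cluster tree for a one-dimensional piecewise constant density. It rests on assumptions that do not come from the paper: the density is given as a list of heights on consecutive, equal-width cells; two cells are neighbors exactly when they are adjacent; and clusters are the connected components of the upper level sets $\{f \ge \lambda\}$ for $\lambda > 0$. The function names (`level_set_clusters`, `hartigan_tree`, `merge_height`) are hypothetical, and `merge_height` uses the usual "highest level at which two points share a cluster" reading, which may be worded differently in the paper's Definition 2.4.

```python
def level_set_clusters(heights, level):
    """Return maximal runs of adjacent cells whose density is >= level."""
    clusters, run = [], []
    for i, h in enumerate(heights):
        if h >= level:
            run.append(i)
        elif run:
            clusters.append(tuple(run))
            run = []
    if run:
        clusters.append(tuple(run))
    return clusters


def hartigan_tree(heights):
    """Collect the level-set clusters over all distinct positive density values."""
    tree = set()
    for level in sorted({h for h in heights if h > 0}):
        tree.update(level_set_clusters(heights, level))
    return tree


def merge_height(heights, i, j):
    """Highest level at which cells i and j lie in the same level-set cluster.

    In one dimension this is the minimum density over the cells between them.
    """
    lo, hi = min(i, j), max(i, j)
    return min(heights[lo:hi + 1])


# Illustrative density on four unit-width cells (heights sum to 1).
heights = [0.3, 0.1, 0.2, 0.4]
for cluster in sorted(hartigan_tree(heights), key=len, reverse=True):
    print(cluster)
print(merge_height(heights, 0, 3))
```

Any two clusters in the resulting set are either nested or disjoint, which is the defining property of a cluster tree. The one-dimensional setting is chosen only because connectedness reduces to adjacency of cells; in higher dimensions the neighbor relation between regions would have to be supplied explicitly.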

Paper Structure

This paper contains 22 sections, 12 theorems, 30 equations, and 9 figures.

Key Result

Lemma 2.6

For two functions $f$ and $g$,

Figures (9)

  • Figure 2.1: Left: A collection of sets with neighboring regions (green) and non-neighboring regions (red). This collection of sets does not have the internally connected property. Right: A collection of sets with the internally connected property.
  • Figure 3.1: A piecewise constant density in $\mathcal{F}$. On the left, the highlighted region may be a cluster under Axiom 1; on the right, the highlighted region is not a cluster under Axiom 1 because its interior is not connected.
  • Figure 3.2: Left: A simple example of a piecewise constant density built on two regions. Right: The clustering output of K-means with number of clusters $K=2$. One of the clusters is disconnected.
  • Figure 3.3: On the left, the highlighted region could be a cluster under Axiom 2, but the highlighted region on the right oversegments a region of constant density, and should not be a cluster.
  • Figure 3.4: On the left, the lowest density in the highlighted cluster exceeds the largest density in a neighboring set. On the right, the highlighted cluster contains a region with lower density than a neighbor, and thus should not be a cluster.
  • ...and 4 more figures

Theorems & Definitions (39)

  • Definition 2.1: Hierarchical clustering or cluster tree
  • Definition 2.2: Hartigan cluster tree
  • Definition 2.3: Dendrogram
  • Definition 2.4: Merge height
  • Definition 2.5: Merge distortion metric
  • Lemma 2.6
  • proof
  • Definition 2.7: Neighboring regions
  • Definition 2.8: Internally connected property
  • Remark 3.1
  • ...and 29 more