An Axiomatic Definition of Hierarchical Clustering
Ery Arias-Castro, Elizabeth Coda
TL;DR
This work addresses the problem of defining a principled population-level hierarchical clustering. It proposes an axiomatic framework for piecewise constant densities with connected support, then extends the construction to continuous densities via uniform approximation. The authors prove the existence of a unique finest axiom cluster tree $\mathcal{C}^*_f$ that satisfies three natural axioms, and they show how this tree relates to Hartigan's cluster tree $\mathcal{H}_f$ under mild regularity, yielding convergence in the merge distortion metric $d_M$. A key contribution is identifying conditions (notably internal connectedness) under which $\mathcal{C}^*_f$ coincides with $\mathcal{H}_f$, with counterexamples for when those conditions fail. For densities with disconnected support, the framework yields a Hartigan forest, and the authors suggest post-processing via single linkage if a coarser grouping is desired. The work also discusses practical implications for algorithms, robustness, and high-dimensional clustering, and sketches extensions to flat clustering and piecewise continuous densities. Overall, the paper gives a rigorous population-level foundation for hierarchical clustering, clarifying when familiar trees arise and informing algorithm design in density-based settings.
Abstract
In this paper, we take an axiomatic approach to defining a population hierarchical clustering for piecewise constant densities, and in a similar manner to Lebesgue integration, extend this definition to more general densities. When the density satisfies some mild conditions, e.g., when it has connected support, is continuous, and vanishes only at infinity, or when the connected components of the density satisfy these conditions, our axiomatic definition results in Hartigan's definition of cluster tree.
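To make the object in question concrete, here is a minimal sketch of Hartigan's cluster tree in the simplest piecewise constant case. It is an illustration only, not the paper's axiomatic construction. It assumes a one-dimensional density on adjacent closed cells of equal width and takes the clusters to be the connected components of the upper level sets $\{f \ge \lambda\}$ for $\lambda > 0$; the paper's exact level-set convention may differ. The function name `hartigan_tree_1d` and the example heights are hypothetical.

```python
def hartigan_tree_1d(heights):
    """Clusters of the level-set tree of a 1D piecewise constant density.

    `heights[i]` is the density value on the i-th cell (assumed: adjacent,
    closed, equal-width cells). Each cluster is returned as an inclusive
    (start, end) range of cell indices.
    """
    clusters = set()
    # Only the distinct positive heights can change the level set.
    for level in sorted({h for h in heights if h > 0}):
        start = None
        for i, h in enumerate(heights):
            if h >= level:
                if start is None:
                    start = i  # a new connected run of cells begins
            elif start is not None:
                clusters.add((start, i - 1))  # the run ended at cell i - 1
                start = None
        if start is not None:
            clusters.add((start, len(heights) - 1))
    # Sort so that larger (parent) clusters precede the clusters they contain.
    return sorted(clusters, key=lambda c: (c[0], -c[1]))


# Hypothetical bimodal density on five unit-width cells.
heights = [0.1, 0.3, 0.1, 0.4, 0.1]
tree = hartigan_tree_1d(heights)
print(tree)
```

In this example the whole support forms the root cluster, and the two modes (cells 1 and 3) appear as separate child clusters once the level exceeds 0.1. Any two clusters returned are either nested or disjoint, which is the tree property.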
