Hierarchical Clustering Algorithms on Poisson and Cox Point Processes
Sayeh Khaniha, François Baccelli
TL;DR
The paper develops $\mathrm{CHN}^2$, a hierarchical clustering algorithm that compresses connected components to clustroid pairs and links them by nearest-neighbor rules, and that applies to datasets with countably many points. It analyzes the resulting stochastic structure on point processes. On the Poisson baseline, it proves that clusters are finite at every level, that the local weak limits are unimodular, and that the limiting forest is one-ended, with a Last Universal Common Ancestor emerging at infinity. The methodology introduces pre-limit point-shifts $f^n$ that recursively define higher-order clusters, and proves the existence of a limiting $\mathrm{CHN}^2$PS EFF that captures the global hierarchy. These results extend to Cox and more general stationary point processes without second-order descending chains. The work also provides practical tools for aggregation detection via simulations and stopping criteria. It situates $\mathrm{CHN}^2$ within the broader landscape of random graphs and clustering, and highlights open questions about tree structure, point-map probabilities, and extensions to alternative distance measures.
Abstract
This paper introduces a hierarchical clustering algorithm, the Clustroid Hierarchical Nearest Neighbor ($\mathrm{CHN}^2$), designed for datasets with a countably infinite number of points. The method builds clusters across successive levels by linking nearest-neighbor points or clusters using the clustroid distance. The properties of this algorithm make it suitable for very large datasets. To evaluate its properties, we first apply the algorithm to the homogeneous Poisson point process, which serves as a natural null-hypothesis model with no intrinsic aggregation. In this setting, the algorithm generates a random forest that is a factor of the Poisson point process and hence unimodular. We prove that at every level, the level-$k$ graph has only finite connected components (a.s.) and derive bounds on their mean size. We also establish the existence of a limiting graph as the number of levels tends to infinity. In this limit, clusters are infinite and one-ended, which induces a natural order within each component and supports a tree-like phylogenetic interpretation. Beyond the Poisson case, we extend the analysis to a class of Cox and more general stationary point processes without second-order descending chains (introduced here), for which analogous results hold. Simulations show that comparing these cases with the Poisson baseline allows an efficient detection of aggregation, thereby linking the stochastic-geometric analysis to practical clustering tasks.
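The level-by-level construction described above can be illustrated with a minimal sketch on a finite sample. This is not the authors' implementation: it assumes a homogeneous Poisson sample in a bounded square window, takes the clustroid of a cluster to be its medoid (the member minimizing the total distance to the other members), and merges clusters whose clustroids are joined in the nearest-neighbor graph. The paper's own clustroid construction may differ, and all function names below (`poisson_sample`, `nearest_neighbor_map`, `components`, `clustroid`, `chn2_levels`) are hypothetical.

```python
import math
import random


def poisson_sample(intensity, side, rng):
    """Homogeneous Poisson points in the square [0, side]^2."""
    mean = intensity * side * side
    # Poisson count = number of unit-rate exponential arrivals before `mean`.
    n, t = 0, rng.expovariate(1.0)
    while t < mean:
        n += 1
        t += rng.expovariate(1.0)
    return [(rng.uniform(0, side), rng.uniform(0, side)) for _ in range(n)]


def nearest_neighbor_map(reps):
    """For each representative, the index of the nearest other representative."""
    idx = range(len(reps))
    return [
        min((j for j in idx if j != i), key=lambda j: math.dist(reps[i], reps[j]))
        for i in idx
    ]


def components(nn):
    """Connected components of the graph with edges i -- nn[i] (union-find)."""
    parent = list(range(len(nn)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in enumerate(nn):
        parent[find(i)] = find(j)
    groups = {}
    for i in range(len(nn)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


def clustroid(cluster):
    """ASSUMPTION: clustroid = medoid, the member with least total distance."""
    return min(cluster, key=lambda p: sum(math.dist(p, q) for q in cluster))


def chn2_levels(points, max_levels=10):
    """Return the list of partitions, one per level, starting from singletons."""
    clusters = [[p] for p in points]
    history = [clusters]
    for _ in range(max_levels):
        if len(clusters) < 2:
            break
        reps = [clustroid(c) for c in clusters]
        nn = nearest_neighbor_map(reps)
        # Each component of the nearest-neighbor graph becomes one new cluster.
        clusters = [[p for i in comp for p in clusters[i]] for comp in components(nn)]
        history.append(clusters)
    return history


if __name__ == "__main__":
    rng = random.Random(0)
    pts = poisson_sample(intensity=1.0, side=15.0, rng=rng)
    for level, clusters in enumerate(chn2_levels(pts)):
        print(level, len(clusters))
```

Because every cluster links to some other cluster, each component of the nearest-neighbor graph contains at least two clusters, so the number of clusters at least halves from one level to the next. In a bounded window the hierarchy therefore collapses to a single cluster after finitely many levels, and clusters near the boundary are distorted by edge effects; neither behavior reflects the infinite-volume setting analyzed in the paper.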
