Hierarchical Clustering Algorithms on Poisson and Cox Point Processes

Sayeh Khaniha, François Baccelli

TL;DR

The paper develops $\mathrm{CHN}^2$, a hierarchical clustering algorithm that compresses components to clustroid pairs and links them by nearest-neighbor rules, scalable to countably infinite datasets. It analyzes the resulting stochastic structure on point processes, proving finiteness at every level, unimodular local weak limits, and a one-ended limiting forest on the Poisson baseline, with a Last Universal Common Ancestor emerging at infinity. The methodology introduces pre-limit point-shifts $f^n$ that recursively define higher-order clusters, and proves a limiting $\mathrm{CHN}^2$PS EFF that captures the global hierarchy, while extending results to Cox and general stationary processes without second-order descending chains. The work also provides practical tools for aggregation detection via simulations and stopping criteria, and situates $\mathrm{CHN}^2$ within the broader landscape of random graphs and clustering, highlighting open questions about tree structure, point-map probabilities, and extensions to alternative distance measures.

Abstract

This paper introduces a hierarchical clustering algorithm, the Clustroid Hierarchical Nearest Neighbor ($\mathrm{CHN}^2$), designed for datasets with a countably infinite number of points. The method builds clusters across successive levels by linking nearest-neighbor points or clusters using the clustroid distance. The properties of this algorithm make it suitable for very large datasets. To evaluate its properties, we first apply the algorithm to the homogeneous Poisson point process, which serves as a natural null-hypothesis model with no intrinsic aggregation. In this setting, the algorithm generates a random forest that is a factor of the Poisson point process and hence unimodular. We prove that at every level, the level-$k$ graph has only finite connected components (a.s.) and derive bounds on their mean size. We also establish the existence of a limiting graph as the number of levels tends to infinity. In this limit, clusters are infinite and one-ended, which induces a natural order within each component and supports a tree-like phylogenetic interpretation. Beyond the Poisson case, we extend the analysis to a class of Cox and more general stationary point processes without second-order descending chains (introduced here), for which analogous results hold. Simulations show that comparing these cases with the Poisson baseline allows an efficient detection of aggregation, thereby linking the stochastic-geometric analysis to practical clustering tasks.

Paper Structure

This paper contains 19 sections, 13 theorems, 17 equations, 10 figures.

Key Result

Proposition 2.1

The graph of the point-shift $f^0$ (the $f^0$-graph) is unimodular, and all its connected components belong to the $\mathcal{F/F}$ class in the sense of Baccelli2017; that is, each component is almost surely finite and contains exactly one cycle. Equivalently, both the component and its foils (equiv

Figures (10)

  • Figure 1: Clusters of order zero generated by the $\mathrm{CHN}^2$ clustering algorithm on a finite dataset. The blue arrows connect each point to its nearest neighbor. The red cycles represent cycles of order zero, connecting points that are mutual nearest neighbors. The details of the elements within one cluster are shown in the figure.
  • Figure 2: Left picture: The $f^0$-graph generated on PPP in blue with the $0$-cycles shown in bold; Right picture: Cluster subtrees of order 0 obtained by deleting the edges of the $0$-cycles (clusters consisting of a single point are not visible here). In each cluster subtree, one vertex is a cluster head of order zero, and the other vertices have directed paths towards the cluster head of order zero within the cluster.
  • Figure 3: Left picture: The $f^1$-graph generated on the PPP with the $1$-cycles shown in green bold. The difference between this graph and the $f^0$-graph is the addition of new edges from a cycle of order zero to the closest point of the nearest cycle of order zero, shown in green. Right picture: Cluster subtrees of order 1 obtained by deleting the edges of the $1$-cycles (clusters consisting of a single point are not visible here). In each cluster subtree, one vertex is a cluster head of order 1, and the other vertices have directed paths towards the cluster head of order 1 within the cluster.
  • Figure 4: Estimation of the (logarithm of the) intensity of the exit points at different levels of the algorithm. The initial point process contains approximately 12,000 points. As shown in the plot, the intensity decays by a factor close to $1/3$ at each level, suggesting that the intensity of the exit points at level $k$ is approximately proportional to $(1/3)^k$.
  • Figure 5: Left picture: The $f^2$-graph generated on the PPP with the $2$-cycles shown in red bold. The difference between this graph and the $f^1$-graph is the addition of new edges from a cycle of order 1 to the closest point of the nearest cycle of order 1, shown in green. Right picture: Cluster subtrees of order 2 obtained by deleting the edges of the $2$-cycles (clusters consisting of a single point are not visible here). In each cluster subtree, one vertex is a cluster head of order 2, and the other vertices have directed paths towards the cluster head of order 2 within the cluster.
  • ...and 5 more figures
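
The order-zero step described in the captions of Figures 1 and 2 (each point linked to its nearest neighbor, with mutual nearest neighbors forming the cycles of order zero) can be illustrated on a finite sample. The following is only a minimal sketch under stated assumptions, not the authors' implementation: it uses 300 uniform points in the unit square as a stand-in for a Poisson sample in a bounded window, ignores boundary effects, and does not implement the higher levels or the clustroid distance. The function names `nearest_neighbor_map`, `zero_cycles`, and `components` are hypothetical and chosen for illustration.

```python
import math
import random


def nearest_neighbor_map(points):
    """Return f0 as a list: f0[i] is the index of the nearest neighbor of point i."""
    n = len(points)
    f0 = []
    for i in range(n):
        best_j, best_d = -1, math.inf
        for j in range(n):
            if j == i:
                continue
            d = math.dist(points[i], points[j])
            if d < best_d:
                best_j, best_d = j, d
        f0.append(best_j)
    return f0


def zero_cycles(f0):
    """Mutual nearest-neighbor pairs (i, j) with i < j, i.e. the 2-cycles of f0."""
    return [(i, j) for i, j in enumerate(f0) if f0[j] == i and i < j]


def components(f0):
    """Connected components of the undirected f0-graph, found with union-find."""
    parent = list(range(len(f0)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in enumerate(f0):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj

    groups = {}
    for i in range(len(f0)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


# Assumption: a fixed number of uniform points stands in for a Poisson sample.
rng = random.Random(0)
points = [(rng.random(), rng.random()) for _ in range(300)]

f0 = nearest_neighbor_map(points)
cycles = zero_cycles(f0)      # cycles of order zero
clusters = components(f0)     # clusters of order zero

print(len(cycles), len(clusters))
```

On a finite point set with pairwise distinct distances, every connected component of the nearest-neighbor graph contains exactly one cycle, and that cycle is a mutual nearest-neighbor pair, so the two printed counts agree. This is the finite-sample analogue of the statement in Proposition 2.1 that each component of the $f^0$-graph is finite and contains exactly one cycle.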

Theorems & Definitions (39)

  • Proposition 2.1
  • proof
  • Definition 2.2
  • Definition 2.3
  • Definition 2.4
  • Definition 2.5
  • Remark 1
  • Definition 2.6
  • Theorem 2.7: Construction of the point-shift graph
  • proof
  • ...and 29 more