Graph Max Shift: A Hill-Climbing Method for Graph Clustering

Ery Arias-Castro, Elizabeth Coda, Wanli Qiao

TL;DR

This work introduces Graph Max Shift, a graph-based hill-climbing clustering algorithm that moves each node to a neighbor of maximal degree and forms clusters from terminal nodes (with optional $\tau$-hop merging). When applied to a random geometric graph with latent positions drawn i.i.d. from a Morse-regular density, the method recovers the basins of attraction of the density gradient flow asymptotically, without explicit embedding. The paper establishes a rigorous consistency theory, connects the algorithm to Max Shift, Max Slope Shift, and Morse clustering, and demonstrates practical behavior through numerical experiments on Gaussian mixtures. The results highlight the viability of graph-based gradient-ascent clustering in settings where only adjacency information is available, offering a principled bridge between discrete graph algorithms and continuous density-based clustering.

Abstract

We present a method for graph clustering that is analogous to gradient ascent methods previously proposed for clustering points in space. The algorithm, which can be viewed as a max-degree hill-climbing procedure on the graph, iteratively moves each node to a neighboring node of highest degree. We show that, when applied to a random geometric graph whose nodes correspond to data drawn i.i.d. from a density with Morse regularity, the method is asymptotically consistent. Here, consistency is in the sense of Fukunaga and Hostetler, meaning, with respect to the partition of the support of the density defined by the basins of attraction of the density gradient flow.
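The hill-climbing step described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' reference implementation. The function name `graph_max_shift`, the tie-breaking rule (a node stays put if its own degree is maximal in its neighborhood, otherwise it moves to the smallest-index neighbor of highest degree), and the merge rule (terminal nodes within $\tau$ hops of one another are joined) are all choices made here for concreteness.

```python
from collections import deque

import numpy as np


def graph_max_shift(adj, tau=1):
    """Cluster nodes by max-degree hill climbing (illustrative sketch).

    adj: (n, n) symmetric 0/1 adjacency matrix.
    tau: terminal nodes within `tau` hops are merged (assumed merge rule).
    Returns an integer label per node.
    """
    adj = np.array(adj, dtype=bool)      # copy, so the caller's matrix is untouched
    np.fill_diagonal(adj, False)         # ignore self-loops
    n = adj.shape[0]
    deg = adj.sum(axis=1)

    # Step 1: each node points to a highest-degree node among itself and its neighbors.
    nxt = np.empty(n, dtype=int)
    for i in range(n):
        cand = np.append(np.flatnonzero(adj[i]), i)
        best = cand[deg[cand] == deg[cand].max()]
        # Assumed tie-breaking: stay put if i is itself maximal, else smallest index.
        nxt[i] = i if i in best else best.min()

    # Step 2: follow pointers until a terminal node (a fixed point) is reached.
    # Degrees strictly increase along a path, so this always terminates.
    terminal = np.empty(n, dtype=int)
    for i in range(n):
        j = i
        while nxt[j] != j:
            j = nxt[j]
        terminal[i] = j

    # Step 3: merge terminal nodes that lie within tau hops (union-find + bounded BFS).
    roots = {int(t): int(t) for t in np.unique(terminal)}

    def find(t):
        while roots[t] != t:
            roots[t] = roots[roots[t]]
            t = roots[t]
        return t

    for t in list(roots):
        dist = {t: 0}
        queue = deque([t])
        while queue:
            u = queue.popleft()
            if dist[u] == tau:
                continue
            for v in map(int, np.flatnonzero(adj[u])):
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
                    if v in roots:
                        roots[find(v)] = find(t)

    reps = np.array([find(int(t)) for t in terminal])
    _, labels = np.unique(reps, return_inverse=True)
    return labels


# Toy graph: two stars (centers 0 and 4) joined by the edge 3-7.
edges = [(0, 1), (0, 2), (0, 3), (4, 5), (4, 6), (4, 7), (3, 7)]
toy = np.zeros((8, 8), dtype=int)
for a, b in edges:
    toy[a, b] = toy[b, a] = 1

print(graph_max_shift(toy, tau=1))  # two clusters, one per star
print(graph_max_shift(toy, tau=3))  # centers are 3 hops apart, so they merge
```

On the toy graph the two star centers are the only terminal nodes. They stay separate for $\tau = 1$ and merge once $\tau$ reaches their hop distance of 3. The sketch uses only the adjacency matrix, in line with the setting where no embedding is available.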

Paper Structure

This paper contains 23 sections, 12 theorems, 81 equations, 7 figures.

Key Result

Theorem 3.3

Suppose $\mathcal{Y}_n := \{y_1, \dots, y_n\}$ is generated iid from a density $f$ on $\mathbb{R}^d$ with compact support, twice continuously differentiable, and of Morse regularity, and consider Graph Max Shift applied to the adjacency matrix of $\mathcal{G}(\mathcal{Y}_n, \epsilon_n)$ with $\epsil

Figures (7)

  • Figure 4.1: Graph Max Shift applied to $\mathcal{G}(\mathcal{Y}; \epsilon)$ where the sample, of size $n = 10^4$, is drawn from different normal mixtures. The solid black lines represent borders between the basins of attraction (population clusters) and the triangles represent the locations of the modes of the density. As the means of the mixture components all either coincide with the modes or are very close to the modes, we do not plot them. Data points are colored according to the clustering obtained via Graph Max Shift. Points that belong to clusters with fewer than $25$ points are colored in gray.
  • Figure 4.2: Graph Max Shift applied to $\mathcal{G}(\mathcal{Y};\epsilon)$ with data drawn from a trimodal Gaussian mixture. The top row shows the obtained clustering with the indicated $\epsilon$ and $\tau = 1$. Terminal nodes are highlighted in black. The bottom row depicts the degree of each node, which is proportional to the value of the density estimator implicitly computed. Additionally, each plot includes a ball of radius $\epsilon$ in the bottom right for reference. The errors of each clustering are quantified in Figure 4.3.
  • Figure 4.3: The error as a function of $\epsilon$ when Graph Max Shift is applied to $\mathcal{G}(\mathcal{Y};\epsilon)$ with $\tau = 1$ for data drawn from the trimodal Gaussian mixture for different values of $n$. The left plot shows the weak clustering error and the right plot shows the clustering error. Solid lines indicate the error averaged over $100$ simulations, and the shaded regions indicate empirical 80% intervals. Clusterings for select values of $\epsilon$ when $n=10^4$ are shown in Figure 4.2.
  • Figure 4.4: Graph Max Shift path $(y^k)$ in green and an approximation of the Max Shift path $(x_k)$ based on $f$ itself in navy blue. Modes of the density are plotted as triangles.
  • Figure A.1: The weak clustering error (left) and the clustering error (right) of the other Gaussian mixtures from Figure 4.1 when Graph Max Shift is applied to $\mathcal{G}(\mathcal{Y}; \epsilon)$. The tuning parameter $\tau = 1$ for all experiments. Solid lines indicate the error averaged over $100$ simulations, and the shaded regions indicate empirical 80% intervals.
  • ...and 2 more figures

Theorems & Definitions (23)

  • Remark 3.1
  • Remark 3.2
  • Theorem 3.3
  • Lemma 6.1
  • proof
  • Lemma 6.2
  • proof
  • Lemma 6.3
  • proof
  • Lemma 6.4
  • ...and 13 more