Parametrized Power-Iteration Clustering for Directed Graphs
Gwendal Debaussart-Joniec, Harry Sevi, Matthieu Jonckheere, Argyris Kalogeratos
TL;DR
The paper tackles clustering in directed graphs where edge directionality breaks standard diffusion assumptions. It introduces Parametrized Power-Iteration Clustering (ParPIC), an eigen-decomposition-free method that uses a parametrized reversible random-walk operator $\mathbf{P}_{(\nu)}$ and power iterations to obtain a low-dimensional diffusion embedding, with diffusion time $t$ selected by an entropy-based criterion $\mathcal{H}(t)$. By designing the vertex measure $\nu$ (notably $\nu_{\gamma}=\gamma d_{\mathrm{in}}+(1-\gamma)d_{\mathrm{out}}$) and avoiding eigen-decomposition, ParPIC achieves competitive clustering accuracy while offering improved scalability, particularly on weakly connected digraphs and graphs with degree heterogeneity. Experimental results across synthetic and real-world digraphs demonstrate ParPIC's robustness to directionality, outperforming symmetrization- and teleportation-based methods, and matching or exceeding existing power-iteration approaches. The work provides a practical, principled framework for directed graph clustering with automatic diffusion-scale selection and scalable embedding-based clustering.
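The degree-mixing measure $\nu_{\gamma}$ is straightforward to compute from an adjacency matrix. The sketch below is a minimal illustration, not the authors' code. The function name `vertex_measure`, the convention that `A[i, j]` holds the weight of edge $i \to j$, and the restriction of $\gamma$ to $[0, 1]$ are assumptions made here for the example.

```python
import numpy as np


def vertex_measure(A, gamma):
    """Return nu_gamma = gamma * d_in + (1 - gamma) * d_out.

    Assumed convention (not stated in the source): A[i, j] is the
    weight of the directed edge i -> j, so row sums are out-degrees
    and column sums are in-degrees.
    """
    A = np.asarray(A, dtype=float)
    # Assumption: gamma interpolates between the two degree vectors.
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    d_out = A.sum(axis=1)  # out-degree of each vertex
    d_in = A.sum(axis=0)   # in-degree of each vertex
    return gamma * d_in + (1.0 - gamma) * d_out


# Toy digraph with edges 0->1, 0->2, 1->2, 2->0.
# Here d_out = [2, 1, 1] and d_in = [1, 1, 2].
A = np.array([[0, 1, 1],
              [0, 0, 1],
              [1, 0, 0]])

nu_out = vertex_measure(A, 0.0)   # equals d_out
nu_in = vertex_measure(A, 1.0)    # equals d_in
nu_half = vertex_measure(A, 0.5)  # average of d_in and d_out
```

Under these assumptions, $\gamma = 0$ recovers the out-degree, $\gamma = 1$ the in-degree, and intermediate values blend the two.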
Abstract
Vertex-level clustering for directed graphs (digraphs) remains challenging because edge directionality breaks the key assumptions underlying popular spectral methods, which also incur the overhead of eigen-decomposition. This paper proposes Parametrized Power-Iteration Clustering (ParPIC), a random-walk-based clustering method for weakly connected digraphs. It builds on the Power-Iteration Clustering paradigm, which uses the rows of the iterated diffusion operator as a data embedding. ParPIC has three important features: parametrized reversible random-walk operators, automatic tuning of the diffusion time, and efficient truncation of the final embedding, which produces low-dimensional data representations and reduces complexity. Empirical results on synthetic and real-world graphs demonstrate that ParPIC achieves competitive clustering accuracy with improved scalability relative to spectral and teleportation-based methods.
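The idea of using the rows of an iterated diffusion operator as an embedding can be illustrated with a few lines of NumPy. This is only a sketch of the general power-iteration principle, under simplifying assumptions: it uses the plain out-degree random walk $\mathbf{P} = \mathbf{D}_{\mathrm{out}}^{-1}\mathbf{A}$ as a stand-in for the paper's parametrized reversible operator, fixes the diffusion time by hand instead of using the entropy-based criterion, and omits the truncation step. The function name `diffusion_embedding` and the toy graph are hypothetical.

```python
import numpy as np


def diffusion_embedding(A, t):
    """Return P^t for the out-degree random walk P = D_out^{-1} A.

    Row i of the result is the t-step distribution of a walk started
    at vertex i, used here as the embedding of vertex i. Only repeated
    matrix products are needed; no eigen-decomposition is computed.
    """
    A = np.asarray(A, dtype=float)
    d_out = A.sum(axis=1)
    if np.any(d_out == 0):
        raise ValueError("every vertex needs at least one outgoing edge")
    P = A / d_out[:, None]      # row-stochastic transition matrix
    X = np.eye(A.shape[0])
    for _ in range(t):
        X = P @ X               # after k iterations, X equals P^k
    return X


# Toy digraph: two dense, asymmetric 3-vertex blocks joined by two
# weak directed edges (2 -> 3 and 5 -> 0).
block = np.array([[0.0, 1.0, 1.0],
                  [0.5, 0.0, 1.0],
                  [0.5, 0.5, 0.0]])
A = np.zeros((6, 6))
A[:3, :3] = block
A[3:, 3:] = block
A[2, 3] = 0.05
A[5, 0] = 0.05

X = diffusion_embedding(A, t=10)  # t chosen by hand for illustration

# Pairwise Euclidean distances between embedded vertices.
dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
labels = np.array([0, 0, 0, 1, 1, 1])
same = labels[:, None] == labels[None, :]
within = dist[same].max()    # largest distance inside a block
across = dist[~same].min()   # smallest distance between blocks
```

At this intermediate diffusion time, walks have mixed inside each block but not yet across blocks, so rows from the same block lie close together and any standard clustering routine applied to the rows of `X` would separate the two groups. Taking `t` too small or too large blurs this separation, which is why the choice of diffusion time matters.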
