
An iterative spectral algorithm for digraph clustering

James Martin, Tim Rogers, Luca Zanetti

TL;DR

The paper addresses clustering in directed graphs where preserving edge-direction patterns is crucial. It introduces an iterative spectral algorithm that builds a problem-adapted Hermitian representation $M^{\mathcal{S}}$ conditioned on the current clustering $\mathcal{S}$ and uses a normalized eigenbasis of $L(M^{\mathcal{S}})$ to refine the partition, repeating for $T$ iterations and selecting the best clustering by a directional-flow objective $\delta$. The key contribution is a flexible framework that encodes arbitrary meta-graphs via complex roots of unity and a penalty mechanism, enabling accurate recovery under the Directed Stochastic Block Model and on real-world graphs such as Hearthstone, Florida Bay, and C. elegans brain networks. The approach outperforms the state-of-the-art method CLSZ in synthetic experiments and yields interpretable, directionally consistent cluster structures across diverse datasets, at the cost of additional computation due to repeated spectral decompositions. This gives a practical, tunable tool for revealing higher-order directional patterns in digraphs, with potential applications in ecology, neuroscience, and game analytics.
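Each refinement step described above is a spectral decomposition of $L(M^{\mathcal{S}})$ followed by re-partitioning. The sketch below shows one such step for a given Hermitian matrix $M$, under assumptions that are ours rather than the paper's: $L(M) = I - D^{-1/2} M D^{-1/2}$ with $D_{uu} = \sum_v |M_{uv}|$, the bottom `num_vectors` eigenvectors with each row normalized, and plain k-means on real and imaginary parts. The construction of $M^{\mathcal{S}}$ from the current clustering, the penalty mechanism, and $\delta$ are not specified in this summary, so the outer $T$-iteration loop is omitted; `normalized_laplacian`, `spectral_step` and `num_vectors` are hypothetical names.

```python
import numpy as np


def normalized_laplacian(M):
    """L = I - D^{-1/2} M D^{-1/2} with D_uu = sum_v |M_uv| (assumed definition)."""
    d = np.abs(M).sum(axis=1)
    # Isolated vertices (d == 0) get a zero scaling factor instead of a division by zero.
    inv_sqrt = np.divide(1.0, np.sqrt(d), out=np.zeros_like(d), where=d > 0)
    return np.eye(M.shape[0]) - inv_sqrt[:, None] * M * inv_sqrt[None, :]


def spectral_step(M, k, num_vectors=1, iters=50):
    """One hypothetical refinement step: embed vertices with the bottom
    eigenvectors of L(M), normalize each row, then run k-means."""
    _, vecs = np.linalg.eigh(normalized_laplacian(M))  # eigenvalues ascending
    X = vecs[:, :num_vectors]
    # Row normalization; assumes no vertex has an all-zero embedding row.
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    # A complex row becomes a real point (real parts, imaginary parts), so a
    # global phase of the eigenvectors only rotates the point cloud.
    Y = np.hstack([X.real, X.imag])

    # Deterministic farthest-point initialization of the k centers.
    centers = [Y[0]]
    for _ in range(1, k):
        dist = np.min([np.linalg.norm(Y - c, axis=1) for c in centers], axis=0)
        centers.append(Y[np.argmax(dist)])
    C = np.array(centers)

    # Lloyd iterations: assign each point to its nearest center, then recompute means.
    for _ in range(iters):
        sq = ((Y[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
        labels = np.argmin(sq, axis=1)
        C = np.array([Y[labels == j].mean(axis=0) if np.any(labels == j) else C[j]
                      for j in range(k)])
    return labels
```

On a toy digraph whose three clusters form a directed cycle ($S_0 \to S_1 \to S_2 \to S_0$), encoding every edge $u \to v$ as $e^{2\pi i/3}$ at $(u, v)$ and its conjugate at $(v, u)$ makes the bottom eigenvector take one distinct phase per cluster, so a single eigenvector already separates the clusters.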

Abstract

Graph clustering is a fundamental technique in data analysis with applications in many different fields. While there is a large body of work on clustering undirected graphs, the problem of clustering directed graphs is much less understood. The analysis is more complex in the directed graph case for two reasons: the clustering must preserve directional information in the relationships between clusters, and directed graphs have non-Hermitian adjacency matrices whose properties are less conducive to traditional spectral methods. Here we consider the problem of partitioning the vertex set of a directed graph into $k\ge 2$ clusters so that edges between different clusters tend to follow the same direction. We present an iterative algorithm based on spectral methods applied to new Hermitian representations of directed graphs. Our algorithm performs favourably against the state-of-the-art, both on synthetic and real-world data sets. Additionally, it is able to identify a "meta-graph" of $k$ vertices that represents the higher-order relations between clusters in a directed graph. We showcase this capability on data sets pertaining to food webs, biological neural networks, and the online card game Hearthstone.
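The abstract's central device is a Hermitian representation of a digraph. As a hedged illustration, and not the authors' construction (which conditions on the current clustering and adds penalties), the sketch below encodes each edge $u \to v$ of weight $w$ as $w\omega$ at $(u, v)$ and $w\bar{\omega}$ at $(v, u)$ for a single root of unity $\omega = e^{2\pi i/\text{order}}$, where `order` is a hypothetical parameter. It then checks numerically the standard identity $x^*(D - M)x = \tfrac{1}{2}\sum_{u,v} |M_{uv}|\,|x_u - \varphi_{uv} x_v|^2$, with $\varphi_{uv} = M_{uv}/|M_{uv}|$ and $D_{uu} = \sum_v |M_{uv}|$, which holds for any Hermitian $M$. This identity is of the kind Lemma 1 concerns, but it is not claimed to be that lemma's exact statement.

```python
import numpy as np


def hermitian_representation(n, edges, order):
    """Hypothetical encoding: each edge (u, v, w), meaning u -> v with weight w,
    adds w*omega at (u, v) and w*conj(omega) at (v, u), omega = exp(2*pi*i/order)."""
    omega = np.exp(2j * np.pi / order)
    M = np.zeros((n, n), dtype=complex)
    for u, v, w in edges:
        M[u, v] += w * omega
        M[v, u] += w * np.conj(omega)
    return M  # Hermitian by construction: M[v, u] == conj(M[u, v])


def quadratic_form_sides(M, x):
    """Both sides of x^*(D - M)x = 1/2 * sum_{u,v} |M_uv| |x_u - phi_uv x_v|^2,
    where D_uu = sum_v |M_uv| and phi_uv = M_uv / |M_uv| (0 where M_uv = 0)."""
    A = np.abs(M)
    D = np.diag(A.sum(axis=1))
    lhs = np.vdot(x, (D - M) @ x)  # vdot conjugates its first argument
    phi = np.divide(M, A, out=np.zeros_like(M), where=A > 0)
    diff = x[:, None] - phi * x[None, :]
    rhs = 0.5 * np.sum(A * np.abs(diff) ** 2)
    return lhs, rhs
```

Because the right-hand side is a sum of non-negative terms, $D - M$ is positive semidefinite, and a vector has small energy exactly when $x_u \approx \varphi_{uv} x_v$ along most edges, that is, when phases rotate consistently with edge directions.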

Paper Structure

This paper contains 12 sections, 2 theorems, 18 equations, 8 figures, 1 table, 1 algorithm.

Key Result

lemma 1

Let $M$ be a Hermitian representation of $G=(V,E,w)$. Then, for any $x \in \mathbb{C}^V$, it holds that

Figures (8)

  • Figure 1: Example of a digraph with a cluster-structure that depends on the direction of the edges, rather than the density of the clusters.
  • Figure 2: DSBM with parameters $n = 100$, $k = 5$, $\gamma = 0.4$, $p = 0.5$ and $\eta = 0.6$. Meta-graph used to sample the graph (left). Heat map of the adjacency matrix, reordered with respect to the recovered clustering shown in red (right).
  • Figure 3: DSBM with parameters $n = 100$, $k = 5$, $\gamma = 0.4$, $p = 0.5$ and $\eta = 0.6$. Comparison of the performance of our algorithm (solid lines) vs CLSZ (dashed lines). Our minimum clustering value $\delta$: 0.769. CLSZ clustering value $\delta$: 0.995. Our minimum misclassification error: 0.000. CLSZ misclassification error: 0.530.
  • Figure 4: A comparison between our algorithm (left), with $T=50$, and CLSZ (right) on DSBM with fixed parameters $n=100$, $k=7$, $p=0.8$ and varying $\eta$ and $\gamma$. Colour represents the average error over $10$ independent simulations.
  • Figure 5: Hearthstone data set. Clustering partition recovered using our algorithm ($k=9$, $T=100$). Pie charts correspond to clusters recovered by our algorithm and display the distribution of heroes within. An arrow from cluster $S_i$ to $S_j$ is shown if the edge weight directed from $S_i$ to $S_j$ is more than $80 \%$ of the total edge weight between $S_i$ and $S_j$.
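Two quantities in the figure captions can be computed directly from a clustering. The sketch below assumes a dense weight matrix `W` with `W[u, v]` the weight of edge $u \to v$ and integer labels in `range(k)`. It implements the arrow rule stated for Figure 5 (draw $S_i \to S_j$ when more than $80\%$ of the weight between the two clusters runs from $S_i$ to $S_j$) and a misclassification error taken as the smallest fraction of mislabelled vertices over all relabellings of the clusters. That error definition is our assumption and the paper may define it differently; `meta_graph_arrows` and `misclassification_error` are hypothetical names.

```python
from itertools import permutations

import numpy as np


def meta_graph_arrows(W, labels, k, threshold=0.8):
    """Arrows S_i -> S_j whose share of the weight between S_i and S_j
    exceeds `threshold` (0.8 reproduces the 80% rule of Figure 5)."""
    Z = np.eye(k)[np.asarray(labels)]  # one-hot cluster membership, n x k
    F = Z.T @ W @ Z                     # F[i, j] = total weight from S_i to S_j
    arrows = []
    for i in range(k):
        for j in range(k):
            total = F[i, j] + F[j, i]
            if i != j and total > 0 and F[i, j] > threshold * total:
                arrows.append((i, j))
    return arrows


def misclassification_error(true_labels, pred_labels, k):
    """Smallest fraction of mislabelled vertices over all k! relabellings
    of the predicted clusters (assumed definition)."""
    true_labels = np.asarray(true_labels)
    pred_labels = np.asarray(pred_labels)
    return min(
        float(np.mean(np.asarray(perm)[pred_labels] != true_labels))
        for perm in permutations(range(k))
    )
```

Enumerating all $k!$ relabellings is cheap for the $k = 5$ and $k = 7$ DSBM settings above but grows quickly; an optimal assignment on the cluster confusion matrix (Hungarian algorithm) gives the same value in polynomial time.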

Theorems & Definitions (5)

  • lemma 1
  • proof
  • lemma 2
  • proof
  • definition 1