
CluStRE: Streaming Graph Clustering with Multi-Stage Refinement

Adil Chhabra, Shai Dorian Peretz, Christian Schulz

TL;DR

CluStRE tackles scalable graph clustering in streaming environments by integrating one-pass streaming with a dynamic quotient-graph representation and multi-stage refinement. The method alternates between streaming-based modularity gain scoring and offline-like memetic refinement on a quotient graph, followed by optional re-streaming with local search to inject partial global information. Four configurations provide trade-offs among speed, memory, and clustering quality. Experiments show that CluStRE achieves high-quality clustering, approaching in-memory performance, while significantly reducing memory and runtime relative to state-of-the-art streaming methods. This work narrows the gap between streaming and in-memory clustering, enabling high-quality modularity optimization on large-scale graphs under resource constraints.
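The TL;DR does not spell out the scoring rule, so the following is only a minimal sketch of one-pass, modularity-gain-based assignment, not CluStRE's code. It assumes the node-streaming model (each node arrives with its weighted adjacency list), a total edge weight $m$ known in advance, and the standard Louvain-style gain for placing node $v$ into cluster $C$: $\Delta Q(v, C) = k_{v,C}/m - d_v \cdot \mathrm{vol}(C)/(2m^2)$, where $k_{v,C}$ is the edge weight from $v$ into $C$, $d_v$ is the weighted degree of $v$, and $\mathrm{vol}(C)$ is the sum of degrees in $C$. The names `adjacency_stream` and `stream_cluster` are hypothetical, and the quotient graph and memetic refinement are omitted.

```python
from collections import defaultdict


def adjacency_stream(edges):
    """Turn an undirected edge list into a node stream of (node, [(nbr, w), ...])."""
    adj = defaultdict(list)
    for u, v, w in edges:
        adj[u].append((v, w))
        adj[v].append((u, w))
    return [(v, adj[v]) for v in sorted(adj)]


def stream_cluster(stream, m):
    """Hypothetical one-pass assignment: each arriving node joins the neighbouring
    cluster with the largest positive modularity gain, or opens a new singleton
    cluster (gain 0). Assumes m (total edge weight) is known up front."""
    cluster_of = {}              # node -> cluster id (the only per-node state)
    volume = defaultdict(float)  # cluster id -> sum of member degrees
    next_id = 0
    for v, neighbors in stream:
        deg = sum(w for _, w in neighbors)
        links = defaultdict(float)  # weight from v into each already-seen cluster
        for u, w in neighbors:
            if u in cluster_of:
                links[cluster_of[u]] += w
        best, best_gain = None, 0.0
        for c, k_vc in links.items():
            gain = k_vc / m - deg * volume[c] / (2 * m * m)
            if gain > best_gain:
                best, best_gain = c, gain
        if best is None:  # no existing cluster beats staying alone
            best, next_id = next_id, next_id + 1
        cluster_of[v] = best
        volume[best] += deg
    return cluster_of


# Two triangles joined by the bridge edge (2, 3); total edge weight m = 7.
edges = [(0, 1, 1), (0, 2, 1), (1, 2, 1), (3, 4, 1), (3, 5, 1), (4, 5, 1), (2, 3, 1)]
m = sum(w for _, _, w in edges)
result = stream_cluster(adjacency_stream(edges), m)
print(result)  # {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
```

In this sketch only one cluster id per node and one volume per cluster are kept in memory. The price is that each node is placed before its later neighbours have been seen, which is the kind of error that later refinement and re-streaming passes can correct.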

Abstract

We present CluStRE, a novel streaming graph clustering algorithm that balances computational efficiency with high-quality clustering using multi-stage refinement. Unlike traditional in-memory clustering approaches, CluStRE processes graphs in a streaming setting, significantly reducing memory overhead while leveraging re-streaming and evolutionary heuristics to improve solution quality. Our method dynamically constructs a quotient graph, enabling modularity-based optimization while efficiently handling large-scale graphs. We introduce multiple configurations of CluStRE to provide trade-offs between speed, memory consumption, and clustering quality. Experimental evaluations demonstrate that, on average, CluStRE improves solution quality by 89.8%, operates 2.6 times faster, and uses less than two-thirds of the memory required by the state-of-the-art streaming clustering algorithm. Moreover, our strongest mode enhances solution quality by up to 150% on average. With this, CluStRE achieves solution quality comparable to that of in-memory algorithms, i.e., over 96% of the quality of in-memory clustering approaches, including Louvain, effectively bridging the gap between streaming and traditional clustering methods.
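The abstract names re-streaming as one source of quality gains but does not give its move rule. As an illustration only, the sketch below streams the graph again and applies a greedy local move to each node: the node is temporarily removed from its cluster and moved to the neighbouring cluster with the highest Louvain-style modularity gain, or to a fresh singleton if every option loses modularity. The function `restream_pass`, the tie-breaking rule, and the singleton fallback are assumptions, not CluStRE's actual local search; per-node state is limited to cluster ids and degrees assumed to be retained from the first pass.

```python
from collections import defaultdict


def adjacency_stream(edges):
    """Node stream of (node, [(nbr, w), ...]) built from an undirected edge list."""
    adj = defaultdict(list)
    for u, v, w in edges:
        adj[u].append((v, w))
        adj[v].append((u, w))
    return [(v, adj[v]) for v in sorted(adj)]


def restream_pass(stream, cluster_of, degree, m):
    """Hypothetical re-streaming pass of greedy local moves. Updates cluster_of in
    place and returns how many nodes changed cluster. Assumes no self-loops."""
    volume = defaultdict(float)  # rebuilt from O(n) per-node state, not from edges
    for v, c in cluster_of.items():
        volume[c] += degree[v]
    next_id = max(cluster_of.values()) + 1
    moves = 0
    for v, neighbors in stream:
        old, deg = cluster_of[v], degree[v]
        volume[old] -= deg  # take v out first so staying is scored like any move
        links = defaultdict(float)
        for u, w in neighbors:
            links[cluster_of[u]] += w
        best = old
        best_gain = links[old] / m - deg * volume[old] / (2 * m * m)
        for c, k_vc in links.items():
            gain = k_vc / m - deg * volume[c] / (2 * m * m)
            if gain > best_gain:  # strict comparison: ties keep v where it is
                best, best_gain = c, gain
        if best_gain < 0:  # a fresh singleton cluster has gain 0
            best, next_id = next_id, next_id + 1
        cluster_of[v] = best
        volume[best] += deg
        moves += best != old
    return moves


edges = [(0, 1, 1), (0, 2, 1), (1, 2, 1), (3, 4, 1), (3, 5, 1), (4, 5, 1), (2, 3, 1)]
stream = adjacency_stream(edges)
degree = {v: sum(w for _, w in nbrs) for v, nbrs in stream}
m = sum(w for _, _, w in edges)
# A poor first-pass result: bridge node 2 was placed with the right triangle.
clusters = {0: 0, 1: 0, 2: 1, 3: 1, 4: 1, 5: 1}
moved = restream_pass(stream, clusters, degree, m)
print(moved, clusters)  # 1 {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
```

Here the second pass sees that node 2 has two edges into the left triangle and only one into the right, so it moves node 2 and leaves every other node in place.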

Paper Structure

This paper contains 11 sections, 3 equations, 2 figures, 3 algorithms.

Figures (2)

  • Figure 1: A clustering (colored nodes) of a network into densely interconnected regions.
  • Figure 2: A quotient graph $G_Q$ constructed from an undirected, unweighted toy graph $G$ (all edges have unit weight). Here, clusters are represented by unique colors and shapes, thick lines show inter-cluster edges, and dashed lines show intra-cluster edges. In the quotient graph, each cluster is contracted into a supernode whose weight equals the number of nodes in that cluster in $G$. Edges between the nodes of the quotient graph represent inter-cluster edges in $G$, with weight equal to the sum of the weights of the corresponding inter-cluster edges in $G$. Intra-cluster edges in $G$ are represented by weighted self-loops in the quotient graph, counted twice: once for each directed intra-cluster edge.
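The Figure 2 caption specifies the contraction precisely enough to sketch it in code. The snippet below (hypothetical function names, integer cluster ids assumed) builds the supernode weights, the doubled self-loops, and the summed inter-cluster edges. A second helper then evaluates modularity from the quotient graph alone. It assumes the objective is standard modularity, as the TL;DR indicates, and is an illustration rather than the authors' implementation.

```python
from collections import defaultdict


def build_quotient_graph(edges, cluster_of):
    """Contract each cluster into a supernode, following the Figure 2 convention.

    edges: (u, v, w) triples of an undirected graph, each edge listed once.
    Returns (node_weight, loops, cut):
      node_weight[c] = number of nodes in cluster c
      loops[c]       = 2 * total weight of intra-cluster edges of c
      cut[(c, d)]    = total weight of edges between clusters c < d
    """
    node_weight = defaultdict(int)
    for c in cluster_of.values():
        node_weight[c] += 1
    loops = defaultdict(float)
    cut = defaultdict(float)
    for u, v, w in edges:
        cu, cv = cluster_of[u], cluster_of[v]
        if cu == cv:
            loops[cu] += 2 * w  # once per direction of the intra-cluster edge
        else:
            cut[(min(cu, cv), max(cu, cv))] += w
    return dict(node_weight), dict(loops), dict(cut)


def quotient_modularity(node_weight, loops, cut):
    """Modularity computed from the quotient graph alone (assumes >= 1 edge).

    vol(c) = loops[c] + incident cut weight, because every intra-cluster edge
    adds 2 to vol(c) and every cut edge adds 1 to each endpoint's cluster.
    """
    volume = {c: loops.get(c, 0.0) for c in node_weight}
    for (c, d), w in cut.items():
        volume[c] += w
        volume[d] += w
    two_m = sum(volume.values())
    return sum(loops.get(c, 0.0) / two_m - (volume[c] / two_m) ** 2 for c in volume)


edges = [(0, 1, 1), (0, 2, 1), (1, 2, 1), (3, 4, 1), (3, 5, 1), (4, 5, 1), (2, 3, 1)]
clusters = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
node_weight, loops, cut = build_quotient_graph(edges, clusters)
q = quotient_modularity(node_weight, loops, cut)
print(node_weight, loops, cut)  # {0: 3, 1: 3} {0: 6.0, 1: 6.0} {(0, 1): 1.0}
print(round(q, 4))  # 0.3571
```

Counting each self-loop twice is what makes the second helper work: a cluster's volume is its self-loop weight plus its incident inter-cluster weight, so modularity can be scored without revisiting the original edges.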