A Starting Point for Dynamic Community Detection with Leiden Algorithm

Subhajit Sahu

TL;DR

The paper tackles dynamic community detection by extending the Naive-dynamic (ND), Delta-screening (DS), and Dynamic Frontier (DF) approaches to a fast multicore Leiden algorithm. It presents selective refinement and a subset renumbering scheme to maintain well-connected communities while exploiting prior memberships, yielding average speedups of up to $1.98\times$ (and up to $3.72\times$ for small updates) over Static Leiden on large graphs, with performance improving by about $1.6\times$ for every doubling of threads on a 64-core processor. The methods preserve modularity nearly as well as the static baseline and provide insights into load balancing and refinement costs, suggesting DF Leiden as a practical option for evolving graphs. Overall, the work demonstrates the feasibility and value of dynamic Leiden for efficiently updating communities in dynamic networks, laying groundwork for future refinement and optimization.

Abstract

Real-world graphs often evolve over time, making community or cluster detection a crucial task. In this technical report, we extend three dynamic approaches - Naive-dynamic (ND), Delta-screening (DS), and Dynamic Frontier (DF) - to our multicore implementation of the Leiden algorithm, known for its high-quality community detection. Our experiments, conducted on a server with a 64-core AMD EPYC-7742 processor, show that ND, DS, and DF Leiden achieve average speedups of 1.37x, 1.47x, and 1.98x on large graphs with random batch updates, compared to the Static Leiden algorithm - while scaling at a rate of 1.6x for every doubling of threads. To our knowledge, this is the first attempt to apply dynamic approaches to the Leiden algorithm. We hope these early results pave the way for further development of dynamic approaches for evolving graphs.
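
As a rough reading of the scaling figure, and assuming (hypothetically) that the reported $1.6\times$ gain per doubling held uniformly from 1 to 64 threads, which the abstract does not claim, the implied overall speedup $S(T)$ at $T$ threads would be:

$$
S(T) \approx 1.6^{\log_2 T}, \qquad S(64) \approx 1.6^{6} \approx 16.8
$$

Under that assumption, the parallel efficiency at 64 threads would be roughly $16.8 / 64 \approx 26\%$, compared with the ideal $2\times$ per doubling. The actual gain at each step may well differ from the average.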

Paper Structure (42 sections, 2 equations, 11 figures, 3 tables, 9 algorithms)

This paper contains 42 sections, 2 equations, 11 figures, 3 tables, 9 algorithms.

Figures (11)

  • Figure 1: Illustration of the Delta-screening (DS) [com-zarayeneh21] and Dynamic Frontier (DF) [sahu2024shared] approaches in the presence of edge deletions and insertions, represented with dotted lines and doubled lines, respectively. Vertices identified as affected (initial) by each approach are highlighted in brown, and entire communities marked as affected are depicted in light brown.
  • Figure 2: Comparison of No continued passes, Full Refine, and Subset Refine methods. Here, circles represent communities (or subcommunities post refinement), dotted circles denote old parent communities during the local-moving phase, dotted lines indicate edge deletions, double lines signify edge insertions, and a brown fill indicates mandated further processing. Pre-existing edges are not shown. With No continued passes, all communities are refined after the local-moving phase; however, it may converge prematurely with small batch updates, resulting in suboptimal community structures. The Full Refine method processes all refined communities after they are aggregated into super-vertices. In contrast, with the Subset Refine method, we selectively refine only a subset of communities based on the batch update, leaving the remaining communities unchanged.
  • Figure 3: Illustration of issues arising during the refinement phase when only a subset of communities is refined. Here, circles represent communities (or subcommunities after refinement), dotted circles indicate old parent communities (from the local-moving phase), and lines show both inter- and intra-community edges. Upon refinement of community $B$, subfigure (a) shows that vertex $i$ is isolated, but disconnected from community $C = i$, while subfigure (b) shows that further refinement forms sub-community $D = i$, which is disconnected from community $C = i$.
  • Figure 4: Demonstration of how decreasing or increasing edge density within a community can cause it to split. Here, circles show refined subcommunities, while dotted circles represent the original parent communities from the local-moving phase. Dotted lines indicate edge deletions, double lines represent edge insertions, and brown-filled areas mark regions needing further processing. Red and green boundaries highlight possible split points due to batch updates.
  • Figure 5: Relative runtime of Static, Naive-dynamic (ND), Delta-screening (DS), and Dynamic Frontier (DF) Leiden with varying chunk size for OpenMP dynamic scheduling in the aggregation phase of the Leiden algorithm. These tests were conducted on large graphs, with batch updates randomly generated at sizes of $10^{-7}|E|$, $10^{-5}|E|$, and $10^{-3}|E|$. The results suggest that a chunk size of $32$ is optimal (highlighted). In this figure, runtimes are normalized to the maximum runtime, namely that of Static Leiden with a chunk size of $1$ for dynamic scheduling during the aggregation phase.
  • ...and 6 more figures
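
To make the chunk-size setting in Figure 5 concrete, the sketch below shows how a chunk size of 32 is passed to OpenMP's dynamic schedule. It is a minimal illustration, not the paper's aggregation code: the `CsrGraph` struct and the `vertexWeights` function are hypothetical names, and the loop body (summing each vertex's edge weights) merely stands in for per-vertex work whose cost varies with degree.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical CSR graph layout; the field names are illustrative only
// and are not taken from the paper's implementation.
struct CsrGraph {
  std::vector<std::size_t> offsets;  // size = vertex count + 1
  std::vector<int>         targets;  // target vertex of each edge
  std::vector<double>      weights;  // weight of each edge
};

// Sum the edge weights of every vertex. The work per vertex is proportional
// to its degree, so a static split of the loop can leave threads idle.
std::vector<double> vertexWeights(const CsrGraph& g) {
  const long n = static_cast<long>(g.offsets.size()) - 1;
  std::vector<double> total(n > 0 ? static_cast<std::size_t>(n) : 0, 0.0);
  // Each thread repeatedly claims 32 consecutive vertices. A smaller chunk
  // balances load more finely but costs more scheduling overhead.
  // Without OpenMP enabled, the pragma is ignored and the loop runs serially.
  #pragma omp parallel for schedule(dynamic, 32)
  for (long u = 0; u < n; ++u) {
    double sum = 0.0;
    for (std::size_t e = g.offsets[u]; e < g.offsets[u + 1]; ++e) {
      sum += g.weights[e];
    }
    total[u] = sum;  // each iteration writes a distinct slot, so no race
  }
  return total;
}
```

Compiling with OpenMP support (for example `-fopenmp` on GCC or Clang) enables the parallel schedule. Which chunk size works best depends on the graph and the machine, so 32 here simply mirrors the value the figure reports.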