Heuristic-based Dynamic Leiden Algorithm for Efficient Tracking of Communities on Evolving Graphs

Subhajit Sahu

TL;DR

This technical report introduces the first implementations of parallel Naive-dynamic (ND), Delta-screening (DS), and Dynamic Frontier (DF) Leiden algorithms that efficiently track communities over time.

Abstract

Community detection, or clustering, identifies groups of nodes in a graph that are more densely connected to each other than to the rest of the network. Given the size and dynamic nature of real-world graphs, efficient community detection is crucial for tracking evolving communities, enhancing our understanding and management of complex systems. The Leiden algorithm, which improves upon the Louvain algorithm, efficiently detects communities in large networks, producing high-quality community structures. However, existing multicore dynamic community detection algorithms based on Leiden are inefficient and lack support for tracking evolving communities. This technical report introduces the first implementations of parallel Naive-dynamic (ND), Delta-screening (DS), and Dynamic Frontier (DF) Leiden algorithms that efficiently track communities over time. Experiments on a 64-core AMD EPYC-7742 processor demonstrate that ND, DS, and DF Leiden achieve average speedups of 3.9x, 4.4x, and 6.1x, respectively, over the Static Leiden algorithm on large graphs with random batch updates, and that these approaches scale by 1.4-1.5x with every doubling of threads.
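To make the difference between the approaches concrete, here is a minimal Python sketch of how the initially affected vertices might be chosen. It assumes the marking rule as we understand it from the earlier Dynamic Frontier work (sahu2024shared): Naive-dynamic revisits every vertex, starting from the previous memberships, while DF marks only the endpoints of deleted intra-community edges and of inserted inter-community edges, and later adds the neighbors of any vertex that changes community. Delta-screening, which marks entire communities, is left out. The function names `nd_affected`, `df_initial_affected`, and `df_expand` are hypothetical and not taken from the report's code.

```python
from typing import Dict, Iterable, List, Sequence, Set, Tuple

Edge = Tuple[int, int]


def nd_affected(num_vertices: int) -> Set[int]:
    """Naive-dynamic: every vertex is revisited, starting from the previous memberships."""
    return set(range(num_vertices))


def df_initial_affected(
    deletions: Iterable[Edge],
    insertions: Iterable[Edge],
    community: Sequence[int],
) -> Set[int]:
    """Dynamic Frontier (assumed rule): endpoints of deleted intra-community
    edges and of inserted inter-community edges start out affected."""
    affected: Set[int] = set()
    for i, j in deletions:
        if community[i] == community[j]:
            affected.update((i, j))
    for i, j in insertions:
        if community[i] != community[j]:
            affected.update((i, j))
    return affected


def df_expand(vertex: int, adjacency: Dict[int, List[int]], affected: Set[int]) -> None:
    """After `vertex` changes community, its neighbors join the frontier (assumed rule)."""
    affected.update(adjacency.get(vertex, []))
```

For example, with `community = [0, 0, 0, 1, 1, 1]`, deleting `(0, 1)` and `(2, 3)` and inserting `(1, 4)` and `(3, 5)` marks only `{0, 1, 4}`: the deleted edge `(2, 3)` already crossed communities, and the inserted edge `(3, 5)` lies inside one. Naive-dynamic would instead revisit all six vertices.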

Paper Structure

This paper contains 42 sections, 2 equations, 13 figures, 2 tables, and 12 algorithms.

Figures (13)

  • Figure 1: Illustration of the Delta-screening (DS) com-zarayeneh21 and Dynamic Frontier (DF) sahu2024shared approaches in the presence of edge deletions (dotted lines) and insertions (doubled lines). Vertices initially identified as affected by each approach are highlighted in yellow, while entire affected communities are shaded in light yellow.
  • Figure 2: Our procedure for tracking communities. In the figure, old communities are labeled with lowercase letters and have dotted borders (e.g., $a$ and $b$), while current communities are labeled with uppercase letters and have solid borders (e.g., $B$, $C$, and $D$). An overlap of vertices between an old and a current community is drawn as an intersection, as in a Venn diagram, and the total edge weight of each overlap is indicated by $w1$, $w2$, $w3$, or $w4$.
  • Figure 3: Percentage match in the community membership of vertices after a random batch update of size $10^{-7}|E|$ to $0.1|E|$ consisting purely of edge deletions, followed by a batch update that reverses the deletions (by inserting the edges back). The dynamic algorithms compared are the original Naive-dynamic (ND), Delta-screening (DS), and Dynamic Frontier (DF) Leiden without tracking sahu2024starting, and our improved ND, DS, and DF Leiden with tracking. The match percentage of our improved DF Leiden is labeled in the figure.
  • Figure 4: Relative runtime of Naive-dynamic (ND), Delta-screening (DS), and Dynamic Frontier (DF) Leiden, evaluated both with and without community tracking (the latter referred to as "no-tracking"). This experiment was conducted on large graphs with random batch updates of size $10^{-7}|E|$ to $0.1|E|$.
  • Figure 5: Heuristic for minimizing communities to refine. Communities $A$ and $B$ contain subcommunities (labeled $A1$ to $A7$ and $B1$ to $B6$). The top subfigure illustrates that edge deletions (dotted lines) in community $A$ can lead to its split, necessitating refinement. The bottom subfigure shows that edge insertions (solid lines) in $B$ can similarly cause a split due to stronger regional connections, also requiring refinement. The figure demonstrates that our algorithm effectively processes two cumulative batch updates, $\Delta1$ and $\Delta2$, as if they were a single large update $\Delta1 + \Delta2$.
  • ...and 8 more figures
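The tracking procedure of Figure 2 and the match percentage of Figure 3 can be sketched together in Python. This is an illustration under stated assumptions, not the report's implementation: the overlap weight of an (old, current) community pair is taken as the summed weight of the vertices they share (for instance, their weighted degrees), old labels are handed out greedily in order of decreasing overlap so that each old label is reused at most once, and current communities left unmatched receive fresh IDs. The names `overlap_weights`, `track_communities`, and `match_percent` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def overlap_weights(
    old: Sequence[int], new: Sequence[int], vertex_weight: Sequence[float]
) -> Dict[Tuple[int, int], float]:
    """Summed vertex weight shared by each (old community, current community) pair."""
    weights: Dict[Tuple[int, int], float] = defaultdict(float)
    for v, (a, b) in enumerate(zip(old, new)):
        weights[(a, b)] += vertex_weight[v]
    return dict(weights)


def track_communities(
    old: Sequence[int], new: Sequence[int], vertex_weight: Sequence[float]
) -> List[int]:
    """Relabel current communities with old labels by greedy largest-overlap matching (assumed rule)."""
    weights = overlap_weights(old, new, vertex_weight)
    # Heaviest overlaps first; ties broken by label for determinism.
    pairs = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0][0], kv[0][1]))
    label: Dict[int, int] = {}
    used_old = set()
    for (a, b), _ in pairs:
        if b in label or a in used_old:
            continue  # each current community and each old label is matched at most once
        label[b] = a
        used_old.add(a)
    # Current communities with no available old label get fresh IDs above all old labels.
    next_id = max(old, default=-1) + 1
    for b in sorted(set(new)):
        if b not in label:
            label[b] = next_id
            next_id += 1
    return [label[b] for b in new]


def match_percent(before: Sequence[int], after: Sequence[int]) -> float:
    """Percentage of vertices whose community label is unchanged."""
    if not before:
        return 100.0
    same = sum(1 for x, y in zip(before, after) if x == y)
    return 100.0 * same / len(before)


if __name__ == "__main__":
    old = [0, 0, 0, 1, 1, 1]
    new = [5, 5, 5, 7, 7, 7]  # same partition, arbitrary new IDs
    tracked = track_communities(old, new, [1.0] * len(old))
    print(tracked, match_percent(old, new), match_percent(old, tracked))
```

In the usage example, the re-detected partition is identical but renamed, so comparing raw labels gives a 0% match while the tracked labels give 100%. That gap is the kind of effect Figure 3 measures when comparing runs with and without tracking.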
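The refinement heuristic of Figure 5 can be illustrated with a small accumulator. This sketch assumes that a community is flagged whenever an edge with both endpoints inside it is deleted (as in community $A$) or inserted (as in community $B$), and that flags persist until the next refinement pass, so marking $\Delta1$ and $\Delta2$ separately yields the same set as marking $\Delta1 + \Delta2$ at once. The class `RefineTracker` is hypothetical and not the report's data structure.

```python
from itertools import chain
from typing import Iterable, Sequence, Set, Tuple

Edge = Tuple[int, int]


class RefineTracker:
    """Hypothetical accumulator of communities that need refinement."""

    def __init__(self) -> None:
        self.pending: Set[int] = set()

    def mark_batch(
        self,
        deletions: Iterable[Edge],
        insertions: Iterable[Edge],
        community: Sequence[int],
    ) -> None:
        # An intra-community deletion may split the community (A in Figure 5);
        # an intra-community insertion may strengthen a region enough to split it (B).
        for i, j in chain(deletions, insertions):
            if community[i] == community[j]:
                self.pending.add(community[i])

    def take(self) -> Set[int]:
        """Return the flagged communities and clear them for the next pass."""
        flagged, self.pending = self.pending, set()
        return flagged
```

Because the flags form a set that is only cleared by `take()`, two calls to `mark_batch` followed by one `take()` flag the same communities as a single call on the combined batch. Inter-community edges are not flagged in this sketch; the caption does not say how the report treats them.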