An Approach for Addressing Internally-Disconnected Communities in Louvain Algorithm

Subhajit Sahu

TL;DR

This report introduces GSP-Louvain, a parallel algorithm based on Louvain that mitigates the issue of internally-disconnected communities and outperforms Original Leiden, NetworKit Leiden, and cuGraph Leiden in community detection.

Abstract

Community detection is the problem of identifying densely connected clusters within a network. While the Louvain algorithm is commonly used for this task, it can produce internally-disconnected communities. To address this, the Leiden algorithm was introduced. This technical report introduces GSP-Louvain, a parallel algorithm based on Louvain, which mitigates this issue. Running on a system with two 16-core Intel Xeon Gold 6226R processors, GSP-Louvain outperforms Original Leiden, NetworKit Leiden, and cuGraph Leiden by 391x, 6.9x, and 2.6x respectively, processing 410M edges per second on a 3.8B-edge graph. Furthermore, GSP-Louvain improves performance at a rate of 1.5x for every doubling of threads.

Paper Structure

This paper contains 30 sections, 2 equations, 7 figures, 1 table, 6 algorithms.

Figures (7)

  • Figure 1: An example demonstrating the possibility of internally disconnected communities with the Louvain algorithm. Here, $C1$, $C2$, $C3$, and $C4$ are four communities obtained after running a few iterations of the Louvain algorithm, with vertices $1$ to $7$ being members of community $C1$. Thick lines are used to denote higher edge weights.
  • Figure 2: An example illustrating the BFS technique for splitting internally-disconnected communities. Initially, two communities, $C1$ and $C2$, are shown, with $C1$ being internally disconnected due to vertex $4$ joining $C2$. The BFS technique selects random vertices within each community and gives every vertex reachable from a selected vertex the same label, which serves as its new community ID.
  • Figure 3: Mean relative runtime, modularity, and fraction of disconnected communities (log-scale) using Split Last (SL) and Split Pass (SP) approaches for splitting disconnected communities with Parallel Louvain algorithm across all graphs in the dataset. Both SL and SP approaches employ Label Propagation (LP), Label Propagation with Pruning (LPP), or Breadth First Search (BFS) techniques for splitting.
  • Figure 4: Runtime in seconds (log-scale), speedup (log-scale), modularity, and fraction of disconnected communities (log-scale) with Original Leiden, NetworKit Leiden, cuGraph Leiden, and GSP-Louvain for each graph in the dataset.
  • Figure 5: Phase split of GSP-Louvain shown on the left, and pass split shown on the right for each graph in the dataset.
  • ...and 2 more figures
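
The BFS splitting technique described in the Figure 2 caption, and the "fraction of disconnected communities" metric plotted in Figures 3 and 4, can be illustrated with a small sequential sketch. This is not the paper's parallel implementation: the function names `split_disconnected` and `fraction_disconnected`, the adjacency-dictionary representation, and the toy graph are all assumptions made for illustration, and start vertices are taken in iteration order rather than at random. The toy graph only mimics the situation in Figure 2 (vertex 4 joining $C2$ cuts $C1$ in two); it is not the graph from the figure.

```python
from collections import deque


def split_disconnected(adj, membership):
    """Relabel vertices so that every community is internally connected.

    adj:        dict mapping vertex -> list of neighbours (undirected graph).
    membership: dict mapping vertex -> community ID.
    Returns a new dict vertex -> community ID, one ID per connected piece.
    """
    new_label = {}
    next_id = 0
    for start in adj:
        if start in new_label:
            continue
        # BFS from an unlabelled vertex, never leaving its community.
        new_label[start] = next_id
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in new_label and membership[v] == membership[start]:
                    new_label[v] = next_id
                    queue.append(v)
        next_id += 1
    return new_label


def fraction_disconnected(adj, membership):
    """Fraction of communities that consist of more than one connected piece."""
    split = split_disconnected(adj, membership)
    pieces = {}
    for v, c in membership.items():
        pieces.setdefault(c, set()).add(split[v])
    return sum(len(p) > 1 for p in pieces.values()) / len(pieces)


# Hypothetical toy graph: a path 1-2-...-7 with a tail 4-8-9.
edges = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 7), (4, 8), (8, 9)]
adj = {}
for u, v in edges:
    adj.setdefault(u, []).append(v)
    adj.setdefault(v, []).append(u)

# Vertex 4 belongs to C2, which separates {1, 2, 3} from {5, 6, 7} inside C1.
membership = {1: "C1", 2: "C1", 3: "C1", 5: "C1", 6: "C1", 7: "C1",
              4: "C2", 8: "C2", 9: "C2"}

labels = split_disconnected(adj, membership)
print(labels)
print(fraction_disconnected(adj, membership))
```

On this input, $C1$ is split into two new communities while $C2$ stays whole, so one of the two original communities counts as disconnected. Running the same relabelling after every pass or only once at the end corresponds, roughly, to the Split Pass (SP) and Split Last (SL) approaches compared in Figure 3.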