GSL-LPA: Fast Label Propagation Algorithm (LPA) for Community Detection with no Internally-Disconnected Communities

Subhajit Sahu

TL;DR

Experiments show that GSL-LPA not only mitigates LPA's tendency to produce internally disconnected communities but also outperforms FLPA, igraph LPA, and NetworKit LPA by 55x, 10,300x, and 5.8x, respectively, achieving a processing rate of 844M edges/s on a 3.8B edge graph.

Abstract

Community detection is the problem of identifying tightly connected clusters of nodes within a network. Efficient parallel algorithms for this problem play a crucial role in various applications, especially as datasets grow large. The Label Propagation Algorithm (LPA) is commonly employed for this purpose due to its ease of parallelization, rapid execution, and scalability; however, it may yield internally disconnected communities. This technical report introduces GSL-LPA, derived from our parallelization of LPA, namely GVE-LPA. Our experiments on a system with two 16-core Intel Xeon Gold 6226R processors show that GSL-LPA not only mitigates this issue but also outperforms FLPA, igraph LPA, and NetworKit LPA by 55x, 10,300x, and 5.8x, respectively, achieving a processing rate of 844M edges/s on a 3.8B edge graph. Additionally, GSL-LPA scales at a rate of 1.6x for every doubling of threads.
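For readers unfamiliar with label propagation, the basic idea referred to in the abstract can be sketched as follows. This is a minimal sequential illustration and not the GVE-LPA or GSL-LPA implementation: the function name `label_propagation`, the adjacency format, the iteration cap, and the smallest-label tie-break are all assumptions made for this example.

```python
from collections import defaultdict


def label_propagation(adj, max_iters=20):
    """Sequential LPA sketch (hypothetical helper, not the paper's code).

    adj maps each vertex to a list of (neighbor, weight) pairs.
    Returns a dict mapping each vertex to its community label.
    """
    # Every vertex starts in its own community.
    labels = {v: v for v in adj}
    for _ in range(max_iters):
        changed = 0
        for v in adj:
            # Sum the edge weight leading to each neighboring label.
            weight_by_label = defaultdict(float)
            for u, w in adj[v]:
                if u != v:
                    weight_by_label[labels[u]] += w
            if not weight_by_label:
                continue  # isolated vertex keeps its own label
            # Adopt the label with the largest total weight.
            # Assumption: ties go to the smallest label, for determinism.
            best = min(weight_by_label, key=lambda c: (-weight_by_label[c], c))
            if best != labels[v]:
                labels[v] = best
                changed += 1
        if changed == 0:
            break  # converged: no vertex changed its label
    return labels


# Two heavy triangles joined by one light edge (2-3).
graph = {
    0: [(1, 5), (2, 5)],
    1: [(0, 5), (2, 5)],
    2: [(0, 5), (1, 5), (3, 1)],
    3: [(2, 1), (4, 5), (5, 5)],
    4: [(3, 5), (5, 5)],
    5: [(3, 5), (4, 5)],
}
communities = label_propagation(graph)
```

On this toy graph each triangle ends up with a single label. Because a vertex only looks at its neighbors' labels, nothing in this procedure checks that a community stays connected, which is the issue the report addresses.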

Paper Structure (25 sections, 2 equations, 7 figures, 1 table, 4 algorithms)

This paper contains 25 sections, 2 equations, 7 figures, 1 table, 4 algorithms.

Figures (7)

  • Figure 1: An example illustrating how LPA can produce internally-disconnected communities [sahu2024addressing]. Here, $C1$, $C2$, $C3$, and $C4$ represent four communities obtained after one iteration of LPA, with vertices $1$ to $7$ belonging to community $C1$. Thicker lines indicate higher edge weights.
  • Figure 2: An example demonstrating the BFS technique for splitting disconnected communities [sahu2024addressing]. Initially, two communities, $C1$ and $C2$, are depicted, where $C1$ is internally disconnected because vertex $4$ has joined $C2$. The BFS technique randomly selects vertices within each community and assigns the same label to all vertices reachable from them within that community (indicated with a new community ID).
  • Figure 3: Mean relative runtime, modularity, and fraction of disconnected communities (log-scale) using the Split Last (SL) approach for addressing disconnected communities with Parallel LPA [sahu2023gvelpa] across all graphs in the dataset. The SL approach uses minimum label-based Label Propagation (LP), Label Propagation with Pruning (LPP), or Breadth First Search (BFS) techniques for splitting disconnected communities.
  • Figure 4: Runtime in seconds (log-scale), speedup (log-scale), modularity, and fraction of disconnected communities (log-scale) compared across FLPA, igraph LPA, NetworKit LPA, and GSL-LPA for each graph in the dataset. igraph LPA fails to execute on kmer_A2a and kmer_V1r graphs, and thus its results are excluded.
  • Figure 5: Phase split of GSL-LPA for each graph in the dataset.
  • ...and 2 more figures
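
The BFS technique described in the caption of Figure 2 can be sketched roughly as below. This is a sequential illustration under stated assumptions, not the paper's parallel implementation: the name `split_disconnected_communities` and the adjacency format are hypothetical, and start vertices are taken in dictionary order rather than chosen at random.

```python
from collections import deque


def split_disconnected_communities(adj, labels):
    """BFS-based split sketch (hypothetical helper, not the paper's code).

    adj maps each vertex to an iterable of neighbors.
    labels maps each vertex to its current community.
    Returns new labels in which every community is connected.
    """
    new_labels = {}
    next_id = 0
    for start in adj:
        if start in new_labels:
            continue  # already reached from an earlier start vertex
        # Open a new community ID and flood it across the old community.
        new_labels[start] = next_id
        queue = deque([start])
        while queue:
            v = queue.popleft()
            for u in adj[v]:
                # Only cross edges that stay inside the original community.
                if u not in new_labels and labels[u] == labels[start]:
                    new_labels[u] = next_id
                    queue.append(u)
        next_id += 1
    return new_labels


# Path 0-1-2-3-4 where vertex 2 has left community "A", cutting it in two.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
old = {0: "A", 1: "A", 2: "B", 3: "A", 4: "A"}
new = split_disconnected_communities(path, old)
```

Here the disconnected community "A" is split into {0, 1} and {3, 4}, while vertex 2 keeps a community of its own, giving three connected communities in total.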