Parallel Motif-Based Community Detection

Tianyi Chen, Charalampos E. Tsourakakis

TL;DR

The paper tackles scalable community detection by evaluating motif-based methods and introducing an open-source parallel framework built on a shared-memory architecture. It proposes Triangle-Wedges (TW) as a new edge-similarity score and shows that motif-based approaches can offer a favorable tradeoff between output quality and efficiency, while addressing threshold selection and biases in prior groundtruth evaluations. Theoretical results show that TW can recover communities in SBM settings where Tectonic may fail, and experiments on real and synthetic graphs report running time and memory usage. Overall, the work delivers a scalable toolkit for motif-based clustering and gives rule-of-thumb guidance for choosing thresholds in real applications.

Abstract

Community detection is a central task in graph analytics. Given the substantial growth in graph size, scalability in community detection remains an unresolved challenge. Recently, alongside established methods like Louvain and Infomap, motif-based community detection has emerged. Techniques like Tectonic identify communities by pruning edges based on motif similarity scores and analyzing the resulting connected components. In this study, we perform a comprehensive evaluation of community detection methods, focusing on both the quality of their output and their scalability. Specifically, we contribute an open-source parallel framework for motif-based community detection based on a shared memory architecture. We conduct a thorough comparative analysis of state-of-the-art community detection techniques from various families, including Tectonic, label propagation, spectral clustering, Louvain, LambdaCC, and Infomap, on graphs with up to billions of edges. A key finding of our analysis is that motif-based graph clustering provides a good balance between output quality and efficiency. Our work provides several novel insights. We pinpoint a bias in prior works that evaluate community detection methods using only the top 5K groundtruth communities from SNAP, as these are frequently near-cliques. Our empirical studies lead to rule-of-thumb strategies for picking thresholds, which can be critical for real applications. Finally, we show that Tectonic can fail to recover two well-separated clusters. To address this, we suggest a new similarity measure based on counts of triangles and wedges (TW) that prevents the over-segmentation of communities by Tectonic.
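The prune-then-connected-components pattern described in the abstract can be illustrated with a minimal sequential sketch. It is not the authors' parallel framework and it does not implement Tectonic or TW: plain Jaccard similarity over open neighborhoods stands in as the edge score, and the function names and the threshold value are hypothetical choices made for illustration.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Build an undirected adjacency map, ignoring self-loops."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def jaccard(adj, u, v):
    """Jaccard similarity of the open neighborhoods of u and v (stand-in score)."""
    union = len(adj[u] | adj[v])
    return len(adj[u] & adj[v]) / union if union else 0.0


def prune_and_cluster(edges, threshold):
    """Drop edges scoring below `threshold`, then return connected components."""
    adj = build_adjacency(edges)
    kept = {u: set() for u in adj}
    for u in adj:
        for v in adj[u]:
            # The score is symmetric, so each kept edge is added in both directions.
            if jaccard(adj, u, v) >= threshold:
                kept[u].add(v)

    components, seen = [], set()
    for start in kept:
        if start in seen:
            continue
        # Iterative depth-first search over the pruned graph.
        stack, component = [start], set()
        seen.add(start)
        while stack:
            node = stack.pop()
            component.add(node)
            for nxt in kept[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        components.append(component)
    return components


# Two triangles joined by a single bridge edge (2, 3).
toy_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
communities = prune_and_cluster(toy_edges, threshold=0.2)
```

On this toy graph the bridge edge shares no common neighbors and scores 0, so it is pruned while the triangle edges survive. The threshold controls how aggressively edges are removed, which is why its choice matters in practice.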

Paper Structure

This paper contains 15 sections, 2 theorems, 6 equations, 6 figures, 5 tables, 1 algorithm.

Key Result

Proposition 1

Tectonic is a reparameterization of Jaccard edge similarity.
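One way to see why such a statement can hold is sketched below. The definitions are assumptions, since this summary does not state them: the Tectonic score of an edge $(u,v)$ is taken to be $\tau(u,v) = t(u,v)/(d(u)+d(v))$, where $t(u,v)$ is the number of triangles containing the edge and $d(\cdot)$ is the node degree, and Jaccard similarity $J(u,v)$ is taken over open neighborhoods $N(\cdot)$.

```latex
\begin{align*}
J(u,v) &= \frac{|N(u)\cap N(v)|}{|N(u)\cup N(v)|}
        = \frac{t(u,v)}{d(u)+d(v)-t(u,v)}, \\
\tau(u,v) &= \frac{t(u,v)}{d(u)+d(v)}
\quad\Longrightarrow\quad
J(u,v) = \frac{\tau(u,v)}{1-\tau(u,v)} .
\end{align*}
```

Under these assumptions the map $\tau \mapsto \tau/(1-\tau)$ is strictly increasing on $[0,1)$, so pruning edges with $\tau < \theta$ removes the same edges as pruning those with $J < \theta/(1-\theta)$.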

Figures (6)

  • Figure 1: Fractions of motifs cut by groundtruth communities in (a) Gavin, (b) Youtube, and (c) DBLP. (d) The average difference between similarity scores of edges inside and across communities, normalized by the standard deviation.
  • Figure 2: Comparison of community detection methods regarding output quality on an imbalanced SBM. Only TW can fully recover the groundtruth communities. Legends are shared across subfigures.
  • Figure 3: Histograms of community densities of (a) Amazon, (b) DBLP, (c) LiveJournal.
  • Figure 4: (a)-(g) Running time on real-world graphs. Dashed lines represent the sequential running time divided by the number of workers, i.e., expected running time without parallel cost. (h) Memory usage on SNAP graphs.
  • Figure 5: Precision-recall tradeoff on graphs with groundtruth communities.
  • ...and 1 more figure

Theorems & Definitions (4)

  • Proposition 1
  • proof
  • Theorem 1
  • proof