A novel metric for community detection

Ke-ke Shang, Michael Small, Yan Wang, Di Yin, Shu Li

TL;DR

The paper addresses the problem that Modularity-based metrics may mischaracterize communities by assuming higher internal density; it proposes a predictability-based criterion for community detection. The method defines $S_{pr}=\frac{\sum_{i=1}^n {\frac{S_{in}^i - S_{all}^i}{S_{all}^i}}}{n}$, where $S_{in}^i$ and $S_{all}^i$ are link-prediction accuracies for internal and all links under the $i$th predictor, and it evaluates three link-prediction schemes (CN, LHN1, HDI) across five networks with eight detection algorithms. The results show that internal-link predictability generally exceeds all-link predictability and that $S_{pr}$ provides a more stable, robust ranking of algorithms than Modularity, while also exposing failures (e.g., negative $S_{pr}$) and revealing broader statistical patterns. This work suggests a more flexible and informative view of what constitutes a community and offers a practical tool to compare algorithms across diverse networks.
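The $S_{pr}$ definition above translates directly into code. The following minimal Python sketch assumes that each predictor's accuracy (for example an AUC score) has already been computed once for internal links and once for all links. The function name `predictability_score` and the sample numbers are hypothetical and do not come from the paper.

```python
from typing import Sequence


def predictability_score(s_in: Sequence[float], s_all: Sequence[float]) -> float:
    """S_pr = (1/n) * sum_i (S_in^i - S_all^i) / S_all^i.

    s_in[i]  -- accuracy of predicting internal community links with predictor i
    s_all[i] -- accuracy of predicting all links with the same predictor i
    """
    if not s_in or len(s_in) != len(s_all):
        raise ValueError("need one (S_in, S_all) pair per predictor")
    total = 0.0
    for inside, overall in zip(s_in, s_all):
        if overall <= 0:
            raise ValueError("S_all must be positive")
        # Relative gain (or loss) in predictability of internal links.
        total += (inside - overall) / overall
    return total / len(s_in)


# Hypothetical AUC values for three predictors (e.g. CN, LHN1, HDI).
print(predictability_score([0.90, 0.85, 0.88], [0.80, 0.80, 0.80]))  # positive
# Internal links less predictable than all links: S_pr becomes negative.
print(predictability_score([0.70], [0.80]))
```

A negative result corresponds to the failure case mentioned in the TL;DR, where the detected "communities" make links harder to predict, not easier.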

Abstract

Research into the detection of dense communities has recently attracted increasing attention within network science, and various metrics for detecting such communities have been proposed. The most popular metric, Modularity, is based on the so-called rule that links within communities are denser than external links between communities, and it has become the default. However, this default metric suffers from ambiguity, and, worse, all augmentations of modularity are based on a narrow intuition of what it means to form a "community". We argue that in specific but quite common systems, links within a community are not necessarily more common than links between communities. Instead, we propose that the defining characteristic of a community is that links are more predictable within a community than between communities. In this paper, based on the effect of communities on link prediction, we propose a novel metric for community detection built directly on this feature. We find that our metric is more robust than traditional modularity. Consequently, we can evaluate the stability of the same detection algorithm across different networks. Our metric can also directly uncover false community detection and infer further statistical characteristics of detection algorithms.
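The idea that links are more predictable inside a community than across the whole network can be illustrated with three local similarity indices, CN, LHN1, and HDI, scored by AUC. This sketch uses the common link-prediction definitions: CN counts common neighbours, LHN1 divides that count by $k_x k_y$, and HDI divides it by $\max(k_x, k_y)$. It is a minimal Python illustration under several assumptions that this summary does not state:

* the graph is a simple undirected edge list;
* a random fraction of links is held out as the probe set;
* internal probe links and all probe links are each ranked against every non-existent node pair.

All function names are hypothetical, and this is not the authors' exact protocol.

```python
import itertools
import random


def common_neighbours(adj, x, y):
    # CN: number of shared neighbours of x and y.
    return len(adj[x] & adj[y])


def lhn1(adj, x, y):
    # LHN1: CN normalised by the product of the two degrees.
    kx, ky = len(adj[x]), len(adj[y])
    return common_neighbours(adj, x, y) / (kx * ky) if kx and ky else 0.0


def hdi(adj, x, y):
    # HDI: CN normalised by the larger of the two degrees.
    kmax = max(len(adj[x]), len(adj[y]))
    return common_neighbours(adj, x, y) / kmax if kmax else 0.0


def auc(adj, positives, negatives, index):
    """Probability that a held-out link outscores a non-link (ties count 1/2)."""
    pos = [index(adj, x, y) for x, y in positives]
    neg = [index(adj, x, y) for x, y in negatives]
    hits = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return hits / (len(pos) * len(neg))


def internal_vs_all_auc(edges, community, index, probe_frac=0.1, seed=0):
    """Return (S_in, S_all) for one predictor; assumes no duplicate edges."""
    rng = random.Random(seed)
    edges = [tuple(e) for e in edges]
    probe = rng.sample(edges, max(1, int(probe_frac * len(edges))))
    probe_set = set(probe)
    nodes = {v for e in edges for v in e}
    adj = {v: set() for v in nodes}
    for x, y in edges:  # training graph = all edges minus the probe set
        if (x, y) not in probe_set:
            adj[x].add(y)
            adj[y].add(x)
    existing = {frozenset(e) for e in edges}
    non_links = [p for p in itertools.combinations(sorted(nodes), 2)
                 if frozenset(p) not in existing]
    internal = [(x, y) for x, y in probe if community[x] == community[y]]
    s_all = auc(adj, probe, non_links, index)
    s_in = auc(adj, internal, non_links, index) if internal else float("nan")
    return s_in, s_all


# Hypothetical toy graph: two 4-cliques joined by the bridge (3, 4).
toy_edges = (list(itertools.combinations(range(4), 2))
             + list(itertools.combinations(range(4, 8), 2)) + [(3, 4)])
toy_community = {v: 0 if v < 4 else 1 for v in range(8)}
for name, idx in [("CN", common_neighbours), ("LHN1", lhn1), ("HDI", hdi)]:
    print(name, internal_vs_all_auc(toy_edges, toy_community, idx, probe_frac=0.3, seed=1))
```

The exact pairwise AUC costs $O(|E_{probe}| \cdot |E_{non}|)$, so it only suits toy graphs. Larger networks usually estimate AUC by sampling pairs of probe links and non-links instead.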

Paper Structure

This paper contains 1 section, 4 equations, 4 figures, 2 tables.

Table of Contents

  1. Introduction

Figures (4)

  • Figure 1: Two different kinds of communities. Squares indicate nodes within a single community, and circles indicate nodes external to the community. Panel (a) depicts the traditional community, which has been well studied in previous work. Panel (b) depicts another real-world ("corporate") community, which may, for example, reflect an internal management structure.
  • Figure 2: For $8$ traditional community detection algorithms, the prediction accuracies of internal community links versus all links, obtained with three traditional link prediction algorithms on $5$ traditional networks. The ordinate is the AUC score of internal links, and the abscissa is the AUC score of all links. The dashed line is the diagonal; a dot above the diagonal means that the AUC score of internal links is higher than that of all links for the corresponding link prediction algorithm.
  • Figure 3: For $8$ traditional community detection algorithms, the prediction accuracies of internal community links and of all links, averaged over three traditional link prediction algorithms, on $5$ traditional networks. Hollow dots indicate results for internal links, and filled dots indicate results for all links.
  • Figure 4: Scores of the Predictability metric and the Modularity metric for $5$ well-known networks. The ordinate is the score of the corresponding metric. The dashed line indicates the effective score. Hollow markers indicate networks whose results are more stable than those of the other networks.
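Figure 4 compares Predictability against Modularity. For reference, here is a minimal Python sketch of the standard Newman-Girvan modularity for an undirected, unweighted graph without self-loops:

$$Q = \frac{1}{2m}\sum_{ij}\Big[A_{ij} - \frac{k_i k_j}{2m}\Big]\delta(c_i, c_j) = \sum_c \Big[\frac{L_c}{m} - \Big(\frac{d_c}{2m}\Big)^2\Big]$$

In this formula, $L_c$ is the number of edges inside community $c$ and $d_c$ is the total degree of its nodes. This summary does not say which Modularity variant the paper uses, and the function name `modularity` is only illustrative.

```python
from collections import defaultdict


def modularity(edges, community):
    """Q = sum_c [ L_c/m - (d_c/(2m))^2 ] for an undirected, unweighted graph."""
    m = len(edges)
    if m == 0:
        raise ValueError("graph has no edges")
    internal = defaultdict(int)  # L_c: edges with both endpoints in community c
    degree = defaultdict(int)    # d_c: summed degree of the nodes in community c
    for x, y in edges:
        degree[community[x]] += 1
        degree[community[y]] += 1
        if community[x] == community[y]:
            internal[community[x]] += 1
    return sum(internal[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


# Hypothetical toy data: two triangles joined by the bridge edge (2, 3).
toy_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
print(modularity(toy_edges, split))  # 5/14, roughly 0.357
```

Putting every node in one community always gives $Q = 0$. This illustrates the density-based intuition behind Modularity, which the paper's predictability criterion is meant to complement.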