Fast algorithm for detecting community structure in networks

M. E. J. Newman

TL;DR

Many networks exhibit community structure, but existing edge-betweenness methods are computationally intensive. The author proposes a greedy agglomerative algorithm guided by the modularity Q: starting with each vertex in its own community, it repeatedly merges the pair of communities whose union changes Q the most favorably, computes each change ΔQ efficiently, and records the sequence of merges as a dendrogram. For a network with n vertices and m edges, the method runs in O((m+n)n) time and scales to networks with millions of vertices, delivering results orders of magnitude faster than prior approaches. It yields meaningful community divisions in both synthetic benchmarks and large real-world networks (e.g., a 56,276-node arXiv collaboration network), enabling rapid exploration and visualization of complex network structure.
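The greedy merging described above can be illustrated with a minimal sketch. It is not the author's implementation: it assumes the standard modularity definition Q = Σ_i (e_ii − a_i²), where e_ij is half the fraction of edges joining communities i and j and a_i = Σ_j e_ij, so that merging communities i and j changes Q by ΔQ = 2(e_ij − a_i a_j). The function name `greedy_modularity` and the input format (vertex count plus an undirected edge list without self-loops) are hypothetical choices for illustration, and the pair search is a plain scan rather than an optimized data structure.

```python
from collections import defaultdict


def greedy_modularity(n, edges):
    """Greedy agglomeration on an undirected graph with vertices 0..n-1.

    Assumes no self-loops. Returns (best_q, best_partition), where
    best_partition is the division with the highest modularity seen
    while merging all the way down the dendrogram.
    """
    m = len(edges)
    if m == 0:
        return 0.0, [[v] for v in range(n)]

    # e[i][j]: half the fraction of edges between communities i and j (i != j).
    e = [defaultdict(float) for _ in range(n)]
    # a[i]: fraction of edge ends attached to community i.
    a = [0.0] * n
    for u, v in edges:
        e[u][v] += 1 / (2 * m)
        e[v][u] += 1 / (2 * m)
        a[u] += 1 / (2 * m)
        a[v] += 1 / (2 * m)

    members = {v: [v] for v in range(n)}
    q = -sum(x * x for x in a)  # every vertex alone: e_ii = 0
    best_q = q
    best_partition = sorted(sorted(c) for c in members.values())

    while True:
        # Scan connected pairs only; merging unconnected pairs never raises Q.
        best_pair, best_dq = None, None
        for i in members:
            for j, eij in e[i].items():
                if i < j:
                    dq = 2 * (eij - a[i] * a[j])
                    if best_dq is None or dq > best_dq:
                        best_pair, best_dq = (i, j), dq
        if best_pair is None:
            break

        # Merge community j into community i and update e and a.
        i, j = best_pair
        for k, ejk in e[j].items():
            if k != i:
                e[i][k] += ejk
                e[k][i] += ejk
                del e[k][j]
        del e[i][j]
        e[j].clear()
        a[i] += a[j]
        members[i].extend(members.pop(j))

        q += best_dq
        if q > best_q:
            best_q = q
            best_partition = sorted(sorted(c) for c in members.values())

    return best_q, best_partition
```

For example, on two triangles joined by a single edge, `greedy_modularity(6, [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)])` separates the two triangles. Merging continues past the modularity peak so that the whole dendrogram is traversed, and the best division encountered along the way is the one returned.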

Abstract

It has been found that many networks display community structure -- groups of vertices within which connections are dense but between which they are sparser -- and highly sensitive computer algorithms have in recent years been developed for detecting such structure. These algorithms however are computationally demanding, which limits their application to small networks. Here we describe a new algorithm which gives excellent results when tested on both computer-generated and real-world networks and is much faster, typically thousands of times faster than previous algorithms. We give several example applications, including one to a collaboration network of more than 50000 physicists.

Paper Structure

This paper contains 4 sections, 1 equation, 4 figures.

Figures (4)

  • Figure 1: The fraction of vertices correctly identified by our algorithms in the computer-generated graphs described in the text. The two curves show results for the new algorithm (circles) and for the algorithm of Girvan and Newman [GN02] (squares). Each point is an average over 100 graphs.
  • Figure 2: Dendrogram of the communities found by our algorithm in the "karate club" network of Zachary [Zachary77, GN02]. The shapes of the vertices represent the two groups into which the club split as the result of an internal dispute.
  • Figure 3: Dendrogram of the communities found in the college football network described in the text. The real-world communities (conferences) are denoted by the different shapes as indicated in the legend.
  • Figure 4: Left panel: Community structure in the collaboration network of physicists. The graph breaks down into four large groups, each composed primarily of physicists of one specialty, as shown. Specialties are determined by the subsection(s) of the e-print archive in which individuals post papers: "C.M." indicates condensed matter; "H.E.P." indicates high-energy physics, including theory, phenomenology, and nuclear physics; "astro" indicates astrophysics. Middle panel: one of the condensed matter communities is further broken down by the algorithm, revealing an approximate power-law distribution of community sizes. Right panel: one of these smaller communities is further analyzed to reveal individual research groups (different shades), one of which (in dashed box) is the author's own.