
Finding and evaluating community structure in networks

M. E. J. Newman, M. Girvan

TL;DR

The paper addresses how to uncover natural community structure in networks using divisive algorithms that iteratively remove the edge with the highest betweenness and recalculate the betweenness of the remaining edges after each removal. By introducing multiple betweenness definitions and a modularity-based criterion, the authors provide a practical framework for determining the number of communities and evaluating the quality of a division, and they recommend the shortest-path betweenness version for scalability. Empirical results across synthetic benchmarks and real networks (e.g., karate club, collaboration networks, dolphins, Les Misérables, web graphs) demonstrate robust detection and interpretable coarse-grained representations, while highlighting the method's computational demands. The work lays a foundation for scalable, interpretable network analysis and points to parallelization and methodological refinements as avenues for future improvement.
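The modularity criterion mentioned above is, in the paper's formulation, Q = Σ_i (e_ii − a_i²), where e_ii is the fraction of edges that fall inside community i and a_i is the fraction of edge ends attached to vertices in i. The sketch below computes it in Python for one division of an unweighted, undirected graph; the function name `modularity` and the edge-list/label-dict inputs are illustrative assumptions, not the paper's code.

```python
from collections import defaultdict


def modularity(edges, community):
    """Q = sum over communities c of (e_cc - a_c**2).

    edges: undirected, unweighted edge list, each edge listed once.
    community: dict mapping every vertex to a community label.
    e_cc = fraction of edges with both ends in c;
    a_c  = fraction of all edge ends attached to vertices in c.
    """
    m = len(edges)
    internal = defaultdict(int)  # edges lying entirely inside each community
    ends = defaultdict(int)      # edge ends (i.e. summed degree) per community
    for u, v in edges:
        cu, cv = community[u], community[v]
        ends[cu] += 1
        ends[cv] += 1
        if cu == cv:
            internal[cu] += 1
    return sum(internal[c] / m - (ends[c] / (2 * m)) ** 2 for c in ends)


# Two triangles joined by the single edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
print(modularity(edges, {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}))
```

For the two triangles, splitting at the bridge gives Q = 2(3/7 − 1/4) = 5/14 ≈ 0.36, while putting every vertex in one community gives Q = 0; larger values indicate more edges inside communities than expected at random.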

Abstract

We propose and study a set of algorithms for discovering community structure in networks -- natural divisions of network nodes into densely connected subgroups. Our algorithms all share two definitive features: first, they involve iterative removal of edges from the network to split it into communities, the edges removed being identified using one of a number of possible "betweenness" measures, and second, these measures are, crucially, recalculated after each removal. We also propose a measure for the strength of the community structure found by our algorithms, which gives us an objective metric for choosing the number of communities into which a network should be divided. We demonstrate that our algorithms are highly effective at discovering community structure in both computer-generated and real-world network data, and show how they can be used to shed light on the sometimes dauntingly complex structure of networked systems.
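The divisive procedure described in the abstract (compute edge betweenness, remove the top edge, recalculate, repeat) can be sketched in Python as follows. This is an illustrative implementation under stated assumptions: an unweighted, undirected graph given as an edge list without duplicates or self-loops, ties broken arbitrarily, and hypothetical function names (`edge_betweenness`, `components`, `divisive_splits`) that do not come from the paper. The single-source pass follows the path-counting scheme described for Figure 4: count shortest paths from the source to each vertex, then pass credit back toward the source in proportion to those counts. Recomputing every betweenness from scratch after each removal is what makes a full run cost O(m²n) time for m edges and n vertices.

```python
from collections import deque


def edge_betweenness(adj):
    """Shortest-path betweenness of every edge, each unordered vertex pair counted once.

    adj: dict vertex -> set of neighbours. Returns dict frozenset({u, v}) -> score.
    """
    eb = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for s in adj:
        # BFS from s: distances, shortest-path counts (sigma), predecessors
        dist, sigma, preds, order = {s: 0}, {s: 1}, {s: []}, []
        queue = deque([s])
        while queue:
            u = queue.popleft()
            order.append(u)
            for w in adj[u]:
                if w not in dist:
                    dist[w], sigma[w], preds[w] = dist[u] + 1, 0, []
                    queue.append(w)
                if dist[w] == dist[u] + 1:
                    sigma[w] += sigma[u]
                    preds[w].append(u)
        # Every reached vertex holds one unit of credit; working from the farthest
        # vertices inward, split it among predecessors in proportion to sigma.
        credit = {v: 1.0 for v in order}
        for w in reversed(order):
            for u in preds[w]:
                share = credit[w] * sigma[u] / sigma[w]
                eb[frozenset((u, w))] += share
                credit[u] += share
    # Each pair (s, t) was counted once from s and once from t.
    return {e: b / 2 for e, b in eb.items()}


def components(adj):
    """Connected components as a list of vertex sets."""
    seen, comps = set(), []
    for s in adj:
        if s in seen:
            continue
        comp, stack = set(), [s]
        seen.add(s)
        while stack:
            u = stack.pop()
            comp.add(u)
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        comps.append(comp)
    return comps


def divisive_splits(edges):
    """Remove the highest-betweenness edge and recalculate until no edges remain.

    Returns the partition recorded each time the number of components grows,
    i.e. the levels of the dendrogram read from top to bottom.
    """
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    levels = [components(adj)]
    while any(adj.values()):
        eb = edge_betweenness(adj)  # recalculated after every single removal
        u, v = tuple(max(eb, key=eb.get))
        adj[u].discard(v)
        adj[v].discard(u)
        comps = components(adj)
        if len(comps) > len(levels[-1]):
            levels.append(comps)
    return levels
```

On two triangles joined by one edge, the bridge carries the shortest paths of all 3 × 3 cross pairs, so it scores 9 and is removed first, leaving the two triangles as the first split.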

Paper Structure

This paper contains 13 sections, 5 equations, 13 figures.

Figures (13)

  • Figure 1: A small network with community structure of the type considered in this paper. In this case there are three communities, denoted by the dashed circles, which have dense internal links but between which there is only a lower density of external links.
  • Figure 2: A hierarchical tree or dendrogram illustrating the type of output generated by the algorithms described here. The circles at the bottom of the figure represent the individual vertices of the network. As we move up the tree the vertices join together to form larger and larger communities, as indicated by the lines, until we reach the top, where all are joined together in a single community. Alternatively, the dendrogram depicts an initially connected network splitting into smaller and smaller communities as we go from top to bottom. A cross-section of the tree at any level, as indicated by the dotted line, will give the communities at that level. The vertical heights of the split-points in the tree are indicative only of the order in which the splits (or joins) took place, although it is possible to construct more elaborate dendrograms in which these heights contain other information.
  • Figure 3: Agglomerative clustering methods are typically good at discovering the strongly linked cores of communities (bold vertices and edges) but tend to leave out peripheral vertices, even when, as here, most of them clearly belong to one community or another.
  • Figure 4: Calculation of shortest-path betweenness: (a) When there is only a single shortest path from a source vertex $s$ (top) to all other reachable vertices, those paths necessarily form a tree, which makes the calculation of the contribution to betweenness from this set of paths particularly simple, as described in the text. (b) For cases in which there is more than one shortest path to some vertices, the calculation is more complex. First we must calculate the number of paths from the source to each other vertex (numbers on vertices), and then these are used to weight the path counts appropriately. In either case, we can check the results by confirming that the sum of the betweennesses of the edges connected to the source vertex is equal to the total number of reachable vertices: six in each of the cases illustrated here.
  • Figure 5: An example of the type of resistor network considered here, in which a unit resistance is placed on each edge and unit current flows into and out of the source and sink vertices.
  • ...and 8 more figures
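The resistor-network picture in Figure 5 leads to a current-flow version of edge betweenness: one unit of current enters at a source and leaves at a sink, and each edge is scored by the absolute current it carries, summed over all source/sink pairs. A minimal pure-Python sketch, assuming a connected, unweighted graph with unit resistance on every edge: it grounds one vertex, inverts the reduced graph Laplacian once, and reads off each edge current from that inverse. The helper names `invert` and `current_flow_betweenness` are hypothetical, and the dense Gauss-Jordan inversion is chosen for self-containment, not speed.

```python
def invert(M):
    """Invert a square matrix (list of lists) by Gauss-Jordan with partial pivoting."""
    n = len(M)
    A = [row[:] + [float(i == j) for j in range(n)] for i, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        p = A[col][col]
        A[col] = [x / p for x in A[col]]
        for r in range(n):
            if r != col and A[r][col] != 0:
                f = A[r][col]
                A[r] = [x - f * y for x, y in zip(A[r], A[col])]
    return [row[n:] for row in A]


def current_flow_betweenness(nodes, edges):
    """Sum over unordered pairs (s, t) of |current| through each edge. Graph must be connected."""
    idx = {v: k for k, v in enumerate(nodes)}
    n = len(nodes)
    # Graph Laplacian L = D - A (unit conductance per edge).
    L = [[0.0] * n for _ in range(n)]
    for u, v in edges:
        i, j = idx[u], idx[v]
        L[i][i] += 1
        L[j][j] += 1
        L[i][j] -= 1
        L[j][i] -= 1
    # Ground the last vertex (voltage 0) so the reduced Laplacian is invertible,
    # then pad the inverse with a zero row/column for the grounded vertex.
    reduced_inv = invert([row[:-1] for row in L[:-1]])
    T = [row + [0.0] for row in reduced_inv] + [[0.0] * n]
    result = {}
    for u, v in edges:
        i, j = idx[u], idx[v]
        total = 0.0
        for s in range(n):
            for t in range(s + 1, n):
                # Voltage at x for unit current s -> t is T[x][s] - T[x][t];
                # with unit resistance the edge current is the voltage difference.
                total += abs(T[i][s] - T[i][t] - T[j][s] + T[j][t])
        result[(u, v)] = total
    return result
```

On a tree every source/sink pair is joined by a single path, so this score reduces to the shortest-path betweenness; the two measures differ only where parallel routes let the current split.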