Modularity and community structure in networks
M. E. J. Newman
TL;DR
The paper addresses the detection of community structure in networks using modularity, which measures how much more densely groups of vertices are connected than would be expected at random. It introduces a spectral approach based on a new modularity matrix $\mathbf{B}$ with elements $B_{ij} = A_{ij} - \frac{k_i k_j}{2m}$, where $A_{ij}$ is the adjacency matrix, $k_i$ is the degree of vertex $i$, and $m$ is the total number of edges. For a division into two groups encoded by a vector $\mathbf{s}$ with $s_i = \pm 1$, the modularity is $Q = \frac{1}{4m} \mathbf{s}^T \mathbf{B} \mathbf{s}$. Vertices are partitioned by the signs of the components of the leading eigenvector of $\mathbf{B}$, and a network is indivisible if $\mathbf{B}$ has no positive eigenvalues. For more than two communities, the method bisects subgraphs repeatedly, using a modularity matrix defined for each subgraph, and accepts a split only if it increases the total modularity; the final communities are the indivisible subgraphs. A Kernighan–Lin-style refinement further maximizes modularity after the spectral splits, improving quality and speed, especially on large networks. The resulting method outperforms competing methods such as those of Girvan and Newman (GN) and Clauset, Newman, and Moore (CNM), while remaining competitive with that of Duch and Arenas (DA) on large problems. Across diverse real-world networks, it achieves higher modularity in shorter running times, scaling as $O(n^2\log n)$; for example, a network of about 27,000 vertices is analyzed in roughly 20 minutes.
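The two-way spectral split summarized above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the paper's implementation: the function names `modularity_matrix` and `spectral_bisection` and the tolerance `tol` are assumptions made for this example, a dense eigendecomposition stands in for whatever eigensolver a large network would need, and the repeated subdivision and Kernighan–Lin-style refinement steps are omitted.

```python
import numpy as np


def modularity_matrix(A):
    """Return B with B_ij = A_ij - k_i * k_j / (2m) for an undirected graph."""
    A = np.asarray(A, dtype=float)
    k = A.sum(axis=1)      # vertex degrees k_i
    two_m = k.sum()        # sum of degrees equals 2m
    return A - np.outer(k, k) / two_m


def spectral_bisection(A, tol=1e-10):
    """Split vertices by the signs of the leading eigenvector of B.

    Returns (s, Q), where s holds +1/-1 group labels and
    Q = s^T B s / (4m). Returns (None, 0.0) when B has no positive
    eigenvalue (within tol), i.e. the network is treated as indivisible.
    tol is an assumption of this sketch, used to absorb rounding error.
    """
    A = np.asarray(A, dtype=float)
    B = modularity_matrix(A)
    m = A.sum() / 2.0                      # number of edges
    eigvals, eigvecs = np.linalg.eigh(B)   # eigenvalues in ascending order
    if eigvals[-1] <= tol:
        return None, 0.0
    u = eigvecs[:, -1]                     # leading eigenvector
    s = np.where(u >= 0, 1.0, -1.0)        # group labels from signs
    Q = float(s @ B @ s) / (4.0 * m)
    return s, Q


# Example: two triangles (0,1,2) and (3,4,5) joined by the edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A = np.zeros((6, 6))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

s, Q = spectral_bisection(A)
print(s, Q)
```

On this small example the split separates the two triangles, and the overall sign of `s` is arbitrary because an eigenvector is only defined up to sign. A complete graph, by contrast, has no positive eigenvalue of $\mathbf{B}$, so the sketch reports it as indivisible.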
Abstract
Many networks of interest in the sciences, including a variety of social and biological networks, are found to divide naturally into communities or modules. The problem of detecting and characterizing this community structure has attracted considerable recent attention. One of the most sensitive detection methods is optimization of the quality function known as "modularity" over the possible divisions of a network, but direct application of this method using, for instance, simulated annealing is computationally costly. Here we show that the modularity can be reformulated in terms of the eigenvectors of a new characteristic matrix for the network, which we call the modularity matrix, and that this reformulation leads to a spectral algorithm for community detection that returns results of better quality than competing methods in noticeably shorter running times. We demonstrate the algorithm with applications to several network data sets.
