
Community Detection with the Map Equation and Infomap: Theory and Applications

Jelena Smiljanić, Christopher Blöcker, Anton Holmgren, Daniel Edler, Magnus Neuman, Martin Rosvall

TL;DR

The paper presents the map equation and Infomap as a principled, information-theoretic approach to detecting flow-based communities in networks, framing community detection as a compression problem for random walks. It surveys representations from simple graphs to memory, multilayer, temporal, and hypergraph models, and details how the map equation organizes flows via module and index codebooks to minimize description length. It covers algorithmic aspects (two-level and multilevel Infomap), remedies for common challenges (resolution and field-of-view limits, Markov-time scaling, higher-order modeling), and extensions that incorporate node attributes, incomplete data, and higher-order interactions. The work also highlights software tools, visualization platforms, and diverse applications including centrality, similarity, bioregions, and model selection, illustrating practical impact across biology, ecology, and network science. Overall, it provides a comprehensive, actionable framework for applying flow-based community detection to complex systems with rich representations and robust inference methods.
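To make the "module and index codebooks" concrete, here is a minimal sketch of the standard two-level map equation, L(M) = q↷ H(Q) + Σ_i p_i↻ H(P_i), evaluated for a fixed partition. It assumes an undirected, unweighted network, where the random walker's visit rate of node α is p_α = k_α / 2m and the exit rate q_i↷ of module i is the link flow leaving it. The function names and the toy network (two triangles joined by one link) are hypothetical illustrations, not Infomap's API; Infomap itself searches over partitions rather than scoring a single given one.

```python
import math
from collections import defaultdict


def entropy_bits(weights):
    """Shannon entropy (bits) of non-negative weights after normalizing them to sum to 1."""
    total = sum(weights)
    if total == 0:
        return 0.0
    return -sum((w / total) * math.log2(w / total) for w in weights if w > 0)


def map_equation(edges, partition):
    """Two-level map equation L(M) in bits for an undirected, unweighted network.

    edges: list of (u, v) pairs; partition: dict mapping each node to a module id.
    Hypothetical helper for illustration only.
    """
    degree = defaultdict(int)
    exit_links = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        if partition[u] != partition[v]:
            # Each direction of a between-module link carries flow out of one module
            exit_links[partition[u]] += 1
            exit_links[partition[v]] += 1
    two_m = 2 * len(edges)
    # Stationary visit rates of an undirected random walk: p_alpha = k_alpha / 2m
    p = {node: k / two_m for node, k in degree.items()}
    modules = set(partition.values())
    q = {i: exit_links[i] / two_m for i in modules}
    q_total = sum(q.values())
    # Index codebook: codewords for switching between modules (weighted by q_total)
    index_term = q_total * entropy_bits(list(q.values())) if q_total > 0 else 0.0
    # Module codebooks: node visits plus an exit codeword, weighted by p_i↻ = q_i + sum(p)
    module_term = 0.0
    for i in modules:
        visits = [p[n] for n in p if partition[n] == i]
        p_circ = q[i] + sum(visits)
        module_term += p_circ * entropy_bits([q[i]] + visits)
    return index_term + module_term


# Toy network: two triangles joined by the link (2, 3)
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
one_module = {n: 0 for n in range(6)}
two_modules = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
print(round(map_equation(edges, one_module), 2))   # 2.56
print(round(map_equation(edges, two_modules), 2))  # 2.32
```

Splitting the toy network into its two triangles gives a shorter description than a single module: the small cost of the index codebook is outweighed by the cheaper module codebooks, which is the trade-off the map equation balances.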

Abstract

Real-world networks have a complex topology comprising many elements, often structured into communities. Revealing these communities helps researchers uncover the organizational and functional structure of the system that the network represents. However, detecting community structure in complex networks requires selecting a method from a multitude of alternatives with different network representations, community interpretations, and underlying mechanisms. This tutorial focuses on a popular community detection method called the map equation and its search algorithm Infomap. The map equation framework describes communities by analyzing dynamic processes on the network. Thanks to its flexibility, the map equation has extensions that can incorporate various assumptions about network structure and dynamics. To help readers decide whether the map equation is a suitable community detection method for a given complex system and problem, and which variant to choose, we review the map equation's theoretical framework and guide users in applying it to various research problems.

Paper Structure

This paper contains 57 sections, 44 equations, and 21 figures.

Figures (21)

  • Figure 1: Modeling and mapping flow with the map equation framework. Given complex system data, the researcher first selects an appropriate network representation (left column) based on the type of interactions: Pairwise interactions can be represented with weighted and directed networks, where link strength and direction capture interaction frequency and orientation. Multi-mode interactions call for multilayer networks, where nodes are replicated across layers representing different times, contexts, or modes. Multi-step interactions are captured with memory networks, where physical nodes (large circles) are associated with state nodes (smaller circles) that retain information about interaction sequences. Multi-body interactions among more than two nodes are naturally represented by hypergraphs, where hyperedges connect multiple nodes simultaneously. Next, a random walk model approximates real-world flow (middle column). Finally, minimizing the map equation reveals flow modules where a random walker remains for extended periods (right column). Because network flows reflect the system's function, flow modules reveal the system's functional components.
  • Figure 2: Compressing random walk descriptions in modular networks. (a) In a fully connected network, the random walker visits all nodes uniformly, producing node-visit sequences shown at the bottom. Because repetitions occur randomly, modular compression offers no benefits. (b) In a modular network, the walker tends to stay within communities, and the corresponding sequences at the bottom reveal repeated patterns that enable modular compression. Recurring subsequences of length greater than two are highlighted with colored backgrounds.
  • Figure 3: Schematic solution landscape of varying model complexity. The number of modules in each network partition defines its model complexity, increasing from left to right. Colors indicate module assignments, and the numbers below solutions approximate the description lengths. The solution with the lowest description length balances model complexity and modular regularities.
  • Figure 4: Schematic triangle network with a multilevel partition. Each small triangle (1, 2, 3) is a sub-module, organized in groups of three triangles in each top-level super-module (I, II, III).
  • Figure 5: Field-of-view limitations. Over-partitioning can occur in networks with (a) constrained structure, or (b) random structure. Node colors show module assignments.
  • ...and 16 more figures
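Figure 1's middle column replaces real-world flow with a random walk model, and the map equation needs that walker's stationary visit rates. For a connected undirected network these are proportional to node strength; for directed networks a common choice is a PageRank-style walk with teleportation. The following is a minimal sketch under that assumption: the function name, the teleportation rate alpha = 0.15, and the uniform teleportation target are illustrative choices, and Infomap's own flow model may treat teleportation differently (for example, by not encoding teleportation steps).

```python
def flow_with_teleportation(weights, alpha=0.15, tol=1e-12, max_iter=10_000):
    """Stationary visit rates of a random walk on a weighted, directed network.

    weights[i][j] is the weight of link i -> j. With probability alpha (an assumed
    teleportation rate) the walker jumps to a uniformly random node; dangling nodes
    (no out-links) always teleport. Hypothetical helper, computed by power iteration.
    """
    n = len(weights)
    out_strength = [sum(row) for row in weights]
    p = [1.0 / n] * n
    for _ in range(max_iter):
        # Mass that teleports: fraction alpha from ordinary nodes, all mass from dangling nodes
        teleport = sum(p[i] * (alpha if out_strength[i] > 0 else 1.0) for i in range(n))
        new_p = [teleport / n] * n
        # Remaining mass follows out-links in proportion to their weights
        for i in range(n):
            if out_strength[i] > 0:
                for j in range(n):
                    if weights[i][j] > 0:
                        new_p[j] += (1 - alpha) * p[i] * weights[i][j] / out_strength[i]
        if sum(abs(a - b) for a, b in zip(new_p, p)) < tol:
            return new_p
        p = new_p
    return p


# A directed 3-cycle: by symmetry every node is visited equally often
cycle = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]
print([round(x, 3) for x in flow_with_teleportation(cycle)])  # [0.333, 0.333, 0.333]
```

Teleportation keeps the walk ergodic on directed networks that are not strongly connected; the resulting visit rates then feed the node-visit and exit terms of the map equation.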