Mapping memory-biased dynamics with compact models reveals overlapping communities in large networks

Maja Lindström, Rohit Sahasrabuddhe, Anton Holmgren, Christopher Blöcker, Daniel Edler, Martin Rosvall

TL;DR

Identifying overlapping flow-based communities from networks with limited higher-order data is challenging. The authors model memory-biased random walks by constructing a full second-order network $M_2$ and a compressed counterpart $M_c$ via divisive clustering, using memory parameters $p$ and $q$, and detect communities with the map equation as implemented in Infomap. They show that compact models capturing roughly 50% of the state nodes closely approximate the full dynamics on synthetic benchmarks (AMI around 0.9) while yielding interpretable overlaps in real networks, and they scale to large systems. This approach provides a scalable, information-theoretic framework to reveal overlapping flow-based structures when higher-order data are unavailable, with broad applicability to social, biological, and information networks.

Abstract

Many real-world systems, from social networks to protein-protein interactions and species distributions, exhibit overlapping flow-based communities that reflect their functional organisation. However, reliably identifying such overlapping flow-based communities requires higher-order relational data, which are often unavailable. To address this challenge, we capitalise on the flow model underpinning the representation-learning algorithm node2vec and model higher-order flows through memory-biased random walks on first-order networks. Instead of simulating these walks, we model their higher-order dynamic constraints with compact models and control model complexity with an information-theoretic approach. Using the map equation framework, we identify overlapping modules in the resulting higher-order networks. Our compact-model approach proves robust across synthetic benchmark networks, reveals interpretable overlapping communities in empirical networks, and scales to large networks.
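As a rough illustration of the memory-biased random walk mentioned above, the sketch below computes second-order transition probabilities using the rule stated in the Figure 1 caption: having arrived along the link $(i, j)$, the walker at $j$ backtracks with weight $1/p$ and moves to a node not adjacent to $i$ with weight $1/q$. The function name `biased_transitions`, the adjacency-dictionary representation, and the restriction to unweighted, undirected networks are assumptions made for this example, not the authors' implementation.

```python
def biased_transitions(adj, i, j, p=1.0, q=1.0):
    """Transition probabilities out of the state node (i -> j).

    adj maps each node to the set of its neighbours (unweighted,
    undirected network; an assumption of this sketch).
    """
    weights = {}
    for k in adj[j]:
        if k == i:
            weights[k] = 1.0 / p  # backtrack to the previous node
        elif k in adj[i]:
            weights[k] = 1.0      # k is adjacent to the previous node
        else:
            weights[k] = 1.0 / q  # k is not adjacent to the previous node
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}


# Hypothetical toy network: two triangles sharing the middle node 2.
adj = {
    0: {1, 2},
    1: {0, 2},
    2: {0, 1, 3, 4},
    3: {2, 4},
    4: {2, 3},
}

# Walker arrived at node 2 from node 0, with p = 1 and q = 2.
probs = biased_transitions(adj, 0, 2, p=1.0, q=2.0)
```

With $q > 1$, the walker at the middle node is more likely to stay in the triangle it came from than to cross into the other one, which is the kind of memory effect that lets one physical node belong to more than one community.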

Paper Structure (21 sections, 8 equations, 11 figures, 3 tables)

Figures (11)

  • Figure 1: A compact representation of higher-order dynamics reveals overlapping communities. a) A first-order network constrains a memoryless random walk that supports two non-overlapping communities. b) We introduce a second-order model by biasing transitions: arriving along the link $(i, j)$, the random walker at node $j$ is biased to backtrack by a factor $1/p$ and to move to a node not adjacent to $i$ by a factor $1/q$. c) We describe the bias in each physical node with state nodes, where arrows indicate transition probabilities. d) To manage computational complexity, we lump state nodes within each physical node. e) By connecting the lumped state nodes, we construct a compact network representation. f) Mapping the memory-biased random walk on the compact network reveals overlapping communities in the middle node.
  • Figure 2: Compact model information loss. Left: Per-physical-node information loss as a function of the number of state nodes. Trajectories shown for two physical nodes, with $j^*$ denoting the physical node with the largest information loss. Right: The corresponding total information loss summed over all physical nodes. In this schematic example, the fourth compact model $M_c$ represents the model either constrained by the compression threshold or the user-defined state-node budget.
  • Figure 3: The divisive clustering algorithm. Schematic embedding of state nodes in cluster $\alpha$, where each point denotes a state node and distances reflect the Jensen-Shannon divergence between their transition-rate vectors. Left: To cluster the state nodes, we first pick a random state node $i_0$. Then, we select the cluster centres $i_1$ as the state node farthest from $i_0$ and $i_2$ as the state node farthest from $i_1$. Right: We assign the state nodes to the nearest cluster centre.
  • Figure 4: Synthetic network with overlapping community structure. a) The first-order network with three modules. b) With $p=1$, $q=2$, we find four modules and five nodes that are in multiple communities.
  • Figure 5: Accuracy on LFR networks. The AMI between the community structures of the compact models and the full second-order models as a function of the state-node budget. We plot the mean AMI across 20 network instances, with shaded regions showing one standard deviation.
  • ...and 6 more figures
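
The single split step described in the Figure 3 caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the names `jsd` and `split_cluster`, the base-2 logarithm, the tie-breaking, and the toy transition vectors are all hypothetical, and the sketch does not cover how splits are chosen or stopped (the compression threshold and state-node budget of Figure 2).

```python
import math
import random


def jsd(p, q):
    """Jensen-Shannon divergence (base 2) between two probability vectors."""
    def kl(a, b):
        # Terms with a zero numerator contribute nothing; b is the
        # mixture, so it is positive wherever a is positive.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    m = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def split_cluster(vectors, rng=None):
    """Split one cluster of state nodes into two.

    vectors maps a state-node id to its transition vector.
    """
    rng = rng or random.Random(0)
    ids = list(vectors)
    i0 = rng.choice(ids)                                        # random start
    i1 = max(ids, key=lambda s: jsd(vectors[s], vectors[i0]))   # farthest from i0
    i2 = max(ids, key=lambda s: jsd(vectors[s], vectors[i1]))   # farthest from i1
    first, second = [], []
    for s in ids:
        d1 = jsd(vectors[s], vectors[i1])
        d2 = jsd(vectors[s], vectors[i2])
        # Assign to the nearest centre; ties go to i1 (an arbitrary choice).
        (first if d1 <= d2 else second).append(s)
    return first, second


# Hypothetical transition vectors for four state nodes of one physical node.
vectors = {
    "a": [0.50, 0.50, 0.00, 0.00],
    "b": [0.45, 0.55, 0.00, 0.00],
    "c": [0.00, 0.00, 0.50, 0.50],
    "d": [0.00, 0.00, 0.60, 0.40],
}
first, second = split_cluster(vectors)
```

In this toy example, state nodes `a` and `b` send flow to one pair of targets and `c` and `d` to another, so the split separates them regardless of which node is drawn as $i_0$.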