Mapping memory-biased dynamics with compact models reveals overlapping communities in large networks
Maja Lindström, Rohit Sahasrabuddhe, Anton Holmgren, Christopher Blöcker, Daniel Edler, Martin Rosvall
TL;DR
Identifying overlapping flow-based communities is challenging when networks lack higher-order data. The authors model memory-biased random walks, controlled by node2vec's return and in-out parameters $p$ and $q$. They construct a full second-order network $M_2$ and a compressed counterpart $M_c$ obtained via divisive clustering, and detect communities with the map equation as implemented in Infomap. They show that compact models using roughly 50% of the full model's state nodes closely approximate the full dynamics on synthetic benchmarks (adjusted mutual information, AMI, around 0.9). These compact models also yield interpretable overlaps in real networks and scale to large systems. The approach provides a scalable, information-theoretic framework for revealing overlapping flow-based structures when higher-order data are unavailable, with broad applicability to social, biological, and information networks.
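The second-order flow model behind $M_2$ can be sketched as follows. The sketch assumes node2vec's standard unnormalised bias on an unweighted, undirected graph. A walker that moved from $u$ to $v$ steps to a neighbour $w$ of $v$ with weight $1/p$ if $w = u$, weight $1$ if $w$ is also a neighbour of $u$, and weight $1/q$ otherwise. Each state node is assumed to be an ordered pair (previous node, current node). The function names `second_order_transitions` and `stationary` are hypothetical. The sketch builds only the full model, not the compressed $M_c$ or its divisive clustering, and it computes stationary flows directly instead of simulating walks.

```python
from collections import defaultdict

def second_order_transitions(adj, p, q):
    """Map each state node (u, v) to {(v, w): transition probability}.

    adj: dict of node -> set of neighbours (undirected, unweighted graph).
    A state node (u, v) is a walker at physical node v that arrived from u.
    """
    trans = {}
    for u in adj:
        for v in adj[u]:
            bias = {}
            for w in adj[v]:
                if w == u:
                    bias[w] = 1.0 / p   # step back to the previous node
                elif w in adj[u]:
                    bias[w] = 1.0       # w is also a neighbour of u
                else:
                    bias[w] = 1.0 / q   # w lies two steps away from u
            total = sum(bias.values())
            trans[(u, v)] = {(v, w): b / total for w, b in bias.items()}
    return trans

def stationary(trans, iters=2000):
    """Stationary flow on state nodes via lazy power iteration."""
    pi = {s: 1.0 / len(trans) for s in trans}
    for _ in range(iters):
        # Laziness keeps the same fixed point but avoids oscillation on periodic chains
        nxt = {s: 0.5 * pi[s] for s in trans}
        for s, out in trans.items():
            for t, prob in out.items():
                nxt[t] += 0.5 * pi[s] * prob
        pi = nxt
    return pi

adj = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
pi = stationary(second_order_transitions(adj, p=1.0, q=1.0))
node_flow = defaultdict(float)
for (u, v), f in pi.items():
    node_flow[v] += f  # physical-node flow = sum over its state nodes
print({v: round(f, 3) for v, f in sorted(node_flow.items())})
# {'a': 0.25, 'b': 0.25, 'c': 0.375, 'd': 0.125}
```

With $p = q = 1$ the biases vanish and the walk reduces to an ordinary first-order random walk. Every state node then carries flow $1/(2m)$, and each physical node's flow is proportional to its degree. Other choices of $p$ and $q$ change how flow splits across the state nodes.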
Abstract
Many real-world systems, from social networks to protein-protein interactions and species distributions, exhibit overlapping flow-based communities that reflect their functional organisation. However, reliably identifying such overlapping flow-based communities requires higher-order relational data, which are often unavailable. To address this challenge, we capitalise on the flow model underpinning the representation-learning algorithm node2vec and model higher-order flows through memory-biased random walks on first-order networks. Instead of simulating these walks, we model their higher-order dynamic constraints with compact models and control model complexity with an information-theoretic approach. Using the map equation framework, we identify overlapping modules in the resulting higher-order networks. Our compact-model approach proves robust across synthetic benchmark networks, reveals interpretable overlapping communities in empirical networks, and scales to large networks.
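To illustrate how the map equation accommodates overlaps, the following Python sketch evaluates the two-level map equation for state networks:

$L(\mathsf{M}) = q_{\curvearrowright} H(\mathcal{Q}) + \sum_i p_{\circlearrowright}^{i} H(\mathcal{P}^{i})$

Here $q_{\curvearrowright}$ is the total flow exiting modules and $p_{\circlearrowright}^{i}$ is the flow that uses module $i$'s codebook. Module codebooks encode physical nodes, whose visit rates sum the flow of their state nodes assigned to that module. A physical node whose state nodes fall in different modules therefore belongs to several modules.

The sketch assumes precomputed state-node and link flows without teleportation. The names `map_equation` and `entropy`, and the two-triangle toy network, are hypothetical. The sketch only scores a given partition, whereas Infomap searches for partitions that minimise $L$.

```python
from collections import defaultdict
from math import log2

def entropy(weights):
    """Shannon entropy in bits of non-negative weights, normalised to sum to 1."""
    total = sum(weights)
    return -sum(w / total * log2(w / total) for w in weights if w > 0)

def map_equation(flow, links, module):
    """Two-level code length L(M) for a partition of state nodes.

    flow:   {state: stationary flow}, with state = (previous_node, physical_node)
    links:  {(source_state, target_state): link flow}
    module: {state: module id}; a physical node may have states in several modules
    """
    exit_flow = defaultdict(float)
    for (s, t), f in links.items():
        if module[s] != module[t]:
            exit_flow[module[s]] += f
    # Physical-node visit rates within each module: sum flow over that node's states
    node_flow = defaultdict(lambda: defaultdict(float))
    for s, f in flow.items():
        node_flow[module[s]][s[1]] += f
    total_exit = sum(exit_flow.values())
    index_term = total_exit * entropy(list(exit_flow.values())) if total_exit > 0 else 0.0
    module_terms = 0.0
    for m, nodes in node_flow.items():
        weights = [exit_flow[m]] + list(nodes.values())
        module_terms += sum(weights) * entropy(weights)
    return index_term + module_terms

# Hypothetical toy network: triangles a-b-c and c-d-e share node c; memoryless
# walk (p = q = 1), so every directed-edge state node carries equal flow.
adj = {"a": "bc", "b": "ac", "c": "abde", "d": "ce", "e": "cd"}
states = [(u, v) for u in adj for v in adj[u]]
flow = {s: 1 / len(states) for s in states}
links = {((u, v), (v, w)): flow[(u, v)] / len(adj[v]) for (u, v) in states for w in adj[v]}

one_module = {s: 0 for s in states}
# Node c overlaps: its states arriving from a or b join module 0, from d or e module 1
overlap = {(u, v): 0 if {u, v} <= set("abc") else 1 for (u, v) in states}
print(round(map_equation(flow, links, one_module), 4))  # 2.2516
print(round(map_equation(flow, links, overlap), 4))     # 2.4419
```

On this memoryless toy example the single-module partition yields the shorter code length, so the split is not favoured. The comparison only illustrates how overlapping state-node assignments enter the code length.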
