Compressing regularized dynamics improves link prediction with the map equation in sparse networks
Maja Lindström, Christopher Blöcker, Tommy Löfstedt, Martin Rosvall
TL;DR
This work tackles link prediction in sparse networks, where MapSim performs poorly when observations are incomplete. The authors introduce a global regularization of the map equation based on a Bayesian prior on transition rates. They complement it with three local regularization schemes (Common Neighbors, Mixed Markov Time, and Variable Markov Time) that compensate for missing links and mitigate over-partitioning. Across 38 real networks, the globally regularized MapSim consistently improves link prediction accuracy and community detection in sparse regimes, while the local methods offer network-specific gains. The approach requires no hyperparameter tuning and runs at least an order of magnitude faster than embedding-based methods. The results establish globally regularized MapSim as a robust, interpretable, and scalable tool for link prediction from incomplete data, with practical advantages in domains ranging from biology to recommender systems.
Abstract
Predicting future interactions or novel links in networks is indispensable across diverse domains, including genetic research, online social networks, and recommendation systems. Among the numerous link prediction techniques, those that leverage a network's community structure have proven highly effective. For example, the recently proposed MapSim predicts links based on a similarity measure derived from the code structure of the map equation, a community-detection objective function that operates on network flows. However, the standard map equation assumes complete observations and typically identifies many small modules in networks where the nodes connect through only a few links. This tendency can degrade MapSim's performance on sparse networks. To overcome this limitation, we propose a global regularization method based on a Bayesian estimate of the transition rates, along with three local regularization methods. The regularized versions of the map equation compensate for incomplete observations and mitigate spurious community fragmentation in sparse networks. The regularized methods outperform standard MapSim and several state-of-the-art embedding methods in highly sparse networks. This performance holds across multiple real-world networks with randomly removed links that simulate incomplete observations. Among the proposed regularization methods, the global approach provides the most reliable community detection and the highest link prediction performance across different network densities. The principled method requires no hyperparameter tuning and runs at least an order of magnitude faster than the embedding methods.
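The abstract combines two ingredients: a Bayesian estimate of random-walk transition rates and MapSim's similarity derived from the map equation's code structure. The following toy sketch illustrates both. It rests on several assumptions that are ours, not the paper's. First, the Bayesian estimate is replaced by the simplest possible version: a symmetric Dirichlet prior that adds a pseudo-count alpha to every possible link, so T[u][v] = (w_uv + alpha) / (s_u + alpha * (n - 1)). This is not the paper's prior, and unlike the paper's method it has a hand-set parameter. Second, the partition is given rather than found by optimizing the map equation. Third, MapSim follows our reading of the two-level code: the similarity of u to v is the probability of the codewords needed to encode a step from u to v, which is a single module codeword when both nodes share a module, and otherwise an exit codeword, then an index (entry) codeword, then a module codeword. All function names are hypothetical.

```python
"""Toy sketch: Dirichlet-regularized random-walk flows plus a two-level MapSim.

Assumptions (not the paper's method): uniform pseudo-count prior `alpha`,
a fixed partition, and no teleportation.
"""


def regularized_transitions(n, edges, alpha):
    """Posterior-mean transition rates under a symmetric Dirichlet(alpha) prior.

    Every node u gets a pseudo-count alpha > 0 on each possible link u -> v
    (v != u), so T[u][v] = (w_uv + alpha) / (s_u + alpha * (n - 1)).
    """
    w = [[0.0] * n for _ in range(n)]
    for u, v, weight in edges:  # undirected edges (u, v, weight), nodes 0..n-1
        w[u][v] += weight
        w[v][u] += weight
    T = []
    for u in range(n):
        strength = sum(w[u][v] for v in range(n) if v != u)  # ignore self-loops
        denom = strength + alpha * (n - 1)
        T.append([0.0 if v == u else (w[u][v] + alpha) / denom for v in range(n)])
    return T


def stationary(T, iters=10_000, tol=1e-12):
    """Stationary visit rates p of the random walk by power iteration."""
    n = len(T)
    p = [1.0 / n] * n
    for _ in range(iters):
        nxt = [sum(p[u] * T[u][v] for u in range(n)) for v in range(n)]
        if max(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p


def module_flows(T, p, modules):
    """Per-module exit rates, entry rates, and total node visit rates."""
    labels = set(modules)
    exit_rate = {m: 0.0 for m in labels}
    enter_rate = {m: 0.0 for m in labels}
    visit_rate = {m: 0.0 for m in labels}
    for u, mu in enumerate(modules):
        visit_rate[mu] += p[u]
        for v, mv in enumerate(modules):
            if mu != mv:
                flow = p[u] * T[u][v]  # flow on the link u -> v
                exit_rate[mu] += flow
                enter_rate[mv] += flow
    return exit_rate, enter_rate, visit_rate


def mapsim(u, v, p, modules, exit_rate, enter_rate, visit_rate):
    """Probability of the codewords that encode a step u -> v (two levels)."""
    mu, mv = modules[u], modules[v]

    def codebook_rate(m):
        # A module codebook encodes its nodes' visits plus the exit codeword.
        return exit_rate[m] + visit_rate[m]

    if mu == mv:  # one codeword from the shared module codebook
        return p[v] / codebook_rate(mv)
    index_rate = sum(enter_rate.values())  # index codebook encodes module entries
    return (exit_rate[mu] / codebook_rate(mu)   # leave u's module
            * enter_rate[mv] / index_rate       # enter v's module
            * p[v] / codebook_rate(mv))         # visit v inside it


# Hypothetical toy network: two triangles joined by the bridge 2-3.
edges = [(0, 1, 1.0), (0, 2, 1.0), (1, 2, 1.0),
         (3, 4, 1.0), (3, 5, 1.0), (4, 5, 1.0), (2, 3, 1.0)]
modules = [0, 0, 0, 1, 1, 1]
T = regularized_transitions(6, edges, alpha=0.5)
p = stationary(T)
flows = module_flows(T, p, modules)
print(mapsim(0, 1, p, modules, *flows))  # same module
print(mapsim(0, 4, p, modules, *flows))  # across the bridge, no observed link
```

Setting alpha to 0 recovers the unregularized walk, in which a node with a single link sends all its flow along that link. With alpha > 0, every pair of nodes exchanges some flow, including the unlinked pair 0 and 4. This is the basic sense in which a prior on transition rates compensates for unobserved links.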
