Compressing regularized dynamics improves link prediction with the map equation in sparse networks

Maja Lindström, Christopher Blöcker, Tommy Löfstedt, Martin Rosvall

TL;DR

This work tackles link prediction in sparse networks, where MapSim falls short when observations are incomplete. The authors introduce a global regularization of the map equation via a Bayesian prior on transition rates, complemented by three local regularization schemes (Common Neighbors, Mixed Markov Time, and Variable Markov Time), to mitigate over-partitioning and recover missing links. Across 38 real networks, the globally regularized MapSim consistently improves predictive accuracy and community detection in sparse regimes, while the local methods offer network-specific gains. The approach requires no hyperparameter tuning and runs at least an order of magnitude faster than embedding-based methods. The results establish MapSim with global regularization as a robust, interpretable, and scalable tool for link prediction in incomplete data settings, with practical advantages for domains ranging from biology to recommender systems.

Abstract

Predicting future interactions or novel links in networks is an indispensable tool across diverse domains, including genetic research, online social networks, and recommendation systems. Among the numerous techniques developed for link prediction, those leveraging the networks' community structure have proven highly effective. For example, the recently proposed MapSim predicts links based on a similarity measure derived from the code structure of the map equation, a community-detection objective function that operates on network flows. However, the standard map equation assumes complete observations and typically identifies many small modules in networks where the nodes connect through only a few links. This aspect can degrade MapSim's performance on sparse networks. To overcome this limitation, we propose to incorporate a global regularization method based on a Bayesian estimate of the transition rates along with three local regularization methods. The regularized versions of the map equation compensate for incomplete observations and mitigate spurious community fragmentation in sparse networks. The regularized methods outperform standard MapSim and several state-of-the-art embedding methods in highly sparse networks. This performance holds across multiple real-world networks with randomly removed links, simulating incomplete observations. Among the proposed regularization methods, the global approach provides the most reliable community detection and the highest link prediction performance across different network densities. The principled method requires no hyperparameter tuning and runs at least an order of magnitude faster than the embedding methods.
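
To make the idea of a Bayesian estimate of the transition rates concrete, the following is a minimal sketch rather than the authors' method. It assumes a uniform pseudo-count prior, which this summary does not specify; the function name `smoothed_transition_rates` and the parameter `prior_strength` are hypothetical. The point it illustrates is that node pairs without an observed link still receive a small, nonzero transition rate, so sparse observations fragment the flow less.

```python
def smoothed_transition_rates(nodes, weighted_links, prior_strength=1.0):
    """Blend observed link weights with a uniform prior over all other nodes.

    Assumption: the uniform prior and `prior_strength` are illustrative
    choices, not the prior used in the paper.
    """
    # Collect observed weights for an undirected network.
    observed = {}
    for u, v, w in weighted_links:
        observed[(u, v)] = observed.get((u, v), 0.0) + w
        observed[(v, u)] = observed.get((v, u), 0.0) + w

    n = len(nodes)
    pseudo = prior_strength / (n - 1)  # prior weight per potential neighbor
    rates = {}
    for u in nodes:
        others = [v for v in nodes if v != u]
        total = sum(observed.get((u, v), 0.0) for v in others) + prior_strength
        # Posterior-mean style estimate: (observed + prior) / normalizer.
        rates[u] = {v: (observed.get((u, v), 0.0) + pseudo) / total for v in others}
    return rates


if __name__ == "__main__":
    example = smoothed_transition_rates([1, 2, 3], [(1, 2, 2.0)])
    for source, row in example.items():
        print(source, row)
```

Each row of the result sums to one, and node 3, which has no observed links in the example, still gets transition rates to nodes 1 and 2 from the prior alone.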

Paper Structure

This paper contains 20 sections, 11 equations, 14 figures, 5 tables.

Figures (14)

  • Figure 1: A communication game on a network. The black line illustrates a possible path of a random walk on the network, with colors representing different modules. (a) An example network with 7 nodes and 10 links. Link widths correspond to their weights, shown next to each link. (b) A one-level partition with all nodes grouped into the same module, and seven unique codewords. The sequence of codewords below the network describes the random walk ($23$ bits). (c) A two-level partition where nodes are divided into two modules with reusable codewords. Arrows show codewords for entering and exiting modules. The random walker's path can now be communicated more efficiently ($22$ bits).
  • Figure 2: Global regularization. (a) A weighted network with incomplete observations, where missing links are highlighted in red. (b) The network after applying global regularization, incorporating a Bayesian estimate of the transition rates to adjust the flows.
  • Figure 3: MapSim. (a) A network with two modules and transition rate annotations for moving in the network. (b) A tree-like representation of the network, annotated with transition rates, showing the cost in bits for moving from node 4 to 2, and from node 4 to 6, respectively.
  • Figure 4: Combining MapSim with a Bayesian estimate of the transition rates for reliable link prediction in incomplete networks. (a) The complete network with line widths representing link weights. The map equation identifies two communities. With MapSim, the network is represented as a hierarchical coding tree with transition rates shown between the nodes. The bit costs of transitioning from node 4 to nodes 2 and 6 are shown, with the most probable link between nodes 4 and 6. (b) The observed network where some link weights are observed incorrectly, indicated in red. The map equation now identifies three communities, and with MapSim, the link between nodes 4 and 2 becomes more likely. (c) The regularized map equation correctly detects the original two communities, restoring MapSim's prediction that the link between nodes 4 and 6 is the most likely.
  • Figure 5: Common Neighbors. (a) The unweighted example network with missing observations. This example is based on node 5, highlighted in blue. Node 5 is a common neighbor of nodes 4 and 6, nodes 4 and 7, and nodes 6 and 7. If two nodes have a common neighbor, they are connected by a link, shown with dashed blue lines. (b) The resulting network. For simplicity, link widths represent the average weight of the two directed links between each node pair.
  • ...and 9 more figures
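
The Common Neighbors rule described in the Figure 5 caption can be sketched as follows. This is an illustrative sketch only: it works on an unweighted, undirected link list and ignores the link weights that the paper assigns to the added links, which this summary does not specify. The function name `common_neighbor_links` is hypothetical.

```python
from itertools import combinations


def common_neighbor_links(links):
    """Return the unobserved node pairs that share at least one neighbor.

    Assumption: links are unweighted and undirected; the weighting of the
    added links used in the paper is not modeled here.
    """
    neighbors = {}
    for u, v in links:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)

    observed = {frozenset(link) for link in links}
    added = set()
    for nbrs in neighbors.values():
        # Every pair of neighbors of the same node has that node in common.
        for a, b in combinations(sorted(nbrs), 2):
            pair = frozenset((a, b))
            if pair not in observed:
                added.add(pair)
    return added


if __name__ == "__main__":
    # Node 5 is linked to nodes 4, 6, and 7, as in the Figure 5 example.
    for pair in sorted(tuple(sorted(p)) for p in common_neighbor_links([(4, 5), (5, 6), (5, 7)])):
        print(pair)
```

With node 5 linked to nodes 4, 6, and 7, the sketch proposes links between nodes 4 and 6, 4 and 7, and 6 and 7, matching the pairs named in the caption.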