Flow Divergence: Comparing Maps of Flows with Relative Entropy

Christopher Blöcker, Ingo Scholtes

TL;DR

Flow Divergence introduces a KL-inspired dissimilarity for network partitions that accounts for link patterns by tying the map equation to random-walk description length. By treating one partition as the reference true pattern and another as an estimator, the method yields the expected extra bits required to describe a random walk under the estimator, formalized as $D_F(\mathsf{M}_a||\mathsf{M}_b)$. Central to the approach are mapsim, modular coding, and a walking-on-maps construction that derives module-dependent transition rates, enabling robust comparisons across hierarchical depths. The framework demonstrates superior sensitivity to partition structure over traditional measures, reveals the cost of overfitting in incomplete data, and supports embedding and visualization of partition landscapes in real networks. This yields practical impact for evaluating community descriptions, diagnosing overfitting, and exploring the solution space of network partitions with a link-pattern-aware lens.

Abstract

Networks represent how the entities of a system are connected and can be partitioned differently, prompting ways to compare partitions. Common approaches for comparing network partitions include information-theoretic measures based on mutual information and set-theoretic measures such as the Jaccard index. These measures are often based on computing the agreement in terms of overlap between different partitions of the same set. However, they ignore link patterns which are essential for the organisation of networks. We propose flow divergence, an information-theoretic divergence measure for comparing network partitions, inspired by the ideas behind the Kullback-Leibler divergence and the map equation for community detection. Similar to the Kullback-Leibler divergence, flow divergence adopts a coding perspective and compares two network partitions $\mathsf{M}_a$ and $\mathsf{M}_b$ by considering the expected extra number of bits required to describe a random walk on a network using $\mathsf{M}_b$ relative to reference partition $\mathsf{M}_a$. Because flow divergence is based on random walks, it can be used to compare partitions with arbitrary and different depths. We show that flow divergence distinguishes between partitions that traditional measures consider to be equally good when compared to a reference partition. Applied to real networks, we use flow divergence to estimate the cost of overfitting in incomplete networks and to visualise the solution landscape of network partitions.
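
The coding perspective that flow divergence borrows from the Kullback-Leibler divergence can be illustrated with a minimal sketch. The code below computes the plain Kullback-Leibler divergence between two discrete distributions, not flow divergence itself; the function name `kl_divergence_bits` and the two example distributions are hypothetical and chosen only for illustration.

```python
import math


def kl_divergence_bits(p, q):
    """Expected extra bits per symbol when outcomes drawn from p are
    encoded with a code that is optimal for q instead of p."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i == 0:
            continue  # outcomes that never occur cost nothing
        if q_i == 0:
            return math.inf  # q has no codeword for an outcome that p produces
        total += p_i * math.log2(p_i / q_i)
    return total


# Hypothetical distributions, chosen only for illustration.
reference = [0.5, 0.5]
estimate = [0.25, 0.75]

print(kl_divergence_bits(reference, reference))  # identical distributions: no extra bits
print(kl_divergence_bits(reference, estimate))   # cost of using the estimate's code
print(kl_divergence_bits(estimate, reference))   # roles swapped: a different cost
```

Swapping the roles of the two distributions changes the result, which mirrors why flow divergence distinguishes the reference partition $\mathsf{M}_a$ from the partition $\mathsf{M}_b$ it is compared against.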

Paper Structure (14 sections, 20 equations, 7 figures, 3 tables)

Figures (7)

  • Figure 1: Four different partitions for the same network. Because common measures such as the Jaccard index and mutual information, including its variants, consider merely node labels but ignore link patterns, they consider partitions (b), (c), and (d) as equally good when compared against the reference partition (a).
  • Figure 2: Illustration of encoding random walks and the principles behind the map equation. (a) Nodes are not partitioned into communities. We derive unique codewords from the nodes' visit rates and use them to describe the shown random-walk sequence with the codewords at the bottom. (b) Nodes are partitioned into three communities and receive codewords that are unique within each community. Codewords for entering and exiting communities are shown next to arrows that point into and out of the communities. (c) The map corresponding to the community structure and coding scheme from (b) drawn as a radial tree. Link widths are proportional to module-normalised codeword usage rates. Good maps have small module exit rates.
  • Figure 3: Different partitions for the same network, drawn on the network in the left column and as a tree in the right column. (a) A two-level partition of the network into five modules. (b) A three-level partition of the network into four modules, one of which has two submodules. Labels in the trees show the rate at which a random walker visits nodes and enters or exits modules. For example, a random walker who is in the blue module exits at rate $\frac{1}{12}$. A random walker who is at the tree's root level enters the green module at rate $\frac{3}{24}$ in partition (a) and $\frac{3}{20}$ in partition (b), respectively.
  • Figure 4: Walking on maps. (a) The same map as in Figure 2c, but now annotated with module-normalised node visit rates instead of codewords. The solid, dashed, and dotted arrows show examples of three random walker paths on the map. (b) We derive transition rates between pairs of nodes according to $\operatorname{mapsim}$: transition rates depend on the source node's module, not on the source node itself (see the $\operatorname{mapsim}$ equation). Therefore, the shortest paths on the map start at module nodes, and we obtain the rate at which each shortest path is used by multiplying the transition rates along that path. The dotted arrow does not represent a shortest path because it contains a loop, which we need to remove. (c) All shortest paths that start in the blue module and their usage rates.
  • Figure 5: Comparing maps. The same partitions as shown in Figure 1, together with their maps. Flow divergence can distinguish between partitions that popular measures, such as the Jaccard index and mutual information, consider equally good with respect to a reference partition. (a) The reference partition $\mathsf{M}_a$. The partitions $\mathsf{M}_b$ in (b) and $\mathsf{M}_c$ in (c) have the same codelength and are symmetric: each module overlaps in two out of three nodes with the modules in the reference partition. (d) Partition $\mathsf{M}_d$ with disconnected communities but still a two-out-of-three overlap per module with the reference partition.
  • ...and 2 more figures
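
The coding principle described in the caption of Figure 2 can be sketched in a few lines. The code below is a minimal illustration, assuming an undirected, unweighted network in which every node has at least one link; it evaluates the standard two-level map equation for a given assignment of nodes to modules. The function names and the six-node example network (two triangles joined by one link) are hypothetical and are not taken from the paper's figures.

```python
import math
from collections import defaultdict


def entropy_bits(rates):
    """Shannon entropy in bits of the given rates after normalising them."""
    total = sum(rates)
    return -sum(r / total * math.log2(r / total) for r in rates if r > 0)


def two_level_codelength(edges, module_of):
    """Two-level map equation for an undirected, unweighted network."""
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    total_degree = sum(degree.values())  # equals twice the number of links

    # Stationary visit rates of a random walk are proportional to node degree.
    visit = {node: d / total_degree for node, d in degree.items()}

    # Each link between two modules carries flow 1 / total_degree per direction.
    exit_rate = defaultdict(float)
    for u, v in edges:
        if module_of[u] != module_of[v]:
            exit_rate[module_of[u]] += 1 / total_degree
            exit_rate[module_of[v]] += 1 / total_degree

    members = defaultdict(list)
    for node, module in module_of.items():
        members[module].append(node)

    # Index-level term: codewords for entering modules.
    q = sum(exit_rate.values())
    codelength = q * entropy_bits(list(exit_rate.values())) if q > 0 else 0.0
    # Module-level terms: codewords for visiting nodes and exiting the module.
    for module, nodes in members.items():
        rates = [exit_rate[module]] + [visit[n] for n in nodes]
        codelength += sum(rates) * entropy_bits(rates)
    return codelength


# Hypothetical example: two triangles connected by a single link.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
one_module = {n: "all" for n in range(6)}
two_modules = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}

print(two_level_codelength(edges, one_module))   # no communities, as in Figure 2a
print(two_level_codelength(edges, two_modules))  # one community per triangle
```

In this example the partition into two triangles gives a shorter codelength than the single-module description, which is consistent with the caption's remark that good maps have small module exit rates.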