Community detection on directed networks with missing edges

Nicola Pedreschi, Renaud Lambiotte, Alexandre Bovet

TL;DR

This work extends the recently developed Flow Stability framework, originally designed for detecting communities in time-varying networks, to community detection in weighted, directed networks with missing links. It leverages known uncertainty levels in nodes' out-degrees to make the detected communities more robust.

Abstract

Identifying significant community structures in networks with incomplete data is a challenging task, as the reliability of solutions diminishes with increasing levels of missing information. However, in many empirical contexts, some information about the uncertainty in the network measurements can be estimated. In this work, we extend the recently developed Flow Stability framework, originally designed for detecting communities in time-varying networks, to address the problem of community detection in weighted, directed networks with missing links. Our approach leverages known uncertainty levels in nodes' out-degrees to enhance the robustness of community detection. Through comparisons on synthetic networks and a real-world network of messaging channels on the Telegram platform, we demonstrate that our method delivers more reliable community structures, even when a significant portion of data is missing.

Paper Structure

This paper contains 13 sections, 16 equations, 8 figures, 2 tables.

Figures (8)

  • Figure 1: $\Delta$ Flow Stability: a) Illustration of the original, or complete, weighted and directed graph (on the left) characterized by two communities, and the experimental measurement (the incomplete graph, the object of our analysis, on the right) of the graph where some of the edges have not been observed and thus the weights of some connections are changed (in red) w.r.t. the original graph. The missing edges affect the community detection, as the graph now appears to have three communities instead of the original two; note how $2$ edges leaving node $i$ are missing in the measured graph, leading to an overall error $\epsilon_i=2$ on node $i$'s measured out-strength $s^i_\text{out}$. b) With probability $1-\alpha_i$, a random walker on node $i$, at any given time step $t$, follows the outgoing edges of $i$ (regular random walk). c) With probability $\alpha_i=\frac{\epsilon_i}{\epsilon_i+s^i_\text{out}}$, instead, the random walker is teleported to any other node in the network.
  • Figure 2: Parameter space of the SBM ensemble: a) Example plot of the adjacency matrix of a network generated with the SBM, where we highlight: the two groups of source nodes (orange and yellow), whose within-connection probability is $p_\text{in}$ and which connect to their respective "main" core with probability $p_\text{out}$ and to their "secondary" core with probability $p_\text{out}/4$; the two cores of the network, internally interconnected with probability $p_\text{core}=0.4$, connected to each other with probability $p_\text{in}$, and each connected to their "main" group of sink nodes (dark and light green) with probability $p_\text{out}$, and with probability $p_\text{out}/4$ to their "secondary" group of sinks. b) Examples of the structure of the networks generated with different values of $p_\text{in}$ and $p_\text{out}$.
  • Figure 3: Recovering community structure on a synthetic network: a) (top) heatmap of the Normalized Mutual Information computed between the Flow Stability partition at specific values of Markov time $t$ ($x$-axis) and fraction of randomly removed edges $r$ ($y$-axis) and the original partition in the SBM; (middle) heatmap of the Normalized Mutual Information computed between the $\Delta$ Flow Stability partition at specific values of Markov time $t$ ($x$-axis) and fraction of randomly removed edges $r$ ($y$-axis) and the original partition in the SBM; (bottom) heatmap of the difference of the previous two heatmaps. b) (left) plot of the $\text{NMI}$ computed at each value of the Markov time $t$, between the partition obtained by Flow Stability with $r=0.05$ and the original partition of the SBM (left $y$-axis, in blue) and plot of the NVI computed between the partitions obtained via FS at two consecutive Markov times $t$ and $t+1$ (right $y$-axis, in green); (right) plot of the $\text{NMI}$ computed at each value of the Markov time $t$, between the partition obtained by $\Delta$FS with $r=0.05$ and the original partition of the SBM (left $y$-axis, in orange) and plot of the NVI computed between the partitions obtained via $\Delta$FS at two consecutive Markov times $t$ and $t+1$ (right $y$-axis, in red). c) heatmap of NVI computed between each pair of partitions obtained with FS (left), or $\Delta$FS (right), at different Markov times $t$ and $t^*$, both for $r=0.05$.
  • Figure 4: Recovering the community structure of an ensemble of stochastic block models: Left - heatmap of the maximum value of the NMI between the partition obtained with FS and the original partition at each Markov time for each pair of values of $(p_\text{in},p_\text{out})$ for several values of the fraction of randomly removed edges $r$. Middle - heatmap of the maximum value of the NMI between the partition obtained with $\Delta$FS and the original partition at each Markov time for each pair of values of $(p_\text{in},p_\text{out})$ for several values of the fraction of randomly removed edges $r$. Right - difference between the left and middle columns.
  • Figure 5: Flow Stability clustering of the Telegram channels network. a) The curves in the plot correspond to the number of clusters $n_c^\Delta(t)$ (top, in dark blue) found with $\Delta$FS at each Markov time $t$, and the Normalized Variation of Information between two partitions obtained at adjacent Markov times with $\Delta$FS, $\text{NVI}^\Delta(t,t+1)$ (bottom, in red). b) Nodes represent the communities, and edges represent the number of links between channels in each community. Upstream communities are shown in blue, core communities in brown, and downstream communities in purple. The labels indicate the rank of each community in terms of its size.
  • ...and 3 more figures
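
The random walk described in the caption of Figure 1 can be illustrated with a minimal sketch. It is not the authors' implementation: it only builds a one-step transition matrix in which a walker on node $i$ follows outgoing edges with probability $1-\alpha_i$ and teleports with probability $\alpha_i=\frac{\epsilon_i}{\epsilon_i+s^i_\text{out}}$. The function names, the uniform teleportation over the other $N-1$ nodes, and the handling of nodes with no observed out-strength (they always teleport) are assumptions made for this example.

```python
import numpy as np


def teleport_probabilities(s_out, eps):
    """alpha_i = eps_i / (eps_i + s_out_i), as given in the Figure 1 caption."""
    s_out = np.asarray(s_out, dtype=float)
    eps = np.asarray(eps, dtype=float)
    total = s_out + eps
    # Assumption: a node with zero measured out-strength and zero error
    # has no usable outgoing information, so it always teleports (alpha = 1).
    alpha = np.ones_like(total)
    np.divide(eps, total, out=alpha, where=total > 0)
    return alpha


def transition_matrix(A, eps):
    """Row-stochastic matrix mixing a regular random walk with teleportation."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    s_out = A.sum(axis=1)  # measured out-strength of each node
    alpha = teleport_probabilities(s_out, eps)

    # Regular random walk: follow outgoing edges proportionally to weight.
    walk = np.zeros_like(A)
    np.divide(A, s_out[:, None], out=walk, where=s_out[:, None] > 0)

    # Assumption: teleportation is uniform over the other n - 1 nodes.
    teleport = (np.ones((n, n)) - np.eye(n)) / (n - 1)

    return (1.0 - alpha)[:, None] * walk + alpha[:, None] * teleport


# Toy measured graph: node 0 has out-strength 2 and an error of 2
# on that out-strength, so it teleports half of the time.
A = np.array([[0.0, 1.0, 1.0],
              [1.0, 0.0, 0.0],
              [0.0, 2.0, 0.0]])
eps = np.array([2.0, 0.0, 0.0])
T = transition_matrix(A, eps)
```

Nodes with $\epsilon_i=0$ keep their ordinary random-walk row, so the construction reduces to a standard random walk when no uncertainty is reported.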