Detectability threshold in weighted modular networks

Filippo Radicchi, Filipi N. Silva, Alessandro Flammini, Santo Fortunato, Sadamori Kojaku

TL;DR

This work analytically derives a general expression for the maximum level of mixing that spectral modularity optimization can tolerate while still retrieving community structure, showing that the value of this detectability threshold depends on the first two moments of the distributions of node degree and edge weight.

Abstract

We study the necessary condition to detect, by means of spectral modularity optimization, the ground-truth partition in networks generated according to the weighted planted-partition model with two equally sized communities. We analytically derive a general expression for the maximum level of mixing tolerated by the algorithm to retrieve community structure, showing that the value of this detectability threshold depends on the first two moments of the distributions of node degree and edge weight. We focus on the standard case of Poisson-distributed node degrees and compare the detectability thresholds of five edge-weight distributions: Dirac, Poisson, exponential, geometric, and signed Bernoulli. We show that Dirac-distributed weights yield the smallest detectability threshold, while exponentially distributed weights increase the threshold by a factor of $\sqrt{2}$; the other distributions exhibit distinct behaviors that depend on the average degree, the average edge weight, or both. Our results indicate that larger variability in edge weights can make communities less detectable. In cases where edge weights carry no information about community structure, incorporating weights in community detection is detrimental.
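The setup described in the abstract can be illustrated with a minimal sketch: sample a two-community weighted planted-partition network, build the weighted modularity matrix, and split the nodes by the sign of its leading eigenvector. The function names, the parametrization of the edge probabilities as `2 * k_in / n` and `2 * k_out / n`, and the parameter values are assumptions made for illustration; they are not taken from the paper. Weights are drawn from the same exponential distribution for within- and cross-community edges, so they carry no information about the communities.

```python
import numpy as np


def sample_network(n, k_in, k_out, mean_w, rng):
    """Sample a symmetric weighted adjacency matrix with two equal communities.

    Assumption: an edge exists with probability 2*k_in/n inside a community
    and 2*k_out/n across communities, so the expected within- and
    cross-community degrees are roughly k_in and k_out. n must be even.
    """
    labels = np.repeat([1, -1], n // 2)
    same = np.equal.outer(labels, labels)
    prob = np.where(same, 2.0 * k_in / n, 2.0 * k_out / n)
    # Draw each unordered pair once (strict upper triangle), then symmetrize.
    upper = np.triu(rng.random((n, n)) < prob, k=1)
    weights = rng.exponential(mean_w, size=(n, n))
    a = np.where(upper, weights, 0.0)
    return a + a.T, labels


def spectral_modularity_split(a):
    """Return the largest modularity eigenvalue and the sign-based bisection."""
    s = a.sum(axis=1)                    # node strengths
    b = a - np.outer(s, s) / s.sum()     # weighted modularity matrix
    eigvals, eigvecs = np.linalg.eigh(b) # eigenvalues in ascending order
    return eigvals[-1], np.sign(eigvecs[:, -1])


def accuracy(pred, labels):
    """Fraction of correctly classified nodes, up to a global label swap."""
    agree = np.mean(pred == labels)
    return max(agree, 1.0 - agree)


rng = np.random.default_rng(0)
adj, truth = sample_network(400, k_in=40, k_out=8, mean_w=1.0, rng=rng)
lam, split = spectral_modularity_split(adj)
acc = accuracy(split, truth)
print(f"largest eigenvalue: {lam:.2f}, accuracy: {acc:.3f}")
```

Reducing the gap between `k_in` and `k_out` at fixed `k_in + k_out` increases the mixing; repeating the run over a range of gaps gives a rough numerical picture of where the planted partition stops being recoverable.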

Paper Structure

This paper contains 4 sections, 113 equations, 5 figures.

Figures (5)

  • Figure 1: (a) Detectability threshold $\Delta_k^*$ for the weighted planted-partition model as a function of $W$, i.e., the sum of the average values of the within- and cross-community edge weights. Here $N=1024$ and $K=128$. Different curves correspond to the different weight distributions we consider. Predictions are given by Eq. (\ref{eq:dirac_det}) for the Dirac distribution, Eq. (\ref{eq:poisson_det}) for the Poisson distribution, Eq. (\ref{eq:expon_det}) for the exponential distribution, Eq. (\ref{eq:geometric_det}) for the geometric distribution, and Eq. (\ref{eq:sign_det}) for the signed-Bernoulli distribution. Note that geometric weights require $W \geq 2$ and signed-Bernoulli weights require $0 < W \leq 2$, which explains the limited range of their corresponding curves. (b) Detectability threshold $\Delta_{w}^*$ for the weighted planted-partition model as a function of $W$, with the same parameters as in (a) but for the case of homogeneous topology, given by Eq. (\ref{eq:poisson_w_det}) for the Poisson distribution, Eq. (\ref{eq:dirac_w_det}) for the Dirac distribution, Eq. (\ref{eq:expon_w_det}) for the exponential distribution, Eq. (\ref{eq:geometric_w_det}) for the geometric distribution, and Eq. (\ref{eq:sign_w_det}) for the signed-Bernoulli distribution. The limited range of the geometric and signed-Bernoulli curves reflects their respective parameter constraints.
  • Figure 2: (a) Largest eigenvalue $\lambda$ of the modularity matrix as a function of the difference between the expected value of the within- and cross-community degrees $\Delta_k$ for the weighted planted-partition model. Here, $N=1024$, $K=128$, $W = 1$. Different symbols/colors correspond to different distributions of the edge weights. Numerical results are averaged over $100$ realizations of the model for each $\Delta_k$ value and displayed with symbols. The solid curves represent the theoretical predictions, whereas the vertical dotted lines denote the detectability thresholds. Note that some distributions are not shown for certain $W$ values due to parameter compatibility constraints (e.g., geometric weights require $W \geq 2$, signed-Bernoulli weights require $0 < W \leq 2$). (b) We plot the order parameter $P$ [i.e., Eq. (\ref{eq:order})] as a function of $\Delta_k$ for the same networks as in (a). (c) Same as in (a), but for $W = 10$. (d) Same as in (b), but for $W = 10$. (e) Same as in (a), but for $W = 100$. (f) Same as in (b), but for $W = 100$.
  • Figure 3: (a) Largest eigenvalue $\lambda$ of the modularity matrix as a function of the difference between the expected value of the within- and cross-community edge weights $\Delta_{w}$ for the weighted planted-partition model. Here, $N=1024$, $K=128$, $W = 1$. Different symbols/colors correspond to different distributions of the edge weights. Numerical results are averaged over $100$ realizations of the model for each $\Delta_{w}$ value and displayed with symbols. The solid curves represent the theoretical predictions, whereas the vertical dotted lines denote the detectability thresholds. Missing curves for certain $W$ values reflect parameter compatibility constraints (e.g., geometric weights require $W \geq 2$, while signed-Bernoulli weights are limited to $0 < W \leq 2$). (b) We plot the order parameter $P$ [i.e., Eq. (\ref{eq:order})] as a function of $\Delta_w$ for the same networks as in (a). (c) Same as in (a), but for $W = 2$. (d) Same as in (b), but for $W = 2$. (e) Same as in (a), but for $W = 4$. (f) Same as in (b), but for $W = 4$.
  • Figure 4: Community detection performance using the Leiden algorithm as a function of $\Delta_k$ for $\langle w_{\text{in}} \rangle = \langle w_{\text{out}} \rangle$. Top row: $q=2$ communities; bottom row: $q=16$ communities. Each line represents a different weight distribution. Networks have $N=1024$ nodes and average degree $K=128$. Panels show different total weights: (a,b) $W=1$; (c,d) $W=2$; (e,f) $W=4$.
  • Figure 5: Community detection performance using the Leiden algorithm as a function of $\Delta_w$ for $\langle k_{\text{in}} \rangle = \langle k_{\text{out}} \rangle$. Top row: $q=2$ communities; bottom row: $q=16$ communities. Each line represents a different weight distribution. Networks have $N=1024$ nodes and average degree $K=128$. Panels show different total weights: (a,b) $W=1$; (c,d) $W=2$; (e,f) $W=4$.
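
The ranges of $W$ quoted in the captions are consistent with the supports of the weight distributions. As a sketch, assume that geometric weights take values in $\{1, 2, \dots\}$ and that signed-Bernoulli weights take values $\pm 1$ (both supports are assumptions made here, not stated in the captions), and recall that $W = \langle w_{\text{in}} \rangle + \langle w_{\text{out}} \rangle$. Then

$$
\langle w \rangle_{\text{geometric}} \geq 1 \;\Rightarrow\; W \geq 2,
\qquad
|\langle w \rangle_{\text{signed-Bernoulli}}| \leq 1 \;\Rightarrow\; W \leq 2 .
$$

For a weight distribution with mean $\mu$, the standard second moments are

$$
\langle w^2 \rangle_{\text{Dirac}} = \mu^2,
\qquad
\langle w^2 \rangle_{\text{Poisson}} = \mu^2 + \mu,
\qquad
\langle w^2 \rangle_{\text{exponential}} = 2\mu^2 .
$$

The ratio $\langle w^2 \rangle / \langle w \rangle^2$ is therefore $1$ for Dirac weights and $2$ for exponential weights, independently of $\mu$; its square root is consistent with the factor $\sqrt{2}$ quoted in the abstract, although the exact threshold expressions are those given by the equations referenced in the Figure 1 caption.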