How many asymmetric communities are there in multi-layer directed networks?

Huan Qing

TL;DR

This work proposes a novel goodness-of-fit test for estimating the numbers of sender and receiver communities in multi-layer directed networks. The test drives a sequential procedure that searches through candidate pairs of sender and receiver community numbers in a lexicographic order, and a ratio-based variant detects sharp changes in the sequence of test statistics by comparing consecutive candidates.

Abstract

Estimating the asymmetric numbers of communities in multi-layer directed networks is a challenging problem: the multi-layer structure and the inherent directional asymmetry mean that the numbers of sender and receiver communities may differ. This work addresses this issue under the multi-layer stochastic co-block model, a model for multi-layer directed networks with distinct community structures on the sending and receiving sides, by proposing a novel goodness-of-fit test. The test statistic relies on the deviation of the largest singular value of an aggregated normalized residual matrix from the constant 2. The test statistic exhibits a sharp dichotomy: under the null hypothesis of correct model specification, its upper bound converges to zero with high probability; under underfitting, the test statistic itself diverges to infinity. Using this property, we develop a sequential testing procedure that searches through candidate pairs of sender and receiver community numbers in a lexicographic order. The process stops at the smallest such pair where the test statistic drops below a decaying threshold. For robustness, we also propose a ratio-based variant algorithm, which detects sharp changes in the sequence of test statistics by comparing consecutive candidates. Both methods are proven to consistently determine the true numbers of sender and receiver communities under the multi-layer stochastic co-block model.
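
The two stopping rules described in the abstract can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: `sequential_select`, `ratio_select`, `toy_stat`, the fixed `threshold`, and in particular the ratio $|T(m-1)/T(m)|$ compared against `tau`, which is only one plausible reading of "comparing consecutive candidates". The real test statistic (built from the largest singular value of the aggregated normalized residual matrix) is replaced by a toy function, and the candidate order is passed in rather than fixed by the sketch.

```python
from typing import Callable, Iterable, Optional, Sequence, Tuple

Pair = Tuple[int, int]  # (k_s, k_r): candidate sender / receiver community numbers


def sequential_select(
    candidates: Iterable[Pair],
    stat: Callable[[int, int], float],
    threshold: float,
) -> Optional[Pair]:
    """Return the first candidate whose statistic falls below the threshold."""
    for k_s, k_r in candidates:
        if stat(k_s, k_r) < threshold:
            return (k_s, k_r)
    return None  # no candidate passed the test


def ratio_select(
    candidates: Sequence[Pair],
    stat: Callable[[int, int], float],
    tau: float,
) -> Optional[Pair]:
    """Assumed ratio rule: return the first candidate m with |T(m-1) / T(m)| > tau."""
    values = [stat(k_s, k_r) for k_s, k_r in candidates]
    for m in range(1, len(values)):
        prev, curr = values[m - 1], values[m]
        if curr != 0 and abs(prev / curr) > tau:
            return candidates[m]
    return None  # no sharp drop detected


def toy_stat(k_s: int, k_r: int) -> float:
    """Hypothetical stand-in: large when either number is too small, tiny otherwise."""
    true_ks, true_kr = 3, 5
    return 50.0 if (k_s < true_ks or k_r < true_kr) else 0.01


# Candidate pairs in plain lexicographic order on (k_s, k_r), up to 6 each.
candidates = [(k_s, k_r) for k_s in range(1, 7) for k_r in range(1, 7)]

print(sequential_select(candidates, toy_stat, threshold=1.0))
print(ratio_select(candidates, toy_stat, tau=10.0))
```

With this toy statistic both rules stop at the first pair that is not underfitted. In the paper the threshold decays with the network size, which a real implementation would need to take from the paper's own definitions.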

Paper Structure (32 sections, 12 theorems, 247 equations, 2 figures, 4 tables, 3 algorithms)

Key Result

Lemma 1

When Assumptions assump:a1 and assump:a3 hold, for any $\epsilon > 0$, we have

Figures (2)

  • Figure 1: Sensitivity analysis of MLDiGoF and MLRDiGoF to threshold parameters ($n=800$, $L=15$, $\rho=0.2$, $(K_s,K_r)=(3,5)$). For each method, the $x$-axis index $1$ to $10$ corresponds to an increasing sequence of the associated threshold parameter: $\varepsilon$ for MLDiGoF (blue circles), constant $\tau$ for MLRDiGoF (red squares), and scale factor $a$ for MLRDiGoF with $\tau_n = a\log n$ (yellow triangles). Accuracy is computed over independent replications.
  • Figure 2: Test statistic $\hat{T}_n(m)$ and ratio statistic $r_m$ for ordered candidate pairs $1 \leq m \leq 100$ (i.e., $K_{\mathrm{cand}}=10$) for the FAO network. The red circle in the right panel marks the global maximum of $r_m$ at $m=42$, corresponding to the candidate pair $(k_s,k_r)=(6,4)$.
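
The Figure 2 caption does not spell out how the index $m$ maps to a pair $(k_s,k_r)$. As a hedged illustration, one ordering that reproduces the stated correspondence ($m=42$ for $(6,4)$ with $K_{\mathrm{cand}}=10$) sorts pairs by $k_s+k_r$ first and by $k_s$ second. This is an inference from the caption alone, and the helper name `ordered_pairs` is hypothetical; the paper's own definition of the order should take precedence.

```python
from typing import List, Tuple


def ordered_pairs(k_cand: int) -> List[Tuple[int, int]]:
    """List all (k_s, k_r) with 1 <= k_s, k_r <= k_cand, sorted by (k_s + k_r, k_s).

    Assumption: this ordering is inferred from the Figure 2 caption,
    not taken from the paper's text.
    """
    pairs = [(k_s, k_r) for k_s in range(1, k_cand + 1) for k_r in range(1, k_cand + 1)]
    return sorted(pairs, key=lambda p: (p[0] + p[1], p[0]))


pairs = ordered_pairs(10)

# 1-based index m for each candidate pair.
index_of = {pair: m for m, pair in enumerate(pairs, start=1)}

print(len(pairs))        # 100 candidate pairs
print(index_of[(6, 4)])  # 42
```

Under this ordering, the 36 pairs with $k_s+k_r \le 9$ come first, and $(6,4)$ is the sixth pair with sum 10, which gives $m=42$.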

Theorems & Definitions (31)

  • Definition 1: Multi-layer stochastic co-block model (ML-ScBM)
  • Lemma 1: Asymptotic behavior of $T_n$
  • Theorem 1: Asymptotic behavior of $\hat{T}_n$ under $H_0$
  • Theorem 2: Asymptotic behavior of $\hat{T}_n$ under $H_1$
  • Remark 1
  • Example 1
  • Remark 2
  • Theorem 3: Consistency of the MLDiGoF algorithm
  • Remark 3: Interpretation of conditions
  • Theorem 4: Asymptotic behavior of $r_m$
  • ...and 21 more