How many asymmetric communities are there in multi-layer directed networks?

Huan Qing

TL;DR

This work proposes a goodness-of-fit test for the numbers of sender and receiver communities in multi-layer directed networks. A sequential testing procedure searches through candidate pairs of sender and receiver community numbers in lexicographic order, and a ratio-based variant detects sharp changes in the sequence of test statistics by comparing consecutive candidates.

Abstract

Estimating the asymmetric numbers of communities in multi-layer directed networks is a challenging problem: the multi-layer structure and the inherent directional asymmetry can lead to different numbers of sender and receiver communities. This work addresses the problem under the multi-layer stochastic co-block model, a model for multi-layer directed networks with distinct community structures on the sending and receiving sides, by proposing a novel goodness-of-fit test. The test statistic relies on the deviation of the largest singular value of an aggregated normalized residual matrix from the constant 2. The test statistic exhibits a sharp dichotomy: under the null hypothesis of correct model specification, its upper bound converges to zero with high probability; under underfitting, the test statistic itself diverges to infinity. Using this property, we develop a sequential testing procedure that searches through candidate pairs of sender and receiver community numbers in a lexicographic order. The procedure stops at the smallest such pair where the test statistic drops below a decaying threshold. For robustness, we also propose a ratio-based variant algorithm, which detects sharp changes in the sequence of test statistics by comparing consecutive candidates. Both methods are proven to consistently determine the true numbers of sender and receiver communities under the multi-layer stochastic co-block model.
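The two stopping rules described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's MLDiGoF or MLRDiGoF implementation. The function names, the toy statistic, the fixed numeric thresholds, the plain lexicographic candidate order, and the ratio definition (previous statistic divided by current statistic) are all assumptions made for the example; the summary does not give the exact forms of the statistic, the decaying threshold, or the ratio.

```python
import itertools
from typing import Callable, Optional, Sequence, Tuple

Pair = Tuple[int, int]  # (k_s, k_r): candidate sender and receiver community numbers


def first_below_threshold(
    candidates: Sequence[Pair],
    statistic: Callable[[int, int], float],
    threshold: float,
) -> Optional[Pair]:
    """Return the first candidate whose statistic falls below the threshold."""
    for k_s, k_r in candidates:
        if statistic(k_s, k_r) < threshold:
            return (k_s, k_r)
    return None  # no candidate in the grid passed the test


def first_ratio_jump(
    candidates: Sequence[Pair],
    statistic: Callable[[int, int], float],
    tau: float,
) -> Optional[Pair]:
    """Return the first candidate where the statistic drops sharply.

    Assumed ratio: |T(m-1)| / |T(m)| for consecutive candidates.
    """
    values = [abs(statistic(k_s, k_r)) for k_s, k_r in candidates]
    for m in range(1, len(values)):
        ratio = values[m - 1] / max(values[m], 1e-12)  # guard against division by zero
        if ratio > tau:
            return candidates[m]
    return None


def toy_statistic(k_s: int, k_r: int) -> float:
    """Hypothetical stand-in: large when underfitting a true pair (3, 5), small otherwise."""
    return 50.0 if (k_s < 3 or k_r < 5) else 0.1


# Candidate grid in plain lexicographic order on (k_s, k_r); this ordering is an assumption.
cands = list(itertools.product(range(1, 7), repeat=2))

estimate_threshold = first_below_threshold(cands, toy_statistic, threshold=1.0)
estimate_ratio = first_ratio_jump(cands, toy_statistic, tau=10.0)
print(estimate_threshold, estimate_ratio)  # (3, 5) (3, 5)
```

In practice the statistic would be computed from the fitted model for each candidate pair, and the threshold would depend on the network size; both are left as caller-supplied inputs here because their definitions are not part of this summary.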

Paper Structure (32 sections, 12 theorems, 247 equations, 2 figures, 4 tables, 3 algorithms)

Introduction
Model and problem setup
Multi-layer stochastic co-block model (ML-ScBM)
Problem statement: joint estimation of asymmetric community numbers
Technical assumptions
A spectral-based goodness-of-fit test
Oracle test statistic and its asymptotics
Practical test statistic and its theoretical guarantees
Sequential testing algorithms for asymmetric community numbers selection
The MLDiGoF algorithm and its estimation consistency
A ratio-based variant: the MLRDiGoF algorithm
Numerical Experiments
Experiment 1: Behavior of test statistic $\hat{T}_n$ under null and alternative hypotheses
Experiment 2: Statistical discrimination power and robustness analysis
Experiment 3: Estimation accuracy under varied network sizes and sparsity levels
...and 17 more sections

Key Result

Lemma 1

When Assumptions 1 and 3 hold, for any $\epsilon > 0$, we have

Figures (2)

Figure 1: Sensitivity analysis of MLDiGoF and MLRDiGoF to threshold parameters ($n=800$, $L=15$, $\rho=0.2$, $(K_s,K_r)=(3,5)$). For each method, the $x$-axis index $1$ to $10$ corresponds to an increasing sequence of the associated threshold parameter: $\varepsilon$ for MLDiGoF (blue circles), constant $\tau$ for MLRDiGoF (red squares), and scale factor $a$ for MLRDiGoF with $\tau_n = a\log n$ (yellow triangles). Accuracy is computed over independent replications.
Figure 2: Test statistic $\hat{T}_n(m)$ and ratio statistic $r_m$ for ordered candidate pairs $1 \leq m \leq 100$ (i.e., $K_{\mathrm{cand}}=10$) for the FAO network. The red circle in the right panel marks the global maximum of $r_m$ at $m=42$, corresponding to the candidate pair $(k_s,k_r)=(6,4)$.
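The index $m$ in the Figure 2 caption can be checked against a concrete ordering of the $10 \times 10$ candidate grid. The sketch below assumes that pairs are sorted first by the total $k_s + k_r$ and then by $k_s$. This ordering is an inference from the caption, not something stated in this summary: it places $(6,4)$ at $m=42$, whereas plain lexicographic order on $(k_s, k_r)$ would place it at $m=54$. The function name `ordered_pairs` is hypothetical.

```python
from typing import List, Tuple


def ordered_pairs(k_cand: int) -> List[Tuple[int, int]]:
    """List all (k_s, k_r) in 1..k_cand, sorted by (k_s + k_r, k_s).

    The sort key is an assumption chosen to match the Figure 2 caption.
    """
    pairs = [(k_s, k_r)
             for k_s in range(1, k_cand + 1)
             for k_r in range(1, k_cand + 1)]
    return sorted(pairs, key=lambda p: (p[0] + p[1], p[0]))


pairs = ordered_pairs(10)

# 1-based position m of each candidate pair in the assumed ordering.
index_of = {pair: m for m, pair in enumerate(pairs, start=1)}

print(len(pairs))        # 100
print(index_of[(6, 4)])  # 42
print(pairs[41])         # (6, 4)
```

The count works out because the totals 2 through 9 contribute 1 + 2 + ... + 8 = 36 pairs, and $(6,4)$ is the sixth pair with total 10.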

Theorems & Definitions (31)

Definition 1: Multi-layer stochastic co-block model (ML-ScBM)
Lemma 1: Asymptotic behavior of $T_n$
Theorem 1: Asymptotic behavior of $\hat{T}_n$ under $H_0$
Theorem 2: Asymptotic behavior of $\hat{T}_n$ under $H_1$
Remark 1
Example 1
Remark 2
Theorem 3: Consistency of the MLDiGoF algorithm
Remark 3: Interpretation of conditions
Theorem 4: Asymptotic behavior of $r_m$
...and 21 more
