
Cross-Validation in Bipartite Networks

Bokai Yang, Yuanxing Chen, Yuhong Yang

Abstract

Although network data have become increasingly popular and widely studied, the vast majority of statistical literature has focused on unipartite networks, leaving relatively few theoretical results for bipartite networks. In this paper, we study the model selection problem for bipartite stochastic block models. We propose a penalized cross-validation approach that incorporates appropriate penalty terms for different candidate models, addressing the new and challenging issue that underfitting may occur on one side while overfitting occurs on the other. To the best of our knowledge, our method provides the first consistency guarantee for model selection in bipartite networks. Through simulations under various scenarios and analysis of two real datasets, we demonstrate that our approach not only outperforms traditional modularity-based and projection-based methods, but also naturally preserves potential asymmetry between the two node sets.
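To make the idea of penalized cross-validation over candidate pairs $(K_1, K_2)$ concrete, here is a minimal sketch. It is not the authors' algorithm, and several choices in it are assumptions made only for illustration:

- the entry-wise random fold assignment;
- the spectral fit (top $\min\{K_1, K_2\}$ singular vectors of the rescaled training matrix, followed by k-means separately on each side);
- the squared-error held-out loss;
- the BIC-style penalty `lam * K1 * K2 * log(n1 * n2) / (n1 * n2)`, which stands in for the paper's own penalty terms.

The function names `kmeans`, `fit_bipartite_sbm`, and `penalized_cv` are hypothetical.

```python
import itertools

import numpy as np


def kmeans(X, k, rng, n_iter=50):
    """Plain Lloyd's k-means on the rows of X; returns integer labels."""
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # keep the old center if cluster j is empty
                centers[j] = X[labels == j].mean(axis=0)
    return labels


def fit_bipartite_sbm(A, mask, K1, K2, rng):
    """Spectral fit of a (K1, K2) bipartite SBM using only entries where mask is True.

    Returns the fitted n1 x n2 matrix of edge probabilities.
    """
    A_train = np.where(mask, A, 0.0) / mask.mean()  # zero-impute held-out entries, rescale
    K = min(K1, K2)                                  # rank of P under the model
    U, _, Vt = np.linalg.svd(A_train, full_matrices=False)
    z1 = kmeans(U[:, :K], K1, rng)                   # side-1 (row) communities
    z2 = kmeans(Vt[:K].T, K2, rng)                   # side-2 (column) communities
    B_hat = np.zeros((K1, K2))
    for k in range(K1):
        for l in range(K2):
            rows, cols = np.ix_(z1 == k, z2 == l)
            vals = A[rows, cols][mask[rows, cols]]   # training entries in block (k, l)
            B_hat[k, l] = vals.mean() if vals.size else 0.0
    return B_hat[z1][:, z2]


def penalized_cv(A, candidates, n_folds=5, lam=1.0, seed=0):
    """Pick (K1, K2) minimizing V-fold held-out squared loss plus a penalty."""
    rng = np.random.default_rng(seed)
    n1, n2 = A.shape
    fold = rng.integers(n_folds, size=(n1, n2))      # fold id of every entry
    scores = {}
    for K1, K2 in candidates:
        loss = 0.0
        for v in range(n_folds):
            test = fold == v
            P_hat = fit_bipartite_sbm(A, ~test, K1, K2, rng)
            loss += np.mean((A[test] - P_hat[test]) ** 2) / n_folds
        # Hypothetical BIC-style penalty on the number of block parameters;
        # the paper's side-specific penalty terms are not reproduced here.
        penalty = lam * K1 * K2 * np.log(n1 * n2) / (n1 * n2)
        scores[(K1, K2)] = loss + penalty
    best = min(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    B = np.array([[0.6, 0.1, 0.1],
                  [0.1, 0.5, 0.3]])                  # simulated truth: (K1, K2) = (2, 3)
    z1 = rng.integers(2, size=120)
    z2 = rng.integers(3, size=90)
    A = (rng.random((120, 90)) < B[z1][:, z2]).astype(float)
    grid = list(itertools.product(range(1, 4), range(1, 5)))
    best, _ = penalized_cv(A, grid)
    print("selected (K1, K2):", best)
```

The candidate grid varies $K_1$ and $K_2$ independently. This means a pair that underfits one side while overfitting the other is still scored as its own candidate, and is not collapsed into a single $K$.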

Paper Structure (20 sections, 30 theorems, 143 equations, 4 figures, 2 tables, 1 algorithm)

Key Result

Lemma 1

Assume that $\bar{B} = U \Sigma V^{\top}$ is the reduced SVD of $\bar{B}$, where $U\in \mathbb{O}^{K_1 \times K}$, $V\in \mathbb{O}^{K_2 \times K}$, $\Sigma = \mathrm{diag}(\sigma_1, \dots, \sigma_K)>0$, and $K = \min\{K_1, K_2\}$. Then $P = (\bar{Z}_1 U)\,\Sigma\,(\bar{Z}_2 V)^{\top}$ is the reduced SVD of $P$, where, for $r = 1, 2$, $\bar{Z}_r := Z_r N_r^{-1/2}$ is a column-orthogonal matrix.
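The factorization in Lemma 1 can be checked numerically. The sketch below assumes the standard bipartite SBM notation, which this excerpt does not spell out:

- $P = Z_1 B Z_2^{\top}$ with one-hot membership matrices $Z_r$;
- $N_r = Z_r^{\top} Z_r$ is the diagonal matrix of community sizes;
- $\bar{B} = N_1^{1/2} B N_2^{1/2}$.

Under these assumptions, $\bar{Z}_1 \bar{B} \bar{Z}_2^{\top} = Z_1 B Z_2^{\top} = P$. The reduced SVD $\bar{B} = U \Sigma V^{\top}$ then yields $P = (\bar{Z}_1 U)\,\Sigma\,(\bar{Z}_2 V)^{\top}$, with both outer factors column orthogonal. The helper `lemma1_svd` is a hypothetical name.

```python
import numpy as np


def lemma1_svd(z1, z2, B):
    """Return (L, s, R, P) with P = Z1 B Z2^T and P = L diag(s) R^T.

    Assumes N_r = Z_r^T Z_r and Bbar = N1^{1/2} B N2^{1/2}.
    """
    K1, K2 = B.shape
    Z1 = np.eye(K1)[z1]                      # n1 x K1 one-hot membership matrix
    Z2 = np.eye(K2)[z2]                      # n2 x K2 one-hot membership matrix
    P = Z1 @ B @ Z2.T
    sizes1 = Z1.sum(axis=0)                  # diagonal of N1 (community sizes)
    sizes2 = Z2.sum(axis=0)                  # diagonal of N2
    Zbar1 = Z1 / np.sqrt(sizes1)             # Z1 N1^{-1/2}
    Zbar2 = Z2 / np.sqrt(sizes2)             # Z2 N2^{-1/2}
    Bbar = np.sqrt(sizes1)[:, None] * B * np.sqrt(sizes2)[None, :]
    U, s, Vt = np.linalg.svd(Bbar, full_matrices=False)  # reduced SVD, K = min(K1, K2)
    return Zbar1 @ U, s, Zbar2 @ Vt.T, P


z1 = np.arange(12) % 3                       # 3 nonempty row communities
z2 = np.arange(9) % 2                        # 2 nonempty column communities
B = np.array([[0.7, 0.2], [0.3, 0.6], [0.1, 0.4]])
L, s, R, P = lemma1_svd(z1, z2, B)
print(np.allclose(L @ np.diag(s) @ R.T, P))  # factorization reproduces P
```

Every community must be nonempty so that $N_r^{-1/2}$ exists. This is why the example assigns labels cyclically rather than at random.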

Figures (4)

  • Figure 1: Results of Southern Women network by 10-fold BCV algorithm.
  • Figure 2: Estimated community structure for the Southern Women network.
  • Figure 3: Results of cosponsorship network by 10-fold BCV algorithm.
  • Figure 4: Top enriched committees in 6 representative bill communities. Each panel shows the two most overrepresented committees in the corresponding estimated cluster. Note that the average relative proportion of committees involved in the bills of each community is 0.727.

Theorems & Definitions (49)

  • Lemma 1: Lemma 1 of Zhou2019AnalysisOS
  • Remark 1
  • Remark 2
  • Theorem 1
  • Lemma A.1
  • Lemma A.2
  • Lemma A.3
  • Proposition A.1
  • Proof: Proof of Proposition A.1
  • Lemma A.4
  • ...and 39 more