Two-way Node Popularity Model for Directed and Bipartite Networks

Bing-Yi Jing, Ting Li, Jiangzhou Wang, Ya Wang

TL;DR

TNPM extends community detection to directed and bipartite networks by modeling edge means as $P_{ij} = E(A_{ij}) = \Lambda(i,z_j) \widetilde{\Lambda}(j,c_i)$, yielding a block rank-one mean structure. The authors develop the Delete-One-Method (DOM) and the Two-Stage Divided Cosine Algorithm (TSDC) to fit TNPM and identify communities with unknown $K$ and $L$, accommodating sub-Gaussian edge distributions. They prove identifiability under mild assumptions and establish consistency of both the probability estimator and community detection, with finite-sample bounds. Empirical results on synthetic data and two real datasets—the Worldwide Food Trading Networks and MovieLens 100K—demonstrate improved accuracy and scalability, and reveal interpretable, domain-relevant structure.
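The "block rank-one" claim can be checked numerically. The following is a minimal sketch, not the authors' code: the sizes, the random popularity values, and the helper names `tnpm_mean` and `block_is_rank_one` are assumptions made for illustration. It builds $P_{ij} = \Lambda(i,z_j)\widetilde{\Lambda}(j,c_i)$ and verifies that every sub-matrix indexed by one out-community and one in-community has rank one.

```python
import random


def tnpm_mean(Lam, Lam_t, c, z):
    """Mean matrix P[i][j] = Lam[i][z[j]] * Lam_t[j][c[i]].

    Lam   : n x L list, popularity of row node i towards in-community l
    Lam_t : m x K list, popularity of column node j towards out-community k
    c, z  : out-community labels of rows, in-community labels of columns
    """
    n, m = len(c), len(z)
    return [[Lam[i][z[j]] * Lam_t[j][c[i]] for j in range(m)] for i in range(n)]


def block_is_rank_one(P, rows, cols, tol=1e-12):
    """A nonzero block has rank one iff all of its 2x2 minors vanish."""
    for a in rows:
        for b in rows:
            for u in cols:
                for v in cols:
                    if abs(P[a][u] * P[b][v] - P[a][v] * P[b][u]) > tol:
                        return False
    return True


# Hypothetical toy sizes: n row nodes, m column nodes, K and L communities.
rng = random.Random(0)
n, m, K, L = 6, 8, 2, 3
c = [i % K for i in range(n)]  # out-community of each row node
z = [j % L for j in range(m)]  # in-community of each column node
Lam = [[rng.uniform(0.1, 1.0) for _ in range(L)] for _ in range(n)]
Lam_t = [[rng.uniform(0.1, 1.0) for _ in range(K)] for _ in range(m)]
P = tnpm_mean(Lam, Lam_t, c, z)

# Within block (k, l), P equals the outer product of Lam[rows][l] and
# Lam_t[cols][k], so every 2x2 minor is zero up to rounding.
blocks_ok = all(
    block_is_rank_one(
        P,
        [i for i in range(n) if c[i] == k],
        [j for j in range(m) if z[j] == l],
    )
    for k in range(K)
    for l in range(L)
)
print("every (k, l) block is rank one:", blocks_ok)
```

Within a block the column community $l$ and the row community $k$ are fixed, so the entry factors into a term depending only on $i$ and a term depending only on $j$. This is why the check uses 2x2 minors instead of a general rank routine.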

Abstract

There has been extensive research on community detection in directed and bipartite networks. However, these studies often fail to consider the popularity of nodes in different communities, which is a common phenomenon in real-world networks. To address this issue, we propose a new probabilistic framework called the Two-Way Node Popularity Model (TNPM). The TNPM also accommodates edges from different distributions within a general sub-Gaussian family. We introduce the Delete-One-Method (DOM) for model fitting and community structure identification, and provide a comprehensive theoretical analysis with novel technical tools for handling the sub-Gaussian generalization. Additionally, we propose the Two-Stage Divided Cosine Algorithm (TSDC) to handle large-scale networks more efficiently. Our proposed methods offer advantages in both estimation accuracy and computational efficiency, as demonstrated through extensive numerical studies. We apply our methods to two real-world applications, uncovering interesting findings.

Paper Structure

This paper contains 16 sections, 3 theorems, 44 equations, 8 figures, 1 table, 2 algorithms.

Key Result

Theorem 1

Under the TNPM, assuming that Assumptions A1–A3 hold, we consider the following optimization problem […] where $\bm c$, $\bm z$ represent the clustering vectors. Then, we have $\hat{\bm c} \equiv \bm c^{\ast}$ and $\hat{\bm z} \equiv \bm z^{\ast}$, where $\bm c^{\ast}$ and $\bm z^{\ast}$ are the ground truth community structures, and $\equiv$ indicates t

Figures (8)

  • Figure 1: The adjacency matrix of the MovieLens 100K data set, rearranged according to the clustering results obtained by the proposed TSDC method, with blue lines marking the cluster boundaries.
  • Figure 2: The NMI for Normal data generation case with $(n,m)=(600,600)$. The left panel depicts out-community clustering, while the right panel shows in-community clustering.
  • Figure 3: The NMI for Normal-Bernoulli mixture data generation case. The left panel depicts out-community clustering, while the right panel shows in-community clustering.
  • Figure 4: The NMI for the sparse data generation case. The left panel depicts out-community clustering, while the right panel shows in-community clustering.
  • Figure 5: $A1$ displays the original cereal trading network, while $A2$ and $A3$ represent the block cosine similarity matrices for the rows and columns, respectively, with nodes ordered by the detected cluster labels.
  • ...and 3 more figures
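
Figures 2 to 4 report clustering quality as NMI (normalized mutual information). As a rough illustration of what that score measures, here is a minimal sketch. It assumes the arithmetic-mean normalization $2I(a;b)/(H(a)+H(b))$, since the normalization used in the paper is not stated in this summary, and the function names are hypothetical.

```python
from collections import Counter
from math import log


def entropy(labels):
    """Shannon entropy (natural log) of a label vector."""
    n = len(labels)
    return -sum((cnt / n) * log(cnt / n) for cnt in Counter(labels).values())


def nmi(a, b):
    """NMI between two labelings, normalized by the mean of the entropies.

    Assumption: arithmetic-mean normalization; other conventions divide by
    the geometric mean or the maximum of the two entropies.
    """
    n = len(a)
    count_a, count_b, count_ab = Counter(a), Counter(b), Counter(zip(a, b))
    mutual_info = sum(
        (nab / n) * log(n * nab / (count_a[x] * count_b[y]))
        for (x, y), nab in count_ab.items()
    )
    denom = (entropy(a) + entropy(b)) / 2
    # Convention chosen here: two trivial one-cluster labelings agree fully.
    return 1.0 if denom == 0 else mutual_info / denom


# Relabeling the clusters does not change the score.
same_up_to_permutation = nmi([0, 0, 1, 1], [1, 1, 0, 0])
# Labelings that carry no information about each other score zero.
independent = nmi([0, 0, 1, 1], [0, 1, 0, 1])
```

Because the score depends only on the joint counts of label pairs, it is unaffected by how the clusters are numbered, which makes it suitable for comparing estimated communities with ground truth in simulations.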

Theorems & Definitions (3)

  • Theorem 1
  • Theorem 2
  • Theorem 3