Variational Estimators for Node Popularity Models

Jony Karki, Dongzhou Huang, Yunpeng Zhao

TL;DR

This work develops a likelihood-based variational EM framework for the Two-Way Node Popularity Model (TNPM) to detect communities in bipartite and related networks. By restricting the posterior over latent labels to a factorized form, it yields scalable E-step updates and closed-form M-step equations for node popularity parameters, together with principled initialization. The authors prove identifiability and label consistency under explicit conditions and demonstrate through simulations and real-data applications that the method outperforms the existing TSDC algorithm, particularly in sparse or heterogeneous settings. The approach provides a principled, robust tool for biclustering and community detection in complex network data with node-specific popularities.

Abstract

Node popularity is recognized as a key factor in modeling real-world networks, capturing heterogeneity in connectivity across communities. This concept is equally important in bipartite networks, where nodes in different partitions may exhibit varying popularity patterns, motivating models such as the Two-Way Node Popularity Model (TNPM). Existing methods, such as the Two-Stage Divided Cosine (TSDC) algorithm, provide a scalable estimation approach but may have limitations in terms of accuracy or applicability across different types of networks. In this paper, we develop a computationally efficient and theoretically justified variational expectation-maximization (VEM) framework for the TNPM. We establish label consistency for the estimated community assignments produced by the proposed variational estimator in bipartite networks. Through extensive simulation studies, we show that our method achieves superior estimation accuracy across a range of bipartite as well as undirected networks compared to existing algorithms. Finally, we evaluate our method on real-world bipartite and undirected networks, further demonstrating its practical effectiveness and robustness.

Paper Structure

This paper contains 19 sections, 10 theorems, 116 equations, 2 figures, 1 algorithm.

Key Result

Proposition 1

For any fixed $q_1$ and $q_2$, $\hat{\Phi}=(\hat{\pi},\hat{\rho},\hat{\theta},\hat{\lambda})$ satisfying the following equations is a global maximizer of $J(q_1,q_2,\Phi)$.
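
The factorized posterior described in the TL;DR suggests that $J$ is an evidence lower bound of the usual mean-field form. A sketch under that assumption is given here. The symbols $A$ (the observed adjacency matrix), $z$ and $w$ (latent labels for the two node sets) are notation introduced for illustration and may differ from the paper's:

```latex
% Assumed mean-field form: the variational posterior factorizes over the two label vectors.
q(z, w) = q_1(z)\, q_2(w)

% Generic evidence lower bound on the log-likelihood, maximized alternately
% over (q_1, q_2) in the E-step and over \Phi in the M-step.
J(q_1, q_2, \Phi)
  = \mathbb{E}_{q_1 q_2}\!\left[ \log P(A, z, w; \Phi) \right]
  - \mathbb{E}_{q_1}\!\left[ \log q_1(z) \right]
  - \mathbb{E}_{q_2}\!\left[ \log q_2(w) \right]
  \le \log P(A; \Phi)
```

Under this reading, Proposition 1 would correspond to the M-step: with $q_1$ and $q_2$ held fixed, the parameter update maximizes $J$ over $\Phi$.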

Figures (2)

  • Figure 1: Comparison of ARI across different values of the density factor $r$.
  • Figure 2: Comparison of ARI across different values of the homophily factor $h$.
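
Both figures report the adjusted Rand index (ARI), which scores the agreement between estimated and true community labels and is invariant to relabeling. The following is a minimal sketch of the standard ARI formula in plain Python. The function name `adjusted_rand_index` is chosen here for illustration and is not taken from the paper:

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two labelings of the same nodes."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have the same length")

    total_pairs = comb(len(labels_true), 2)
    if total_pairs == 0:
        # With fewer than two nodes there are no pairs to compare.
        return 1.0

    # Contingency counts: joint cells, row margins, column margins.
    cell_counts = Counter(zip(labels_true, labels_pred))
    row_counts = Counter(labels_true)
    col_counts = Counter(labels_pred)

    # Number of node pairs placed together under each grouping.
    sum_cells = sum(comb(c, 2) for c in cell_counts.values())
    sum_rows = sum(comb(c, 2) for c in row_counts.values())
    sum_cols = sum(comb(c, 2) for c in col_counts.values())

    expected = sum_rows * sum_cols / total_pairs  # chance-level agreement
    max_index = (sum_rows + sum_cols) / 2

    if max_index == expected:
        # Degenerate case, e.g. both labelings put every node in one group.
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


if __name__ == "__main__":
    truth = [0, 0, 1, 1]
    print(adjusted_rand_index(truth, [1, 1, 0, 0]))  # same partition, labels swapped
    print(adjusted_rand_index(truth, [0, 1, 0, 1]))  # partition that cuts across the truth
```

An ARI of 1 means the two partitions coincide up to label permutation, values near 0 correspond to chance-level agreement, and negative values indicate agreement worse than chance.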

Theorems & Definitions (20)

  • Proposition 1
  • Proposition 2
  • Proposition 3
  • Theorem 1
  • Proposition 4
  • Remark 1
  • Theorem 2
  • Remark 2
  • Theorem 3
  • Theorem 4
  • ...and 10 more