Variational Estimators of the Degree-corrected Latent Block Model for Bipartite Networks

Yunpeng Zhao, Ning Hao, Ji Zhu

TL;DR

This paper introduces a degree-corrected latent block model (DC-LBM) for biclustering bipartite networks. It adds row and column degree parameters $\theta_i$ and $\lambda_j$, so that, given the row label $z_i$ and the column label $w_j$, the mean of $A_{ij}$ is $\theta_i\lambda_j\mu_{z_i w_j}$. A variational EM algorithm with closed-form M-step updates estimates all parameters and latent labels efficiently. The authors prove label consistency and a rate of convergence for the variational estimator under Poisson and Bernoulli edges, allowing the graph density to vanish as long as the average degrees diverge. Simulations and the MovieLens data show substantial improvements over biclustering methods that ignore degree heterogeneity.
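To make the mean structure concrete, here is a minimal sketch that draws one Poisson DC-LBM network with NumPy. The function name `simulate_dclbm` and every parameter value are illustrative assumptions, not taken from the paper; the only modelling ingredient used is that, given the labels, $A_{ij}$ is Poisson with mean $\theta_i\lambda_j\mu_{z_i w_j}$.

```python
import numpy as np


def simulate_dclbm(pi, rho, mu, theta, lam, rng):
    """Draw (A, z, w, mean) from a Poisson DC-LBM (illustrative sketch).

    Row labels z_i ~ Categorical(pi), column labels w_j ~ Categorical(rho),
    and A_ij | z, w ~ Poisson(theta_i * lam_j * mu[z_i, w_j]).
    """
    n, m = len(theta), len(lam)
    z = rng.choice(len(pi), size=n, p=pi)        # row cluster labels
    w = rng.choice(len(rho), size=m, p=rho)      # column cluster labels
    # Conditional mean: outer product of degree parameters times block rates.
    mean = np.outer(theta, lam) * mu[np.ix_(z, w)]
    A = rng.poisson(mean)                        # independent Poisson edges
    return A, z, w, mean


rng = np.random.default_rng(0)
n, m = 300, 200
theta = rng.uniform(0.5, 1.5, size=n)            # row degree parameters (assumed values)
lam = rng.uniform(0.5, 1.5, size=m)              # column degree parameters (assumed values)
mu = np.array([[3.0, 1.0], [1.0, 3.0]])          # block rates (assumed values)
A, z, w, mean = simulate_dclbm([0.5, 0.5], [0.4, 0.6], mu, theta, lam, rng)
print(A.shape, A.mean(), mean.mean())
```

For Bernoulli edges, the Poisson draw would be replaced by `rng.binomial(1, mean)`, which requires every entry of `mean` to lie in $[0, 1]$ (for example, by scaling $\mu$ down).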

Abstract

Bipartite graphs are ubiquitous across various scientific and engineering fields. Simultaneously grouping the two types of nodes in a bipartite graph via biclustering represents a fundamental challenge in network analysis for such graphs. The latent block model (LBM) is a commonly used model-based tool for biclustering. However, the effectiveness of the LBM is often limited by the influence of row and column sums in the data matrix. To address this limitation, we introduce the degree-corrected latent block model (DC-LBM), which accounts for the varying degrees in row and column clusters, significantly enhancing performance on real-world data sets and simulated data. We develop an efficient variational expectation-maximization algorithm by creating closed-form solutions for parameter estimates in the M steps. Furthermore, we prove the label consistency and the rate of convergence of the variational estimator under the DC-LBM, allowing the expected graph density to approach zero as long as the average expected degrees of rows and columns approach infinity when the size of the graph increases.

Paper Structure

This paper contains 21 sections, 16 theorems, 86 equations, 10 figures, 1 table, and 1 algorithm.

Key Result

Proposition 1

For any fixed $q_1$ and $q_2$, the closed-form estimate $\hat{\Phi}=(\hat{\pi},\hat{\rho},\hat{\theta},\hat{\lambda},\hat{\mu})$ given in the paper is a global maximizer of $J(q_1,q_2,\Phi)$. Moreover, if $\mathbb{P}_{q_1}(z_i=k)\ne0$ for all $i$ and $k$, and $\mathbb{P}_{q_2}(w_j=l)\ne0$ for all $j$ and $l$, then all maximizers are of the form $(\hat{\pi},\hat{\rho},e^{c_1}\hat{\theta},e^{c_2}\hat{\lambda},e^{-c_1-c_2}\hat{\mu})$ for constants $c_1, c_2 \in \mathbb{R}$.
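The non-uniqueness in Proposition 1 comes from the fact that, for Poisson edges, the parameter-dependent part of $J$ involves $(\theta, \lambda, \mu)$ only through the products $\theta_i\lambda_j\mu_{kl}$. The numerical check below is a sketch with arbitrary made-up inputs and a hypothetical function name. It evaluates that expected log-likelihood (dropping the constant $-\sum_{ij}\log A_{ij}!$) and confirms it is unchanged under $\theta \to e^{c_1}\theta$, $\lambda \to e^{c_2}\lambda$, $\mu \to e^{-c_1-c_2}\mu$.

```python
import numpy as np


def expected_poisson_loglik(A, tau, eta, theta, lam, mu):
    """sum_{i,j,k,l} tau_ik eta_jl [A_ij log(theta_i lam_j mu_kl) - theta_i lam_j mu_kl].

    The theta/lambda/mu-dependent part of J(q1, q2, Phi) for Poisson edges,
    without the constant -sum_ij log(A_ij!). Requires strictly positive parameters.
    """
    rate = np.einsum("i,j,kl->ijkl", theta, lam, mu)    # theta_i * lam_j * mu_kl
    ll = A[:, :, None, None] * np.log(rate) - rate
    return np.einsum("ik,jl,ijkl->", tau, eta, ll)


rng = np.random.default_rng(1)
n, m, K, L = 6, 5, 2, 3                                 # tiny made-up example
A = rng.poisson(2.0, size=(n, m))
tau = rng.dirichlet(np.ones(K), size=n)                 # q1(z_i = k)
eta = rng.dirichlet(np.ones(L), size=m)                 # q2(w_j = l)
theta = rng.uniform(0.5, 2.0, size=n)
lam = rng.uniform(0.5, 2.0, size=m)
mu = rng.uniform(0.5, 2.0, size=(K, L))

c1, c2 = 0.7, -1.3
j_orig = expected_poisson_loglik(A, tau, eta, theta, lam, mu)
j_scaled = expected_poisson_loglik(A, tau, eta, np.exp(c1) * theta,
                                   np.exp(c2) * lam, np.exp(-c1 - c2) * mu)
print(j_orig, j_scaled)   # equal up to floating-point rounding
```

Rescaling $\theta$ alone, without compensating in $\mu$, does change the value, which is why the paper's estimator is unique only up to this two-parameter family.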

Figures (10)

  • Figure 1: Performance of three algorithms under the DC-LBM with the Poisson distribution. $r$: the graph density factor in the simulation setting for $\mu$. Left panel: detection of row clusters. Right panel: detection of column clusters. SC: spectral clustering. PL: profile-likelihood-based biclustering.
  • Figure 2: Performance of three algorithms under the classical LBM with the Poisson distribution. $r$: the graph density factor in the simulation setting for $\mu$. The red and green curves are almost identical. Left panel: detection of row clusters. Right panel: detection of column clusters. SC: spectral clustering. PL: profile-likelihood-based biclustering.
  • Figure 3: CPU time for three algorithms under Poisson models. $r$: the graph density factor in the simulation setting for $\mu$. Left panel: the true model is the classical LBM with the Poisson distribution. Right panel: the true model is the DC-LBM with the Poisson distribution.
  • Figure 4: Performance of three algorithms under the DC-LBM with the Bernoulli distribution. $r$: the graph density factor in the simulation setting for $\mu$. Left panel: detection of row clusters. Right panel: detection of column clusters. SC: spectral clustering. PL: profile-likelihood-based biclustering.
  • Figure 5: Performance of three algorithms under the classical LBM with the Bernoulli distribution. $r$: the graph density factor in the simulation setting for $\mu$. Left panel: detection of row clusters. Right panel: detection of column clusters. SC: spectral clustering. PL: profile-likelihood-based biclustering.
  • ...and 5 more figures
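The SC baseline in these figures is described only as spectral clustering. As a point of reference, here is a sketch of one common variant of spectral biclustering: take the top singular vectors of $A$, normalize each row and column embedding to unit length, and run k-means separately on rows and columns. This specific variant, the helper names `kmeans` and `spectral_bicluster`, and the k-means++ seeding are assumptions; the paper's SC implementation may differ.

```python
import numpy as np


def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's k-means with k-means++ seeding (NumPy only)."""
    rng = np.random.default_rng(seed)
    centers = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        # Squared distance from each point to its nearest chosen center.
        d2 = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        p = d2 / d2.sum() if d2.sum() > 0 else None
        centers.append(X[rng.choice(len(X), p=p)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels


def spectral_bicluster(A, K, L, seed=0):
    """Rank-r SVD embedding of rows and columns, unit-normalized, then k-means."""
    r = max(K, L)
    U, s, Vt = np.linalg.svd(np.asarray(A, dtype=float), full_matrices=False)
    rows = U[:, :r] * s[:r]                      # row embedding
    cols = Vt[:r].T * s[:r]                      # column embedding
    # Under a degree-corrected mean, row i's expected embedding is its cluster's
    # embedding scaled by theta_i; unit normalization removes that factor.
    rows /= np.linalg.norm(rows, axis=1, keepdims=True) + 1e-12
    cols /= np.linalg.norm(cols, axis=1, keepdims=True) + 1e-12
    return kmeans(rows, K, seed=seed), kmeans(cols, L, seed=seed)


# Toy usage on a classical (non-degree-corrected) Poisson LBM; values are assumptions.
rng = np.random.default_rng(7)
z = rng.integers(0, 2, size=200)
w = rng.integers(0, 2, size=150)
mu = np.array([[3.0, 1.0], [1.0, 3.0]])
A = rng.poisson(mu[np.ix_(z, w)])
row_labels, col_labels = spectral_bicluster(A, K=2, L=2)
row_acc = np.mean(row_labels == z)
col_acc = np.mean(col_labels == w)
print(max(row_acc, 1 - row_acc), max(col_acc, 1 - col_acc))
```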

Theorems & Definitions (24)

  • Proposition 1
  • Proposition 2
  • Proposition 3
  • Proposition 4
  • Proposition 5
  • Theorem 1
  • Definition 1: Soft confusion matrix
  • Theorem 2
  • Theorem 3
  • Theorem 4
  • ...and 14 more
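Definition 1 introduces a soft confusion matrix, but its exact form is not reproduced on this page. One natural construction, assumed here for illustration only, is $R_{kk'} = n^{-1}\sum_i \mathbf{1}\{z_i = k\}\,\mathbb{P}_{q_1}(z_i = k')$, which reduces to the usual normalized confusion matrix when $q_1$ puts all its mass on a single label. The sketch computes this matrix and a soft error rate that is invariant to relabeling; both function names are hypothetical.

```python
import itertools

import numpy as np


def soft_confusion(z, tau):
    """R[k, k'] = (1/n) * sum_i 1{z_i = k} * tau[i, k'] (assumed definition).

    z   : true labels in {0, ..., K-1}, shape (n,)
    tau : variational probabilities q1(z_i = k'), shape (n, K)
    """
    n, K = tau.shape
    Z = np.eye(K)[z]                 # one-hot encoding of the true labels
    return Z.T @ tau / n


def soft_error(z, tau):
    """1 minus the largest diagonal mass of R over all label permutations."""
    R = soft_confusion(z, tau)
    K = R.shape[0]
    best = max(sum(R[k, perm[k]] for k in range(K))
               for perm in itertools.permutations(range(K)))
    return 1.0 - best


z = np.array([0, 0, 1, 1])
tau = np.array([[0.1, 0.9], [0.2, 0.8], [0.7, 0.3], [1.0, 0.0]])  # labels swapped
print(soft_confusion(z, tau))
print(soft_error(z, tau))
```

The brute-force loop over permutations costs $K!$; for more than a handful of clusters, `scipy.optimize.linear_sum_assignment` applied to $-R$ finds the same best matching.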