STERLING: Synergistic Representation Learning on Bipartite Graphs

Baoyu Jing, Yuchen Yan, Kaize Ding, Chanyoung Park, Yada Zhu, Huan Liu, Hanghang Tong

TL;DR

STERLING is a non-contrastive self-supervised learning framework for bipartite graphs that preserves both local synergies (inter-type and intra-type positive node pairs) and global synergies (the mutual information of co-clusters). It combines an online encoder, a target encoder updated as an exponential moving average of the online one, a projection head, and a co-clustering module that maximizes $I(K;L)$. This quantity is proven to lower-bound the mutual information $I(\mathbf{U}_\theta;\mathbf{V}_\theta)$ between the embeddings of the two node types, so maximizing it improves cross-type connectivity in the embedding space. The approach needs no negative samples and performs strongly on recommendation, link prediction, and co-clustering benchmarks, outperforming many contrastive and non-contrastive baselines. The work contributes theoretical guarantees and a scalable framework for bipartite graph representation learning, with potential impact on recommender systems and related domains.

Abstract

A fundamental challenge of bipartite graph representation learning is how to extract informative node embeddings. Self-Supervised Learning (SSL) is a promising paradigm to address this challenge. Most recent bipartite graph SSL methods are based on contrastive learning which learns embeddings by discriminating positive and negative node pairs. Contrastive learning usually requires a large number of negative node pairs, which could lead to computational burden and semantic errors. In this paper, we introduce a novel synergistic representation learning model (STERLING) to learn node embeddings without negative node pairs. STERLING preserves the unique local and global synergies in bipartite graphs. The local synergies are captured by maximizing the similarity of the inter-type and intra-type positive node pairs, and the global synergies are captured by maximizing the mutual information of co-clusters. Theoretical analysis demonstrates that STERLING could improve the connectivity between different node types in the embedding space. Extensive empirical evaluation on various benchmark datasets and tasks demonstrates the effectiveness of STERLING for extracting node embeddings.
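
To make the idea of learning from positive pairs alone more concrete, here is a minimal sketch of a negative-free similarity objective. It is an assumption for illustration, not the loss defined in the paper: the names `cosine_similarity` and `positive_pair_loss` are hypothetical, and the objective shown is the generic "one minus cosine similarity" used by many non-contrastive methods.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two non-zero vectors given as lists."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def positive_pair_loss(pairs):
    """Mean of (1 - cosine similarity) over positive pairs only.

    No negative pairs are sampled: the loss only pulls the two
    embeddings of each positive pair toward the same direction.
    """
    return sum(1.0 - cosine_similarity(a, b) for a, b in pairs) / len(pairs)

# Hypothetical embeddings: one aligned pair and one orthogonal pair.
aligned_pair = ([1.0, 0.0], [2.0, 0.0])
orthogonal_pair = ([1.0, 0.0], [0.0, 3.0])
print(positive_pair_loss([aligned_pair]))
print(positive_pair_loss([orthogonal_pair]))
```

Minimized on its own, an objective like this admits a trivial solution in which all embeddings coincide. Pairing an online network with a slowly updated target network is the usual way bootstrapped methods avoid that collapse.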

Paper Structure

This paper contains 24 sections, 3 theorems, 24 equations, 4 figures, and 7 tables.

Key Result

Theorem 1

The mutual information $I(\mathbf{U}_\theta;\mathbf{V}_\theta)$ of embeddings $\mathbf{U}_\theta$ and $\mathbf{V}_\theta$ is lower-bounded by the mutual information of co-clusters $I(K;L)$:

$$I(\mathbf{U}_\theta;\mathbf{V}_\theta) \geq I(K;L)$$
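
As a small worked illustration of the quantity on the right-hand side of the bound, the sketch below computes $I(K;L)$ from a joint probability table over co-cluster pairs. It assumes such a table is already available. How STERLING estimates it is not described in this section, and the function name `mutual_information` and the toy tables are hypothetical.

```python
import math

def mutual_information(joint):
    """Return I(K;L) in nats for a joint table joint[k][l] of non-negative weights."""
    total = sum(sum(row) for row in joint)
    # Normalize so the table sums to 1 (this also accepts raw counts).
    p = [[x / total for x in row] for row in joint]
    p_k = [sum(row) for row in p]        # marginal over clusters K
    p_l = [sum(col) for col in zip(*p)]  # marginal over clusters L
    mi = 0.0
    for k, row in enumerate(p):
        for l, p_kl in enumerate(row):
            if p_kl > 0.0:               # the term 0 * log 0 is taken as 0
                mi += p_kl * math.log(p_kl / (p_k[k] * p_l[l]))
    return mi

# Toy tables: clusters that match one-to-one vs. clusters that are independent.
aligned = [[0.5, 0.0], [0.0, 0.5]]
independent = [[0.25, 0.25], [0.25, 0.25]]
print(mutual_information(aligned))
print(mutual_information(independent))
```

When the two sets of clusters match one-to-one, $I(K;L)$ reaches $\log 2$ for two equally likely clusters. When they are independent, it is zero, so the bound then says nothing about the embeddings.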

Figures (4)

  • Figure 1: Example of a bipartite graph and its unique local and global properties. Locally, the dashed curves are implicit intra-type connections. Globally, the green line shows the inter-connection between the co-clusters, i.e., bio-engineer and electronic devices.
  • Figure 2: Overview of Sterling. $\mathcal{E}$, $\mathcal{P}$ and $\mathcal{C}$ are the encoder, projector and cluster network. $\theta$ and $\phi$ are the parameters of the online and target networks, respectively. $\theta$ is updated by optimizing the training objectives, while $\phi$ is updated via an Exponential Moving Average (EMA) of $\theta$. For details, please refer to the methodology section.
  • Figure 3: (a-c) Sensitivity analysis and (d) convergence of $I(K;L)$ on the Cornell dataset.
  • Figure 4: (a-b) t-SNE visualization on Wiki (40%). (c-d) Visualization of the noise filter on ML-100K.
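
The EMA update of the target parameters $\phi$ mentioned in the Figure 2 caption can be sketched as follows. This is a minimal illustration under stated assumptions: parameters are flat lists of floats, and both the function name `ema_update` and the decay value are hypothetical, not values taken from the paper.

```python
def ema_update(target, online, decay=0.99):
    """Return updated target parameters: decay * target + (1 - decay) * online.

    `decay` close to 1 makes the target network change slowly.
    The default 0.99 is an assumed value for illustration only.
    """
    return [decay * t + (1.0 - decay) * o for t, o in zip(target, online)]

# Toy parameters: phi (target) drifts toward theta (online) at each step.
theta = [1.0, 1.0]
phi = [0.0, 1.0]
for _ in range(3):
    phi = ema_update(phi, theta, decay=0.9)
print(phi)
```

Only $\theta$ would receive gradients in such a setup. $\phi$ changes solely through this averaging step, which is why the target network lags behind the online one.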

Theorems & Definitions (5)

  • Theorem 1: Information Bound
  • Lemma 1: Variational Bound
  • proof
  • Theorem 2: Information Bound
  • proof