Generative Data Augmentation in Graph Contrastive Learning for Recommendation
Yansong Wang, Qihui Lin, Junjie Huang, Tao Jia
TL;DR
The paper tackles interaction sparsity in recommendation by combining graph contrastive learning with generative data augmentation. It introduces GDA4Rec, which uses a deep generative noise module to produce adaptive augmented views and derives an item complement matrix from user-item interactions to provide additional self-supervised signals. The augmentation side is trained with the objective $\mathcal{L}_{aug}=\mathcal{L}_{recon}+\mathcal{L}_{ddl}$. The framework integrates a LightGCN backbone with multi-view generation and multi-pair contrast. It is optimized via $\mathcal{L}=\mathcal{L}_{rec}+\lambda\mathcal{L}_{cl}+\mathcal{L}_{aug}+\mathcal{L}_{reg}$, where $\lambda$ weights the contrastive term, to learn informative embeddings. Experiments on three public datasets show consistent gains over strong baselines, particularly in sparse settings, and ablations confirm the contributions of both the generative augmentation and the item complementarity signals. The work offers a practical source of self-supervised signals for recommendation, and the code is released for reproducibility.
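For readers unfamiliar with the LightGCN backbone named above, its standard propagation rule on the user-item bipartite graph is $E^{(k+1)}=D^{-1/2}AD^{-1/2}E^{(k)}$, with no feature transformation or nonlinearity, and the final embedding is an average over layers. The minimal NumPy sketch below illustrates generic LightGCN only, not GDA4Rec's released code; the function name `lightgcn_propagate`, the dense adjacency matrix, and the uniform layer mean are illustrative assumptions.

```python
import numpy as np

def lightgcn_propagate(R: np.ndarray, E0: np.ndarray, num_layers: int = 3) -> np.ndarray:
    """Generic LightGCN propagation (illustrative sketch, not GDA4Rec's code).

    R  : (n_users, n_items) binary interaction matrix
    E0 : (n_users + n_items, d) initial embeddings, users first, then items
    Returns the layer-averaged embeddings with the same shape as E0.
    """
    n_users, n_items = R.shape
    n = n_users + n_items
    # Bipartite adjacency A = [[0, R], [R^T, 0]] (dense for brevity)
    A = np.zeros((n, n))
    A[:n_users, n_users:] = R
    A[n_users:, :n_users] = R.T
    deg = A.sum(axis=1)
    # D^{-1/2}; isolated nodes (degree 0) get 0 instead of dividing by zero
    d_inv_sqrt = np.zeros_like(deg)
    nz = deg > 0
    d_inv_sqrt[nz] = deg[nz] ** -0.5
    A_hat = d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    layers = [E0]
    E = E0
    for _ in range(num_layers):
        E = A_hat @ E  # neighbor aggregation only: no weights, no activation
        layers.append(E)
    # Assumed readout: uniform mean over layers 0..num_layers
    return np.mean(layers, axis=0)

# Usage: 2 users, 3 items, 4-dimensional embeddings
rng = np.random.default_rng(0)
R = np.array([[1, 0, 1], [0, 1, 1]], dtype=float)
E0 = rng.normal(size=(5, 4))
E = lightgcn_propagate(R, E0, num_layers=2)
user_emb, item_emb = E[:2], E[2:]
scores = user_emb @ item_emb.T  # (2, 3) user-item preference scores
```

A dense adjacency keeps the sketch short; at realistic catalog sizes the normalized adjacency would normally be stored as a sparse matrix.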
Abstract
Recommendation systems have become indispensable in various online platforms, from e-commerce to streaming services. A fundamental challenge in this domain is learning effective embeddings from sparse user-item interactions. Contrastive learning has recently emerged as a promising solution to this issue, but most existing methods generate augmented views through random data augmentation, which often alters the original semantic information. In this paper, we propose a novel framework, GDA4Rec (Generative Data Augmentation in graph contrastive learning for Recommendation), to generate high-quality augmented views and provide robust self-supervised signals. Specifically, we employ a noise generation module that leverages deep generative models to approximate the distribution of the original data for data augmentation. Additionally, GDA4Rec extracts an item complement matrix to characterize the latent correlations between items and provide further self-supervised signals. Lastly, a joint objective that integrates recommendation, data augmentation and contrastive learning encourages the model to learn more effective and informative embeddings. Extensive experiments on three public datasets demonstrate the superiority of the model. The code is available at: https://github.com/MrYansong/GDA4Rec.
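The joint objective $\mathcal{L}=\mathcal{L}_{rec}+\lambda\mathcal{L}_{cl}+\mathcal{L}_{aug}+\mathcal{L}_{reg}$ stated in the TL;DR can be sketched as follows. This section does not specify the individual terms, so the sketch assumes common choices: BPR for $\mathcal{L}_{rec}$, InfoNCE with temperature $\tau$ summed over view pairs for $\mathcal{L}_{cl}$, and a squared L2 penalty for $\mathcal{L}_{reg}$. $\mathcal{L}_{aug}$ is passed in as a precomputed scalar because the generative noise module is not modeled here. All function names and default hyperparameters (`lam`, `tau`, `reg_weight`) are hypothetical.

```python
import numpy as np

# Hypothetical sketch of L = L_rec + lam * L_cl + L_aug + L_reg.
# The loss choices (BPR, InfoNCE, L2) are assumptions, not taken from GDA4Rec.

def bpr_loss(u, pos, neg):
    """Assumed L_rec: BPR, the mean of -log sigmoid(s(u, pos) - s(u, neg))."""
    x = np.sum(u * pos, axis=1) - np.sum(u * neg, axis=1)
    # logaddexp(0, -x) equals -log(sigmoid(x)) and avoids overflow
    return float(np.mean(np.logaddexp(0.0, -x)))

def info_nce(z1, z2, tau=0.2):
    """Assumed contrastive term: row i of z1 and row i of z2 are two views of
    the same node (positive pair); the other rows of z2 act as negatives."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / tau
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))

def l2_reg(embs, weight=1e-4):
    """Assumed L_reg: weighted squared L2 norm of the batch embeddings."""
    return weight * sum(float(np.sum(e ** 2)) for e in embs)

def joint_loss(u, pos, neg, view_pairs, l_aug, lam=0.1, tau=0.2, reg_weight=1e-4):
    """Combine the four terms. view_pairs is a list of (z_a, z_b) view
    embeddings; multi-pair contrast is approximated by summing InfoNCE over
    the pairs. l_aug is a precomputed scalar from the generative module."""
    l_rec = bpr_loss(u, pos, neg)
    l_cl = sum(info_nce(za, zb, tau) for za, zb in view_pairs)
    l_reg = l2_reg([u, pos, neg], weight=reg_weight)
    return l_rec + lam * l_cl + l_aug + l_reg

# Usage with random stand-in embeddings: batch of 4, dimension 8
rng = np.random.default_rng(0)
u, pos, neg = (rng.normal(size=(4, 8)) for _ in range(3))
views = [(rng.normal(size=(4, 8)), rng.normal(size=(4, 8)))]
loss = joint_loss(u, pos, neg, views, l_aug=0.5)
```

In a real training loop these terms would be computed with an autodiff framework so that gradients flow into the embeddings; NumPy is used here only to show how the terms combine.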
