Graph Representation Learning via Causal Diffusion for Out-of-Distribution Recommendation

Chu Zhao, Enneng Yang, Yuliang Liang, Pengxiang Lan, Yuting Liu, Jianzhe Zhao, Guibing Guo, Xingwei Wang

TL;DR

We address OOD generalization in GNN-based recommender systems by showing that environmental confounders induce unstable correlations. We propose CausalDiffRec, which combines backdoor adjustment, variational environment inference, and diffusion-based invariant representation learning to mitigate these confounders. Our theoretical analysis shows that optimizing the objective encourages environment-invariant representations and better OOD generalization; it builds on the backdoor adjustment expression $P_\theta(Y \mid do(G)) = \mathbb{E}_{e \sim D_{tr}(E)}[P_\theta(Y \mid G, E = e, I)]$. Experiments on four real-world datasets show average improvements of up to 22.41% (on Yelp2018) across multiple distribution shifts.

Abstract

Graph Neural Network (GNN)-based recommendation algorithms typically assume that training and testing data are independent and identically distributed (IID). However, this assumption often fails in the presence of out-of-distribution (OOD) data, resulting in significant performance degradation. In this study, we construct a Structural Causal Model (SCM) to analyze interaction data, revealing that environmental confounders (e.g., the COVID-19 pandemic) lead to unstable correlations in GNN-based models, thus impairing their generalization to OOD data. To address this issue, we propose a novel approach, graph representation learning via causal diffusion (CausalDiffRec), for OOD recommendation. This method enhances the model's generalization to OOD data by eliminating environmental confounding factors and learning invariant graph representations. Specifically, we use backdoor adjustment and variational inference to infer the real environmental distribution, thereby eliminating the impact of environmental confounders. This inferred distribution is then used as prior knowledge to guide representation learning in the reverse phase of the diffusion process, so that the model learns an invariant representation. In addition, we provide a theoretical derivation showing that optimizing the objective function of CausalDiffRec encourages the model to learn environment-invariant graph representations, thereby achieving strong generalization performance in recommendation under distribution shifts. Our extensive experiments validate the effectiveness of CausalDiffRec in improving generalization to OOD data, with average improvements of up to 10.69% on Food, 18.83% on KuaiRec, 22.41% on Yelp2018, and 11.65% on Douban.
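The backdoor adjustment mentioned above can be illustrated with a minimal sketch. It is not the CausalDiffRec implementation: it assumes a small, discrete set of environments with known probabilities, whereas the paper infers the environment distribution variationally. It also conditions only on the graph and the environment. The names `backdoor_adjusted_prediction`, `toy_predict`, and the bias values are hypothetical and chosen for illustration only.

```python
import numpy as np

def backdoor_adjusted_prediction(predict, graph, env_probs):
    """Approximate P(Y | do(G)) as sum_e P(e) * P(Y | G, E=e).

    predict(graph, e) is assumed to return a probability vector over items
    for environment index e; env_probs is an assumed categorical P(E).
    """
    env_probs = np.asarray(env_probs, dtype=float)
    env_probs = env_probs / env_probs.sum()  # normalize to a distribution
    # Stack the per-environment predictions into shape (K, n_items).
    per_env = np.stack([predict(graph, e) for e in range(len(env_probs))])
    # Weight each environment's prediction by its probability and sum.
    return env_probs @ per_env

def toy_predict(graph, env):
    """Hypothetical predictor: softmax over item degree plus an environment bias."""
    env_bias = np.array([[0.0, 1.0, 0.0],   # environment 0 boosts item 1
                         [1.0, 0.0, 0.0]])[env]  # environment 1 boosts item 0
    logits = graph.sum(axis=0) + env_bias
    exp = np.exp(logits - logits.max())  # numerically stable softmax
    return exp / exp.sum()

# Toy user-item interaction matrix (2 users, 3 items) and assumed P(E).
graph = np.array([[1, 0, 1],
                  [0, 1, 1]], dtype=float)
env_probs = [0.25, 0.75]

adjusted = backdoor_adjusted_prediction(toy_predict, graph, env_probs)
print(adjusted)
```

Averaging over environments in this way means the prediction no longer depends on whichever single environment happened to dominate the training data, which is the intuition behind using the adjusted distribution for OOD recommendation.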

Paper Structure

This paper contains 31 sections, 4 theorems, 43 equations, 7 figures, 4 tables, 1 algorithm.

Key Result

Proposition 1

Minimizing the objective in Eq. (eq: rewritten) encourages the model to satisfy the Invariance Property and the Sufficient Condition stated in the Assumption (Sec. sec:Invariant).

Figures (7)

  • Figure 1: Left and Middle: An example of popularity distribution shift, i.e., how the popularity of masks, disinfectants, exercise equipment, and electronic products changed with the COVID-19 pandemic. Right: We constructed both IID and OOD sets on the Yelp2018 dataset and compared the performance of LightGCN (He et al., 2020) on them, observing a significant average performance drop of $29.03\%$ on the OOD data across three metrics.
  • Figure 2: The structural causal model for GNN-based recommendation.
  • Figure 3: Overall framework illustration of the proposed CausalDiffRec model.
  • Figure 4: Effects of the number of diffusion steps T.
  • Figure 5: Effects of the number of environments K.
  • ...and 2 more figures

Theorems & Definitions (4)

  • Proposition 1
  • Proposition 2
  • Lemma 1
  • Lemma 2