E2E-GRec: An End-to-End Joint Training Framework for Graph Neural Networks and Recommender Systems

Rui Xue, Shichao Zhu, Liang Qin, Guangmou Pan, Yang Song, Tianfu Wu

TL;DR

E2E-GRec tackles the inefficiencies and misalignment of traditional two-stage GNN-based recommender systems by enabling end-to-end joint training. It introduces subgraph sampling from a cross-domain heterogeneous graph, a Graph Feature Auto-Encoder (GFAE) SSL objective to guide GNN embeddings, and a two-level feature fusion with Gradnorm-based dynamic loss balancing to stabilize multi-task learning. The framework demonstrates superior offline performance and positive online A/B results, supported by theoretical analysis of gradient coupling and SSL-cascaded misalignment. Practically, E2E-GRec enables scalable, end-to-end optimization that leverages higher-order graph signals to improve recommendation quality and user engagement.

Abstract

Graph Neural Networks (GNNs) have emerged as powerful tools for modeling graph-structured data and have been widely used in recommender systems, such as for capturing complex user-item and item-item relations. However, most industrial deployments adopt a two-stage pipeline: GNNs are first pre-trained offline to generate node embeddings, which are then used as static features for downstream recommender systems. This decoupled paradigm leads to two key limitations: (1) high computational overhead, since large-scale GNN inference must be repeatedly executed to refresh embeddings; and (2) lack of joint optimization, as the gradient from the recommender system cannot directly influence the GNN learning process, causing the GNN to be suboptimally informative for the recommendation task. In this paper, we propose E2E-GRec, a novel end-to-end training framework that unifies GNN training with the recommender system. Our framework is characterized by three key components: (i) efficient subgraph sampling from a large-scale cross-domain heterogeneous graph to ensure training scalability and efficiency; (ii) a Graph Feature Auto-Encoder (GFAE) serving as an auxiliary self-supervised task to guide the GNN to learn structurally meaningful embeddings; and (iii) a two-level feature fusion mechanism combined with Gradnorm-based dynamic loss balancing, which stabilizes graph-aware multi-task end-to-end training. Extensive offline evaluations, online A/B tests (e.g., a +0.133% relative improvement in stay duration, a 0.3171% reduction in the average number of videos a user skips) on large-scale production data, together with theoretical analysis, demonstrate that E2E-GRec consistently surpasses traditional approaches, yielding significant gains across multiple recommendation metrics.
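
The abstract names Gradnorm-based dynamic loss balancing but does not spell out the update rule here. As a rough illustration only, the sketch below follows the general GradNorm recipe for two tasks (a recommendation loss and the GFAE self-supervised loss): each task weight is nudged so that its weighted gradient norm moves toward a target scaled by how fast that task is training. The function name `gradnorm_step`, its parameters, and the defaults `alpha=1.5` and `lr=0.025` are hypothetical and are not taken from the paper.

```python
def gradnorm_step(weights, raw_grad_norms, losses, initial_losses,
                  alpha=1.5, lr=0.025):
    """One simplified GradNorm-style update of the task weights (assumed form).

    weights         current task weights w_i
    raw_grad_norms  norm of each unweighted task gradient on the shared layer
    losses          current task losses L_i(t)
    initial_losses  task losses at the start of training L_i(0)
    """
    n = len(weights)
    # Gradient norm each task actually contributes: G_i = w_i * g_i.
    weighted_norms = [w * g for w, g in zip(weights, raw_grad_norms)]
    mean_norm = sum(weighted_norms) / n

    # Relative inverse training rate: tasks that have improved less get r_i > 1.
    loss_ratios = [l / l0 for l, l0 in zip(losses, initial_losses)]
    mean_ratio = sum(loss_ratios) / n
    rel_rates = [r / mean_ratio for r in loss_ratios]

    # Target norm per task, treated as a constant during the weight update.
    targets = [mean_norm * (r ** alpha) for r in rel_rates]

    new_weights = []
    for w, g, gw, t in zip(weights, raw_grad_norms, weighted_norms, targets):
        # d|G_i - target_i| / dw_i = sign(G_i - target_i) * g_i
        sign = (gw > t) - (gw < t)
        new_weights.append(max(w - lr * sign * g, 1e-6))

    # Renormalize so the weights always sum to the number of tasks.
    total = sum(new_weights)
    return [n * w / total for w in new_weights]


# Hypothetical numbers: the recommendation gradient dominates the SSL gradient.
weights = gradnorm_step(
    weights=[1.0, 1.0],          # [w_rec, w_ssl]
    raw_grad_norms=[4.0, 1.0],   # [g_rec, g_ssl]
    losses=[0.5, 0.5],
    initial_losses=[1.0, 1.0],
)
```

With these inputs both tasks have trained at the same rate, so the task with the larger gradient norm is down-weighted and the other is up-weighted, which is the balancing behavior the abstract attributes to this component.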

Paper Structure

This paper contains 28 sections, 3 theorems, 33 equations, 4 figures, 2 tables.

Key Result

Theorem 1

Let $h_\theta(\cdot;\mathcal{G})$ denote the GNN embeddings with parameters $\theta$, let $z_i=[h_\theta(x_i;\mathcal{G}) \Vert b_i]$, and let $s_\psi$ be the recommendation scorer. Assume $\frac{\partial s_\psi(z)}{\partial h_\theta} \neq 0$. Then: (i) $\nabla_\theta J$ contains a nonzero contribution from $L_{\mathrm{rec}}$ (Rec → GNN); (ii) $\frac{\partial}{\partial \theta}(\nabla_\psi J) \neq 0$, i.e.
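
The two claims can be checked numerically on a toy model. The sketch below is an illustration under strong simplifying assumptions, not the paper's setup: the "GNN" is a single scalar parameter with $h_\theta(x)=\theta x$, the scorer is linear, $s_\psi(z)=\psi_0 h + \psi_1 b$, and only a squared-error $L_{\mathrm{rec}}$ is used, since $J$ is not defined in this excerpt. All function names are hypothetical. In this toy, $\partial L_{\mathrm{rec}}/\partial\theta = 2(s-y)\,\psi_0\,x$, which vanishes when $\psi_0 = 0$, that is, when the scorer ignores the embedding.

```python
def rec_loss(theta, psi, x, b, y):
    h = theta * x                 # toy one-parameter "GNN" embedding h_theta(x)
    s = psi[0] * h + psi[1] * b   # linear scorer s_psi on z = [h || b]
    return (s - y) ** 2           # squared-error stand-in for L_rec


def grad_theta(theta, psi, x, b, y, eps=1e-6):
    """Central finite difference of L_rec with respect to theta (claim i)."""
    up = rec_loss(theta + eps, psi, x, b, y)
    down = rec_loss(theta - eps, psi, x, b, y)
    return (up - down) / (2 * eps)


def grad_psi0(theta, psi, x, b, y, eps=1e-5):
    """Central finite difference of L_rec with respect to psi_0."""
    up = rec_loss(theta, (psi[0] + eps, psi[1]), x, b, y)
    down = rec_loss(theta, (psi[0] - eps, psi[1]), x, b, y)
    return (up - down) / (2 * eps)


def cross_term(theta, psi, x, b, y, eps=1e-4):
    """d/dtheta of the psi_0 gradient: how the GNN shapes the scorer's update (claim ii)."""
    up = grad_psi0(theta + eps, psi, x, b, y)
    down = grad_psi0(theta - eps, psi, x, b, y)
    return (up - down) / (2 * eps)


# Scorer depends on the embedding (psi_0 != 0): both quantities are nonzero.
coupled = grad_theta(0.5, (2.0, 1.0), x=3.0, b=1.0, y=1.0)
mixed = cross_term(0.5, (2.0, 1.0), x=3.0, b=1.0, y=1.0)

# Scorer ignores the embedding (psi_0 == 0): no gradient reaches theta.
decoupled = grad_theta(0.5, (0.0, 1.0), x=3.0, b=1.0, y=2.0)
```

The `decoupled` case shows why the assumption $\frac{\partial s_\psi(z)}{\partial h_\theta} \neq 0$ matters: without it, the recommendation loss carries no signal back to the GNN parameters.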

Figures (4)

  • Figure 1: Overview of E2E-GRec. Colors indicate distinct functional blocks. (1) Subgraph sampling: given a source item ID, we sample $k$-hop neighbors to form a subgraph; (2) GNN SSL: the GNN is trained via a graph autoencoder; (3) GNN–LTR fusion: GNN representations are integrated and jointly optimized with LTR in an end-to-end manner.
  • Figure 2: Ablation on GNN backbones and fusion.
  • Figure 3: Ablation on the effect of Gradnorm.
  • Figure 4: Dynamic weights assigned by Gradnorm.
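
The subgraph sampling step in the Figure 1 caption (sampling $k$-hop neighbors of a source item ID) can be sketched as a breadth-first expansion. This is a minimal sketch under assumptions: the adjacency-dict graph format, the per-node `fanout` cap, the seeded random choice, and the name `sample_k_hop_subgraph` are all hypothetical, since the caption does not say how neighbors are selected or stored.

```python
import random


def sample_k_hop_subgraph(adjacency, source, k, fanout, seed=0):
    """Expand `source` by up to k hops, keeping at most `fanout` neighbors per node.

    adjacency maps a node ID to a list of neighbor IDs (assumed format).
    Returns the set of sampled nodes and the list of sampled (src, dst) edges.
    """
    rng = random.Random(seed)
    visited = {source}
    frontier = [source]
    edges = []
    for _ in range(k):
        next_frontier = []
        for node in frontier:
            neighbors = adjacency.get(node, [])
            # Cap the neighborhood so the subgraph size stays bounded.
            picked = rng.sample(neighbors, min(fanout, len(neighbors)))
            for nb in picked:
                edges.append((node, nb))
                if nb not in visited:
                    visited.add(nb)
                    next_frontier.append(nb)
        frontier = next_frontier
    return visited, edges


# Hypothetical heterogeneous graph mixing item and user nodes.
toy_graph = {
    "item_a": ["user_1", "user_2", "item_b"],
    "user_1": ["item_c"],
    "user_2": ["item_c", "item_d"],
    "item_b": [],
    "item_c": ["user_3"],
}
nodes, sampled_edges = sample_k_hop_subgraph(toy_graph, "item_a", k=2, fanout=2)
```

Capping the fanout bounds the subgraph at roughly `fanout ** k` edges per source item, which is one common way to keep per-example cost predictable on a large graph.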

Theorems & Definitions (3)

  • Theorem 1: Gradient Coupling in E2E-GRec
  • Theorem 2: SSL vs. Cascaded: Objective Misalignment
  • Theorem 2: SSL vs. Cascaded Ranking Head: Objective Misalignment