E2E-GRec: An End-to-End Joint Training Framework for Graph Neural Networks and Recommender Systems
Rui Xue, Shichao Zhu, Liang Qin, Guangmou Pan, Yang Song, Tianfu Wu
TL;DR
E2E-GRec tackles the inefficiency and misalignment of traditional two-stage GNN-based recommender systems by training the GNN and the recommender jointly, end to end. It introduces subgraph sampling from a cross-domain heterogeneous graph, a Graph Feature Auto-Encoder (GFAE) self-supervised learning (SSL) objective to guide the GNN embeddings, and a two-level feature fusion with GradNorm-based dynamic loss balancing to stabilize multi-task learning. The framework shows superior offline performance and positive online A/B results, supported by theoretical analysis of gradient coupling and SSL-cascaded misalignment. In practice, E2E-GRec enables scalable end-to-end optimization that leverages higher-order graph signals to improve recommendation quality and user engagement.
Abstract
Graph Neural Networks (GNNs) have emerged as powerful tools for modeling graph-structured data and have been widely used in recommender systems, such as for capturing complex user-item and item-item relations. However, most industrial deployments adopt a two-stage pipeline: GNNs are first pre-trained offline to generate node embeddings, which are then used as static features for downstream recommender systems. This decoupled paradigm leads to two key limitations: (1) high computational overhead, since large-scale GNN inference must be repeatedly executed to refresh embeddings; and (2) lack of joint optimization, as the gradient from the recommender system cannot directly influence the GNN learning process, leaving the GNN embeddings suboptimally informative for the recommendation task. In this paper, we propose E2E-GRec, a novel end-to-end training framework that unifies GNN training with the recommender system. Our framework is characterized by three key components: (i) efficient subgraph sampling from a large-scale cross-domain heterogeneous graph to ensure training scalability and efficiency; (ii) a Graph Feature Auto-Encoder (GFAE) serving as an auxiliary self-supervised task to guide the GNN to learn structurally meaningful embeddings; and (iii) a two-level feature fusion mechanism combined with GradNorm-based dynamic loss balancing, which stabilizes graph-aware multi-task end-to-end training. Extensive offline evaluations, online A/B tests on large-scale production data (e.g., a +0.133% relative improvement in stay duration and a 0.3171% reduction in the average number of videos a user skips), and theoretical analysis demonstrate that E2E-GRec consistently surpasses traditional approaches, yielding significant gains across multiple recommendation metrics.
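To make the dynamic loss balancing in component (iii) concrete, the following is a minimal sketch of one task-weight update in the style of the original GradNorm method (Chen et al., 2018), applied to two tasks such as a recommendation loss and the GFAE auxiliary loss. It is an illustration under stated assumptions, not the variant used in E2E-GRec: the function name `gradnorm_step`, its arguments, and the default `alpha` and `lr` values are hypothetical, and the per-task gradient norms on the shared layer are assumed to be computed elsewhere.

```python
def gradnorm_step(weights, grad_norms, losses, initial_losses, alpha=1.5, lr=0.025):
    """One GradNorm-style update of the task weights (hypothetical helper).

    weights:        current task weights w_i
    grad_norms:     L2 norm of each unweighted task gradient on the shared layer
    losses:         current task losses L_i(t)
    initial_losses: task losses at the first step, L_i(0)
    """
    n = len(weights)
    # Weighted gradient norms G_i = w_i * ||grad L_i||.
    weighted = [w * g for w, g in zip(weights, grad_norms)]
    mean_weighted = sum(weighted) / n
    # Relative inverse training rate r_i; r_i > 1 means task i is training slowly.
    ratios = [cur / init for cur, init in zip(losses, initial_losses)]
    mean_ratio = sum(ratios) / n
    rates = [x / mean_ratio for x in ratios]
    # Target norm per task, treated as a constant when differentiating.
    targets = [mean_weighted * r ** alpha for r in rates]
    # Subgradient of sum_i |G_i - target_i| with respect to w_i.
    grads = [((G > t) - (G < t)) * g for G, t, g in zip(weighted, targets, grad_norms)]
    # Gradient step, kept positive, then renormalized so the weights sum to n.
    updated = [max(w - lr * gw, 1e-8) for w, gw in zip(weights, grads)]
    total = sum(updated)
    return [w * n / total for w in updated]


# Task 0 dominates the shared-layer gradient, so its weight is reduced.
new_weights = gradnorm_step([1.0, 1.0], [2.0, 0.5], [1.0, 1.0], [1.0, 1.0])
```

In this sketch a task whose weighted gradient norm exceeds its target is down-weighted and a task below its target is up-weighted, which is the mechanism that keeps one objective from overwhelming the other during joint training.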
