GPR: Towards a Generative Pre-trained One-Model Paradigm for Large-Scale Advertising Recommendation

Jun Zhang, Yi Li, Yue Liu, Changping Wang, Yuan Wang, Yuling Xiong, Xun Liu, Haiyang Wu, Qian Li, Enming Zhang, Jiawei Sun, Xin Xu, Zishuai Zhang, Ruoran Liu, Suyuan Huang, Zhaoxin Zhang, Zhengkai Guo, Shuojin Yang, Meng-Hao Guo, Huan Yu, Jie Jiang, Shi-Min Hu

TL;DR

GPR reframes advertising recommendation as an end-to-end generative task to address the objective misalignment and error propagation of cascaded pipelines. It introduces a unified input representation, a Heterogeneous Hierarchical Decoder (HHD) built from HSD, PTD, and HTE components, and a three-stage training pipeline augmented by ARR: Multi-Token Prediction (MTP), Value-Aware Fine-Tuning (VAFT), and Hierarchy Enhanced Policy Optimization (HEPO). The approach is validated on large-scale industrial data and in online A/B tests, where it shows significant gains in GMV and CTCVR and proves viable in a real-world, very large-scale advertising system. By aligning user intent understanding with value-driven generation and policy optimization, GPR moves end-to-end, multi-stakeholder advertising systems toward globally optimal performance. The work offers a blueprint for deploying generative recommender systems at scale, combining unified representation, hierarchical generation, and reinforcement-learning-based optimization to improve both user experience and monetization.

Abstract

As an intelligent infrastructure connecting users with commercial content, advertising recommendation systems play a central role in information flow and value creation within the digital economy. However, existing multi-stage advertising recommendation systems suffer from objective misalignment and error propagation, making it difficult to achieve global optimality, while unified generative recommendation models still struggle to meet the demands of practical industrial applications. To address these issues, we propose GPR (Generative Pre-trained Recommender), the first one-model framework that redefines advertising recommendation as an end-to-end generative task, replacing the traditional cascading paradigm with a unified generative approach. To realize GPR, we introduce three key innovations spanning unified representation, network architecture, and training strategy. First, we design a unified input schema and tokenization method tailored to advertising scenarios, mapping both ads and organic content into a shared multi-level semantic ID space, thereby enhancing semantic alignment and modeling consistency across heterogeneous data. Second, we develop the Heterogeneous Hierarchical Decoder (HHD), a dual-decoder architecture that decouples user intent modeling from ad generation, achieving a balance between training efficiency and inference flexibility while maintaining strong modeling capacity. Finally, we propose a multi-stage joint training strategy that integrates Multi-Token Prediction (MTP), Value-Aware Fine-Tuning (VAFT), and the Hierarchy Enhanced Policy Optimization (HEPO) algorithm, forming a complete generative recommendation pipeline that unifies interest modeling, value alignment, and policy optimization. GPR has been fully deployed in the Tencent Weixin Channels advertising system, delivering significant improvements in key business metrics including GMV and CTCVR.
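
To make the idea of a "shared multi-level semantic ID space" concrete, the following is a minimal sketch of plain residual k-means quantization, which turns an item embedding into one discrete code per level. It is not the paper's RQ-Kmeans+ tokenizer, whose details are not given in this section. The function names, the three levels, the codebook size of 8, and the random stand-in embeddings are all assumptions made for illustration.

```python
import numpy as np

def nearest(x, centroids):
    """Index of the closest centroid (squared Euclidean) for each row of x."""
    d = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    return d.argmin(axis=1)

def kmeans(x, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns the centroids only."""
    rng = np.random.default_rng(seed)
    centroids = x[rng.choice(len(x), size=k, replace=False)].copy()
    for _ in range(iters):
        assign = nearest(x, centroids)
        for j in range(k):
            members = x[assign == j]
            if len(members) > 0:  # keep the old centroid if a cluster is empty
                centroids[j] = members.mean(axis=0)
    return centroids

def fit_residual_codebooks(embeddings, levels=3, codebook_size=8, seed=0):
    """Fit one codebook per level, each on the residual left by earlier levels."""
    residual = embeddings.astype(np.float64).copy()
    codebooks = []
    for level in range(levels):
        centroids = kmeans(residual, codebook_size, seed=seed + level)
        codebooks.append(centroids)
        residual = residual - centroids[nearest(residual, centroids)]
    return codebooks

def encode(embeddings, codebooks):
    """Map each embedding to a tuple of code indices, one per level."""
    residual = embeddings.astype(np.float64).copy()
    ids = []
    for centroids in codebooks:
        assign = nearest(residual, centroids)
        ids.append(assign)
        residual = residual - centroids[assign]
    return np.stack(ids, axis=1)

def decode(ids, codebooks):
    """Approximate the embeddings by summing the selected centroids."""
    return sum(cb[ids[:, level]] for level, cb in enumerate(codebooks))

# Hypothetical stand-in for ad and organic-content embeddings in one space.
rng = np.random.default_rng(0)
items = rng.normal(size=(200, 16))
codebooks = fit_residual_codebooks(items)
semantic_ids = encode(items, codebooks)
print(semantic_ids.shape)  # (200, 3): one code per level for each item
```

Because ads and organic content would be encoded with the same codebooks in this sketch, items that are close in embedding space share leading codes, which is one way a shared ID space can support consistent modeling across heterogeneous content.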

Paper Structure

This paper contains 23 sections, 10 equations, 5 figures, and 5 tables.

Figures (5)

  • Figure 1: Comparison between previous methods and ours.
  • Figure 2: Overall Architecture of GPR.
  • Figure 3: Overall Architecture of RQ-Kmeans+.
  • Figure 4: Training Pipeline of GPR.
  • Figure 5: Comparison of loss curves for six different GPR parameter sizes.