RELATE: A Reinforcement Learning-Enhanced LLM Framework for Advertising Text Generation

Jinfang Wang, Jiajie Liu, Jianwei Wu, Ziqin Luo, Zhen Chen, Chunlei Li, Biao Han, Tao Deng, Yi Li, Shuanglong Li, Lin Liu

TL;DR

RELATE addresses the misalignment between advertising text generation and online business objectives by unifying generation and objective optimization in an end-to-end RL framework built on large language models. It introduces a multi-dimensional reward system (quality, diversity, and CTCVR) and a granularity-aware credit assignment mechanism, optimized via GRPO to align text generation with conversion goals under compliance constraints. Offline and online experiments on a Baidu advertising dataset show consistent improvements over baselines, including a relative +9.19% CTCVR uplift in production, with strong gains in compliance and diversity. This approach reduces funnel friction, enhances long-term performance, and demonstrates practical viability for large-scale, policy-constrained advertising deployments.

Abstract

In online advertising, advertising text plays a critical role in attracting user engagement and driving advertiser value. Existing industrial systems typically follow a two-stage paradigm, where candidate texts are first generated and subsequently aligned with online performance metrics such as click-through rate (CTR). This separation often leads to misaligned optimization objectives and low funnel efficiency, limiting global optimality. To address these limitations, we propose RELATE, a reinforcement learning-based end-to-end framework that unifies generation and objective alignment within a single model. Instead of decoupling text generation from downstream metric alignment, RELATE integrates performance and compliance objectives directly into the generation process via policy learning. To better capture ultimate advertiser value beyond click-level signals, we incorporate conversion-oriented metrics into the objective and jointly model them with compliance constraints as multi-dimensional rewards, enabling the model to generate high-quality ad texts that improve conversion performance under policy constraints. Extensive experiments on large-scale industrial datasets demonstrate that RELATE consistently outperforms baselines. Furthermore, online deployment on a production advertising platform yields statistically significant improvements in click-through conversion rate (CTCVR) under strict policy constraints, validating the robustness and real-world effectiveness of the proposed framework.

Paper Structure

This paper contains 24 sections, 12 equations, 3 figures, 4 tables.

Figures (3)

  • Figure 1: An illustration of ad text in an online advertisement
  • Figure 2: Workflow overview of RELATE. Built upon GRPO (Shao et al., 2024), RELATE features a Multi-dimensional Reward System and Group Computation with Credit Assignment. The former integrates rewards from multiple dimensions, including a CTCVR Reward, a Quality Reward, and a Diversity Reward. The latter introduces credit assignment during group computation to derive differentiated, token-level advantages.
  • Figure 3: Training curves of individual rewards under different ablation settings.
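
The workflow described in the Figure 2 caption can be illustrated with a minimal Python sketch. It is not the authors' implementation: the reward weights, the weighted-sum combination, the use of population standard deviation, and the per-token credit weights are all assumptions made for illustration, since this summary does not specify them. Only the general shape (several reward dimensions, group-relative normalization as in GRPO, then token-level advantages) comes from the text.

```python
from statistics import mean, pstdev

# Hypothetical weights for the three reward dimensions named in the paper.
# The actual combination rule and values are not given in this summary.
REWARD_WEIGHTS = {"ctcvr": 0.5, "quality": 0.3, "diversity": 0.2}


def combine_rewards(rewards, weights=REWARD_WEIGHTS):
    """Collapse per-dimension rewards for one sampled ad text into a scalar
    (assumed here to be a weighted sum)."""
    return sum(weights[name] * rewards[name] for name in weights)


def group_relative_advantages(scalar_rewards, eps=1e-8):
    """GRPO-style advantage: standardize each reward against the mean and
    standard deviation of its own sampled group."""
    mu = mean(scalar_rewards)
    sigma = pstdev(scalar_rewards)  # population std is an assumption
    return [(r - mu) / (sigma + eps) for r in scalar_rewards]


def token_level_advantages(advantage, credit):
    """Spread one sequence-level advantage over tokens using per-token
    credit weights. The credit weights are hypothetical placeholders for
    the paper's granularity-aware credit assignment."""
    return [advantage * c for c in credit]


# One prompt, a group of three sampled ad texts with made-up rewards.
group = [
    {"ctcvr": 0.8, "quality": 1.0, "diversity": 0.5},
    {"ctcvr": 0.2, "quality": 1.0, "diversity": 0.5},
    {"ctcvr": 0.5, "quality": 0.0, "diversity": 0.5},
]
scalars = [combine_rewards(r) for r in group]
advantages = group_relative_advantages(scalars)

# Hypothetical credit weights for a three-token sample.
per_token = token_level_advantages(advantages[0], [0.0, 0.5, 1.0])
```

In this sketch, a sample whose combined reward is above its group's mean receives a positive advantage, and the credit weights decide how much of that advantage each token sees. Uniform credit weights of 1.0 would reduce it to sequence-level GRPO advantages broadcast to every token.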