Generative Large-Scale Pre-trained Models for Automated Ad Bidding Optimization

Yu Lei, Jiayang Zhao, Yilei Zhao, Zhaoqi Zhang, Linyou Cai, Qianlong Xie, Xingxing Wang

TL;DR

The paper tackles auto-bidding under hard budgets and advertiser-specific objectives in dynamic online environments by proposing GRAD, a generative framework combining a Causal Transformer–based Value Estimator with a Mixture-of-Experts ActionMoE for constrained exploration. It introduces time-aware reward shaping, return-to-go conditioning, and a multi-objective loss to balance exploration with constraint satisfaction. Large-scale offline and online evaluations, including production deployment on Meituan, demonstrate GRAD's ability to improve GMV and ROI while maintaining CPC constraints across budgets. The work demonstrates the practicality of scalable generative approaches for industrial auto-bidding and provides deployment guidelines for real-world systems.
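
Return-to-go conditioning is commonly understood, in Decision Transformer-style models, as feeding the model the total reward still to be collected from each step onward. The following is a minimal sketch of that quantity only, assuming plain per-step rewards and an optional discount factor. The function name `returns_to_go` and the sample rewards are hypothetical, and the paper's time-aware reward shaping is not reproduced here.

```python
from typing import List, Sequence


def returns_to_go(rewards: Sequence[float], gamma: float = 1.0) -> List[float]:
    """Return R_t = sum over k >= t of gamma**(k - t) * rewards[k], for every t."""
    rtg = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step reuses the sum already computed for the next step.
    for t in range(len(rewards) - 1, -1, -1):
        running = rewards[t] + gamma * running
        rtg[t] = running
    return rtg


# Hypothetical per-step rewards for one short bidding episode.
episode_rewards = [1.0, 0.0, 2.0, 3.0]
print(returns_to_go(episode_rewards))  # [6.0, 5.0, 5.0, 3.0]
```

At inference time, a conditioned model of this kind is typically given a target return for the first step, which is then decremented by the rewards actually observed.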

Abstract

Modern auto-bidding systems are required to balance overall performance with diverse advertiser goals and real-world constraints, reflecting the dynamic and evolving needs of the industry. Recent advances in conditional generative models, such as transformers and diffusion models, have enabled direct trajectory generation tailored to advertiser preferences, offering a promising alternative to traditional Markov Decision Process-based methods. However, these generative methods face significant challenges, such as the distribution shift between offline and online environments, limited exploration of the action space, and the need to meet constraints such as marginal Cost-per-Mille (CPM) and Return on Investment (ROI). To tackle these challenges, we propose GRAD (Generative Reward-driven Ad-bidding with Mixture-of-Experts), a scalable foundation model for auto-bidding that combines an Action-Mixture-of-Experts module for diverse bidding action exploration with the Value Estimator of a Causal Transformer for constraint-aware optimization. Extensive offline and online experiments demonstrate that GRAD significantly enhances platform revenue, highlighting its effectiveness in addressing the evolving and diverse requirements of modern advertisers. Furthermore, GRAD has been implemented in multiple marketing scenarios at Meituan, one of the world's largest online food delivery platforms, leading to a 2.18% increase in Gross Merchandise Value (GMV) and a 10.68% increase in ROI.

Paper Structure

This paper contains 31 sections, 10 theorems, 38 equations, 5 figures, 5 tables, 1 algorithm.

Key Result

lemma 1

Let $f\in[1-\varepsilon,1+\varepsilon]^d$ and $a'\!=\!a\odot f$. Then
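
The conclusion of the lemma is cut off in this listing. As a worked illustration only, and not necessarily the bound the paper states, the hypotheses alone already imply the following, because every coordinate satisfies $|f_i - 1| \le \varepsilon$ (assuming $0 \le \varepsilon \le 1$ and any $p$-norm with $1 \le p \le \infty$):

$$
|a'_i - a_i| = |a_i|\,|f_i - 1| \le \varepsilon\,|a_i|
\quad\Longrightarrow\quad
\|a' - a\|_p \le \varepsilon\,\|a\|_p,
\qquad
(1-\varepsilon)\,\|a\|_p \le \|a'\|_p \le (1+\varepsilon)\,\|a\|_p .
$$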

Figures (5)

  • Figure 1: Overall architecture: (a) a Causal Transformer module based on the decoder-only Transformer architecture; (b) a Value Estimator; (c) an Action MoE module.
  • Figure 2: Performance with different numbers of MoE experts. Bar plots show Score and Total Reward (left Y-axis), while the lines indicate Exceed Rate and CPA Ratio (right Y-axis).
  • Figure 3: Overview of the Online Auto-bidding System
  • Figure 4: Training with the penalized objective ($J_{\lambda}$) improves stability and accelerates convergence.
  • Figure 5: Comparison of CPC_CR on a randomly selected trajectory

Theorems & Definitions (10)

  • lemma 1: Norm Bound for Elementwise Scaling
  • proposition 1: Feasibility Preservation under Trust-Region Scaling
  • proposition 2: Feasibility Preservation with Residual Fusion
  • corollary 1: Projection Guarantees
  • lemma 2: Performance Difference for Penalized CMDP
  • proposition 3: Monotone Improvement Lower Bound
  • proposition 4: Gram Matrix Well-Conditioning via Angular Separation
  • corollary 2: Coverage Radius on Unit Sphere
  • lemma 3: Output Lipschitzness
  • proposition 5: Gradient Norm Bound under Top-1 Routing
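
Lemma 1 and proposition 1 above concern elementwise scaling of an action within a trust region, with hypotheses $f\in[1-\varepsilon,1+\varepsilon]^d$ and $a' = a\odot f$. A minimal sketch of such a scaling step follows, assuming the raw factors are simply clipped into the trust region before being applied. The function `trust_region_scale`, the clipping rule, and the sample numbers are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def trust_region_scale(
    action: Sequence[float], raw_factor: Sequence[float], eps: float
) -> List[float]:
    """Scale `action` elementwise by factors clipped to [1 - eps, 1 + eps]."""
    if len(action) != len(raw_factor):
        raise ValueError("action and raw_factor must have the same length")
    lo, hi = 1.0 - eps, 1.0 + eps
    # Clipping is an assumption of this sketch: it forces every factor into the trust region.
    return [a * min(max(f, lo), hi) for a, f in zip(action, raw_factor)]


# Hypothetical base action and unconstrained scaling factors.
base = [2.0, 0.5, 1.0]
scaled = trust_region_scale(base, raw_factor=[1.4, 0.2, 1.0], eps=0.1)

# Each coordinate moves by at most eps times its own magnitude.
max_relative_change = max(abs(s - b) / abs(b) for s, b in zip(scaled, base))
print(scaled, max_relative_change)
```

Because each factor stays within $\varepsilon$ of 1, the scaled action can never drift from the base action by more than a fraction $\varepsilon$ per coordinate, which is the kind of property a feasibility-preservation argument can build on.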