SEGB: Self-Evolved Generative Bidding with Local Autoregressive Diffusion

Yulong Gao, Wan Jiang, Mingzhe Cao, Xuepu Wang, Zeyu Pan, Haonan Yang, Ye Liu, Xin Yang

TL;DR

The paper proposes Self-Evolved Generative Bidding (SEGB), a framework that plans proactively and refines itself entirely offline, enabling robust policy improvement from static data alone.

Abstract

In online advertising, automated bidding has become a pivotal tool, enabling advertisers to efficiently capture impression opportunities in real time. Recently, generative auto-bidding has shown significant promise for ad optimization. However, existing offline-trained generative policies lack the near-term foresight required for dynamic markets and usually depend on simulators or external experts for post-training improvement. To overcome these limitations, we propose Self-Evolved Generative Bidding (SEGB), a framework that plans proactively and refines itself entirely offline. SEGB first synthesizes plausible short-horizon future states to guide each bid, giving the agent dynamic foresight. It then performs value-guided policy refinement to iteratively discover superior strategies without any external intervention. This self-contained approach enables robust policy improvement from static data alone. Experiments on the AuctionNet benchmark and a large-scale A/B test validate our approach, demonstrating that SEGB significantly outperforms state-of-the-art baselines. In a large-scale online deployment, it delivered substantial business value, achieving a +10.19% increase in target cost, which supports the effectiveness of our planning and evolution paradigm.

Paper Structure

This paper contains 50 sections, 18 equations, 2 figures, 5 tables, 2 algorithms.

Figures (2)

  • Figure 1: Overview of the SEGB Framework. SEGB consists of three stages. (1) Planning: A Local Autoregressive Diffusion (LAD) model generates a high-fidelity future state prediction $s'_{t+1}$. (2) Action Generation: A Next-State-Aware DT conditions on this prediction to generate an action $a'_t$. (3) Offline Evolution: The DT policy is then evolved via GRPO, guided by a frozen Critic and a Reference Model. Note that Stage 3 is performed only during offline training; online inference relies solely on the efficient Stage 1 and Stage 2 pipeline.
  • Figure 2: Further analysis on key hyperparameters.
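
The three stages in the Figure 1 caption can be illustrated with a toy sketch. This is not the authors' implementation: `plan_next_state`, `sample_action`, and `critic_value` are hypothetical stand-ins for the LAD planner, the Next-State-Aware DT, and the frozen Critic, and their internals are placeholders. The only part that follows a known recipe is `group_relative_advantages`, which standardizes rewards within a sampled group in the way GRPO-style methods commonly do. The Reference Model and the policy update itself are omitted.

```python
import random
import statistics
from typing import List, Sequence, Tuple

State = Sequence[float]


def plan_next_state(state: State) -> List[float]:
    """Stage 1 stand-in (hypothetical): predict the next state.
    The paper uses a LAD model; this fixed decay is only a placeholder."""
    return [0.9 * x for x in state]


def sample_action(state: State, predicted_next: State, rng: random.Random) -> float:
    """Stage 2 stand-in (hypothetical): a stochastic policy that conditions
    on the current state and on the planner's predicted next state."""
    center = sum(predicted_next) / len(predicted_next)
    return center + rng.gauss(0.0, 0.1)


def critic_value(state: State, action: float) -> float:
    """Frozen critic stand-in (hypothetical): scores actions near 1.0 highest."""
    return -((action - 1.0) ** 2)


def group_relative_advantages(rewards: Sequence[float], eps: float = 1e-8) -> List[float]:
    """Standardize each reward against the mean and standard deviation
    of its own group, so no separate learned baseline is needed."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


def score_group(state: State, group_size: int = 4, seed: int = 0) -> Tuple[List[float], List[float]]:
    """Stage 3 scoring step: sample a group of candidate actions for one state,
    score them with the critic, and return (actions, advantages)."""
    rng = random.Random(seed)
    predicted = plan_next_state(state)
    actions = [sample_action(state, predicted, rng) for _ in range(group_size)]
    rewards = [critic_value(state, a) for a in actions]
    return actions, group_relative_advantages(rewards)


if __name__ == "__main__":
    actions, advantages = score_group([1.0, 2.0])
    for a, adv in zip(actions, advantages):
        print(f"action={a:.3f} advantage={adv:+.3f}")
```

In this sketch, candidates with above-average critic scores receive positive advantages and would be reinforced in an update step, while online inference would call only the Stage 1 and Stage 2 stand-ins.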