Generative Auto-Bidding with Value-Guided Explorations

Jingtong Gao, Yewen Li, Shuai Mao, Peng Jiang, Nan Jiang, Yejing Wang, Qingpeng Cai, Fei Pan, Peng Jiang, Kun Gai, Bo An, Xiangyu Zhao

TL;DR

GAVE presents an offline generative auto-bidding framework that leverages a Decision Transformer backbone to model bidding as a sequential task. It introduces a score-based Return-To-Go reflecting CPA-constrained objectives, an action exploration mechanism with RTG evaluation to safely probe outside the offline dataset, and a learnable value function to guide exploration and mitigate OOD risks. The method achieves superior offline performance across AuctionNet benchmarks, improves alignment between training objectives and evaluation metrics, and demonstrates positive online impact in Nobid and Costcap campaigns, ultimately winning the NeurIPS 2024 AIGB Track. This work offers a practical, adaptable approach for CPA-aware, data-efficient auto-bidding in dynamic advertising environments.

Abstract

Auto-bidding, with its strong capability to optimize bidding decisions within dynamic and competitive online environments, has become a pivotal strategy for advertising platforms. Existing approaches typically employ rule-based strategies or Reinforcement Learning (RL) techniques. However, rule-based strategies lack the flexibility to adapt to time-varying market conditions, and RL-based methods struggle to capture essential historical dependencies and observations within Markov Decision Process (MDP) frameworks. Furthermore, these approaches often face challenges in ensuring strategy adaptability across diverse advertising objectives. Additionally, as offline training methods are increasingly adopted to facilitate the deployment and maintenance of stable online strategies, the issues of documented behavioral patterns and behavioral collapse resulting from training on fixed offline datasets become increasingly significant. To address these limitations, this paper introduces a novel offline Generative Auto-bidding framework with Value-Guided Explorations (GAVE). GAVE accommodates various advertising objectives through a score-based Return-To-Go (RTG) module. Moreover, GAVE integrates an action exploration mechanism with an RTG-based evaluation method to explore novel actions while ensuring stability-preserving updates. A learnable value function is also designed to guide the direction of action exploration and mitigate Out-of-Distribution (OOD) problems. Experimental results on two offline datasets and real-world deployments demonstrate that GAVE outperforms state-of-the-art baselines in both offline evaluations and online A/B tests. By applying the core methods of this framework, we proudly secured first place in the NeurIPS 2024 competition, 'AIGB Track: Learning Auto-Bidding Agents with Generative Models'.
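To make the idea of a score-based Return-To-Go more concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not state the score formula, so the sketch assumes a commonly used CPA-penalized score: total conversions multiplied by a penalty that shrinks when the realized CPA exceeds the target. The function names, the exponent `beta`, and the definition of RTG as "final score minus the score accumulated before step t" are all illustrative assumptions.

```python
from typing import List


def cpa_score(cost: float, conversions: float, cpa_target: float, beta: float = 2.0) -> float:
    """Assumed score: conversions scaled by a penalty in (0, 1].

    The penalty is 1 while realized CPA <= target and decays as
    (target / realized CPA) ** beta once the constraint is violated.
    """
    if conversions <= 0:
        return 0.0  # no conversions means no score, whatever was spent
    realized_cpa = cost / conversions
    penalty = 1.0 if realized_cpa <= cpa_target else (cpa_target / realized_cpa) ** beta
    return penalty * conversions


def score_rtg(costs: List[float], convs: List[float], cpa_target: float, beta: float = 2.0) -> List[float]:
    """Hypothetical score-based RTG: final episode score minus the score of the prefix before step t."""
    final = cpa_score(sum(costs), sum(convs), cpa_target, beta)
    rtgs = []
    cum_cost, cum_conv = 0.0, 0.0
    for cost, conv in zip(costs, convs):
        # Score achieved strictly before this step
        rtgs.append(final - cpa_score(cum_cost, cum_conv, cpa_target, beta))
        cum_cost += cost
        cum_conv += conv
    return rtgs


if __name__ == "__main__":
    # Three steps, one conversion each; the last step is expensive and breaks the CPA target of 10.
    print(score_rtg([10.0, 10.0, 20.0], [1.0, 1.0, 1.0], cpa_target=10.0))
```

Under these assumptions the RTG can turn negative at a step whose cost pushes the realized CPA past the target, which is the kind of signal a sequence model conditioned on RTG could use to avoid constraint-violating bids.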

Paper Structure

This paper contains 25 sections, 22 equations, 3 figures, 4 tables, 1 algorithm.

Figures (3)

  • Figure 1: Overall structure of GAVE.
  • Figure 2: Parameter analysis of $w$ on AuctionNet.
  • Figure 3: Ablation study with 100% budget.