Nash Equilibrium Constrained Auto-bidding With Bi-level Reinforcement Learning

Zhiyu Mou, Miao Xu, Rongquan Bai, Zhuoran Yang, Chuan Yu, Jian Xu, Bo Zheng

TL;DR

This work reframes auto-bidding as Nash Equilibrium Constrained Bidding (NCB), aiming to maximize social welfare under the $\epsilon$-Nash Equilibrium constraint. It introduces a Bi-level Policy Gradient (BPG) framework that uses a primal-dual approach and a penalized single-level reformulation with a unified optimizer, enabling gradients that do not scale with the number of advertisers. The authors provide theoretical guarantees and demonstrate strong empirical performance in simulations and a real-world Taobao deployment, showing improved social welfare (GMV) and constraint compliance. Overall, the method advances platform-level optimization in multi-agent auto-bidding and offers scalable, provable mechanisms for NE selection in large-scale online advertising ecosystems.
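The primal-dual idea can be illustrated on a one-dimensional toy. This is a minimal sketch of generic projected gradient descent-ascent on a Lagrangian, not the paper's BPG algorithm: the welfare function `W`, the constraint `c`, and the step sizes are hypothetical stand-ins for social welfare and the $\epsilon$-NE constraint.

```python
def primal_dual(eps=4.0, lr_x=0.01, lr_lam=0.01, steps=20000):
    """Projected gradient descent-ascent on L(x, lam) = W(x) - lam * (c(x) - eps).

    Hypothetical toy problem: welfare W(x) = -(x - 3)^2 is maximized at x = 3,
    but the constraint c(x) = x^2 <= eps (with eps = 4) only allows x <= 2.
    """
    x, lam = 0.0, 0.0
    for _ in range(steps):
        grad_x = -2.0 * (x - 3.0) - lam * 2.0 * x  # dL/dx
        violation = x * x - eps                     # constraint violation c(x) - eps
        x += lr_x * grad_x                          # primal step: ascend L in x
        lam = max(0.0, lam + lr_lam * violation)    # dual step: raise lam when violated, keep lam >= 0
    return x, lam


x, lam = primal_dual()
print(f"x = {x:.4f}, lambda = {lam:.4f}")
```

The KKT point of this toy is $x = 2$ with multiplier $\lambda = 0.5$ (stationarity gives $-2(2-3) = \lambda \cdot 2 \cdot 2$), and the iterates settle there: the dual variable grows only while the constraint is violated, which pulls the primal variable back to the feasible boundary.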

Abstract

Many online advertising platforms provide advertisers with auto-bidding services to enhance their advertising performance. However, most existing auto-bidding algorithms fail to capture the formulation of the auto-bidding problem that the platform actually faces, let alone solve it. We argue that the platform should help optimize each advertiser's performance to the greatest extent possible, which makes the $\epsilon$-Nash Equilibrium ($\epsilon$-NE) a necessary solution concept, while also maximizing the social welfare of all advertisers for the platform's long-term value. Based on this, we introduce Nash-Equilibrium Constrained Bidding (NCB), a new formulation of the auto-bidding problem from the platform's perspective: it maximizes the social welfare of all advertisers subject to the $\epsilon$-NE constraint. The NCB problem is challenging because of its constrained bi-level structure and the typically large number of advertisers involved. To address these challenges, we propose a Bi-level Policy Gradient (BPG) framework with theoretical guarantees. Notably, its computational complexity is independent of the number of advertisers, and the associated gradients are straightforward to compute. Extensive simulated and real-world experiments validate the effectiveness of the BPG framework.
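One standard way to write a welfare-maximization problem under an $\epsilon$-NE constraint is shown below. The notation is illustrative and assumed, not taken from the paper: $n$ advertisers, agent policies $\pi_i$, joint policy $\pi = (\pi_1, \dots, \pi_n)$, and $V_i(\pi)$ the value advertiser $i$ obtains. The inner maximization over unilateral deviations $\pi_i'$ is what gives the problem its constrained bi-level structure.

```latex
\max_{\pi = (\pi_1, \dots, \pi_n)} \; \sum_{i=1}^{n} V_i(\pi)
\quad \text{s.t.} \quad
\underbrace{\max_{\pi_i'} V_i(\pi_i', \pi_{-i}) - V_i(\pi)}_{\text{exploitability of advertiser } i} \le \epsilon,
\qquad i = 1, \dots, n.
```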

Paper Structure

This paper contains 46 sections, 6 theorems, 82 equations, 4 figures, 5 tables, 1 algorithm.

Key Result

Theorem 5.2

Under the assumption stated for the single-level reformulation, when $\xi \ge Z\sqrt{3/(\mu\gamma)}$, the global and local solutions to the penalized single-level problem are equivalent to the solutions to the $\gamma$-approximate problem of the primal-domain objective.
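To make the role of the penalty weight $\xi$ concrete, here is a minimal sketch assuming the value-function penalty form $F = f + \xi\,(g - v)$ used in penalty-based bilevel methods; the paper's exact single-level objective is not reproduced on this page. `xi_threshold` only evaluates the theorem's bound, read as $Z\sqrt{3/(\mu\gamma)}$, where $Z$ and $\mu$ are constants from the paper's assumption that this page does not define. `solve_penalized` runs plain gradient descent on a hypothetical quadratic bilevel toy; both function names are illustrative.

```python
import math


def xi_threshold(Z, mu, gamma):
    """Penalty level from Theorem 5.2, read as xi >= Z * sqrt(3 / (mu * gamma))."""
    return Z * math.sqrt(3.0 / (mu * gamma))


def solve_penalized(xi, steps=20000):
    """Gradient descent on the penalized single-level objective of a toy bilevel problem.

    Hypothetical toy problem:
      upper level  f(x, y) = (x - 1)^2 + (y - 2)^2
      lower level  g(x, y) = 0.5 * (y - x)^2, so v(x) = min_y g(x, y) = 0
      penalized    F(x, y) = f(x, y) + xi * (g(x, y) - v(x))
    """
    lr = 1.0 / (2.0 + 2.0 * xi)  # 1 / largest Hessian eigenvalue of F
    x, y = 0.0, 0.0
    for _ in range(steps):
        gx = 2.0 * (x - 1.0) - xi * (y - x)
        gy = 2.0 * (y - 2.0) + xi * (y - x)
        x, y = x - lr * gx, y - lr * gy
    return x, y


for xi in (1.0, 10.0, 100.0):
    x, y = solve_penalized(xi)
    gap = 0.5 * (y - x) ** 2  # lower-level suboptimality g - v
    print(f"xi={xi:6.1f}  x={x:.4f}  y={y:.4f}  lower-level gap={gap:.2e}")
```

For this toy the penalized minimizer satisfies $x + y = 3$ and $y - x = 1/(1+\xi)$, so the lower-level gap $g - v = 0.5/(1+\xi)^2$ shrinks as $\xi$ grows and the solution approaches the bilevel optimum $x = y = 1.5$. The Hessian's condition number is $1 + \xi$, which is why the step size must shrink as the penalty grows.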

Figures (4)

  • Figure 1: A typical auto-bidding process. It usually involves four components: the advertisers who use the auto-bidding service, the agents created and designed by the platform, the auction mechanism, and the impression opportunities.
  • Figure 2: The pipeline of the BPG framework for the NCB problem and its theoretical guarantees.
  • Figure 3: Weighted POMDP.
  • Figure 4: Social welfare and Max Exploitability in Q3 under different $\epsilon$. Fully Cooperative means all agents are cooperative.
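Figure 4 reports two metrics, social welfare and max exploitability. The sketch below shows how both can be computed for a product policy in a small normal-form game, assuming max exploitability means the largest gain any single agent could obtain by deviating unilaterally (a standard definition; the paper's estimator over auction episodes is not shown here). The prisoner's-dilemma payoffs are hypothetical and chosen so that the fully cooperative profile has the highest welfare but is not an $\epsilon$-NE for small $\epsilon$.

```python
# Hypothetical 2-player game (a prisoner's dilemma); action 0 = cooperate, 1 = defect.
# U[i][a0][a1] is player i's payoff when player 0 plays a0 and player 1 plays a1.
U = [[[3.0, 0.0], [5.0, 1.0]],
     [[3.0, 5.0], [0.0, 1.0]]]


def action_values(i, p0, p1):
    """Expected payoff to player i for each of its pure actions, holding the other player fixed."""
    if i == 0:
        return [sum(U[0][a][b] * p1[b] for b in range(2)) for a in range(2)]
    return [sum(p0[a] * U[1][a][b] for a in range(2)) for b in range(2)]


def payoff(i, p0, p1):
    """Expected payoff to player i under the product policy p0 x p1."""
    own = p0 if i == 0 else p1
    return sum(q * v for q, v in zip(own, action_values(i, p0, p1)))


def summarize(p0, p1, eps):
    """Return (social welfare, max exploitability, whether the profile is an eps-NE)."""
    welfare = payoff(0, p0, p1) + payoff(1, p0, p1)
    max_expl = max(max(action_values(i, p0, p1)) - payoff(i, p0, p1) for i in range(2))
    return welfare, max_expl, max_expl <= eps


coop, defect = [1.0, 0.0], [0.0, 1.0]
print(summarize(coop, coop, eps=0.5))      # (6.0, 2.0, False)
print(summarize(defect, defect, eps=0.5))  # (2.0, 0.0, True)
```

The two profiles bracket the trade-off that NCB navigates: full cooperation maximizes welfare but each agent could gain 2 by defecting, while mutual defection is an exact NE with much lower welfare.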

Theorems & Definitions (11)

  • Theorem 5.2: Equivalence Between the Penalized Single-Level Problem and the Primal-Domain Objective
  • Theorem 5.3: Permutation-Equivariant POMG
  • Lemma 2.1
  • proof
  • Lemma 2.2
  • proof
  • Definition 3.1: Arrival Probability
  • Definition 3.2: Discounted State Distribution
  • Definition 3.3: Expectation Under Product Policy
  • Lemma 4.1: Relation on Global Solutions, Proposition 1 in shen2023penalty
  • ...and 1 more
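Definition 3.2 is not reproduced on this page. As a hedged illustration, the standard discounted state distribution of the Markov chain induced by a fixed policy is $d(s) = (1-\delta)\sum_{t \ge 0} \delta^t \Pr(s_t = s)$, and the sketch below approximates it by truncating the sum. The function name, the two-state chain, and the discount symbol $\delta$ (used to avoid a clash with $\gamma$ in Theorem 5.2) are assumptions; the paper's definition may additionally involve the arrival probabilities of Definition 3.1 or the weighting in Figure 3.

```python
def discounted_state_distribution(rho, P, discount, horizon=2000):
    """Approximate d(s) = (1 - discount) * sum_t discount^t * Pr(s_t = s).

    rho: initial state distribution; P[s][s2]: transition probabilities under a fixed policy.
    The infinite sum is truncated after `horizon` steps.
    """
    n = len(rho)
    d = [0.0] * n
    p_t = list(rho)              # Pr(s_t = .), starting from t = 0
    weight = 1.0 - discount      # (1 - discount) * discount^t
    for _ in range(horizon):
        for s in range(n):
            d[s] += weight * p_t[s]
        # propagate one step: Pr(s_{t+1} = s2) = sum_s Pr(s_t = s) * P[s][s2]
        p_t = [sum(p_t[s] * P[s][s2] for s in range(n)) for s2 in range(n)]
        weight *= discount
    return d


rho = [1.0, 0.0]
P = [[0.9, 0.1], [0.5, 0.5]]
print(discounted_state_distribution(rho, P, discount=0.9))
```

For this chain the closed form $(1-\delta)\,\rho^\top (I - \delta P)^{-1}$ gives $[0.859375,\ 0.140625]$, which the truncated sum matches to floating-point precision; the truncation is used here only to keep the sketch dependency-free.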