Constraint-Aware Generative Auto-bidding via Pareto-Prioritized Regret Optimization

Binglin Wu, Yingyi Zhang, Xianneng Li, Ruyue Deng, Chuan Yue, Weiru Zhang, Xiaoyi Zeng

TL;DR

This work tackles constrained auto-bidding in dynamic advertising by addressing two key limitations of Decision Transformer-based approaches: Return-to-Go (RTG) conditioning that ignores cost, and regression training that averages over historical behavior. It introduces PRO-Bid, which combines Constraint-Decoupled Pareto Representation (CDPR) with Counterfactual Regret Optimization (CRO) to enable constraint-aware sequence modeling and active policy improvement toward the Pareto frontier. CDPR decouples the budgeted problem into Return-to-Go $R_t$ and Cost-to-Go $C_t$ streams and emphasizes high-quality trajectories via Pareto-prioritized filtering. CRO uses a global outcome predictor to identify superior counterfactual actions and steers learning toward them through regret-weighted regression. Offline results on AuctionNet and AuctionNet-Sparse, plus online A/B tests on AliExpress, show that PRO-Bid achieves better constraint satisfaction and value than the compared baselines and remains robust under data noise and dynamic CPA targets, which suggests practical value for large-scale constrained advertising systems.

Abstract

Auto-bidding systems aim to maximize marketing value while satisfying strict efficiency constraints such as Target Cost-Per-Action (CPA). Although Decision Transformers provide powerful sequence modeling capabilities, applying them to this constrained setting encounters two challenges: 1) standard Return-to-Go conditioning causes state aliasing by neglecting the cost dimension, preventing precise resource pacing; and 2) standard regression forces the policy to mimic average historical behaviors, thereby limiting the capacity to optimize performance toward the constraint boundary. To address these challenges, we propose PRO-Bid, a constraint-aware generative auto-bidding framework based on two synergistic mechanisms: 1) Constraint-Decoupled Pareto Representation (CDPR) decomposes global constraints into recursive cost and value contexts to restore resource perception, while reweighting trajectories based on the Pareto frontier to focus on high-efficiency data; and 2) Counterfactual Regret Optimization (CRO) facilitates active improvement by utilizing a global outcome predictor to identify superior counterfactual actions. By treating these high-utility outcomes as weighted regression targets, the model transcends historical averages to approach the optimal constraint boundary. Extensive experiments on two public benchmarks and online A/B tests demonstrate that PRO-Bid achieves superior constraint satisfaction and value acquisition compared to state-of-the-art baselines.
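
The two ideas behind CDPR can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that Return-to-Go and Cost-to-Go are plain suffix sums of per-step value and cost, and that Pareto prioritization starts from the set of trajectories that are non-dominated in (total value, total cost). The helper names `to_go` and `pareto_front` and the toy trajectories are hypothetical.

```python
from typing import List, Sequence, Tuple


def to_go(per_step: Sequence[float]) -> List[float]:
    """Suffix sums: entry t is the total of per_step[t:]."""
    out = [0.0] * len(per_step)
    running = 0.0
    for t in range(len(per_step) - 1, -1, -1):
        running += per_step[t]
        out[t] = running
    return out


def pareto_front(points: Sequence[Tuple[float, float]]) -> List[int]:
    """Indices of (value, cost) points that no other point dominates.

    Point j dominates point i if it has value >= and cost <=,
    with at least one of the two inequalities strict.
    """
    front = []
    for i, (v_i, c_i) in enumerate(points):
        dominated = any(
            v_j >= v_i and c_j <= c_i and (v_j > v_i or c_j < c_i)
            for j, (v_j, c_j) in enumerate(points)
            if j != i
        )
        if not dominated:
            front.append(i)
    return front


# Toy logged trajectories (assumed data): per-step values and per-step costs.
trajectories = [
    ([1.0, 2.0, 3.0], [2.0, 2.0, 2.0]),  # total value 6, total cost 6
    ([1.0, 1.0, 1.0], [3.0, 3.0, 3.0]),  # total value 3, total cost 9
    ([0.5, 0.5, 1.0], [1.0, 1.0, 1.0]),  # total value 2, total cost 3
]

# Two separate conditioning streams per trajectory: R_t and C_t.
rtg = [to_go(values) for values, _ in trajectories]
ctg = [to_go(costs) for _, costs in trajectories]

# Trajectory-level totals are the first entry of each suffix-sum stream.
totals = [(r[0], c[0]) for r, c in zip(rtg, ctg)]
front = pareto_front(totals)
```

In this toy data the second trajectory is dominated by the first (less value at higher cost), so only the first and third remain on the frontier. How frontier membership is turned into training weights is left open here, since the abstract does not specify it.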

Paper Structure

This paper contains 34 sections, 23 equations, 8 figures, 5 tables, 1 algorithm.

Figures (8)

  • Figure 1: Overall framework of PRO-Bid.
  • Figure 2: Comparison in different CPA constraint settings.
  • Figure 3: Performance under different noise augmentation.
  • Figure 4: Visualization of inference results on AuctionNet.
  • Figure 5: Online Auto-bidding System.
  • ...and 3 more figures