Constraint-Aware Generative Auto-bidding via Pareto-Prioritized Regret Optimization
Binglin Wu, Yingyi Zhang, Xianneng Li, Ruyue Deng, Chuan Yue, Weiru Zhang, Xiaoyi Zeng
TL;DR
This work tackles constrained auto-bidding in dynamic advertising by addressing two key limitations of Decision Transformer-based approaches: Return-to-Go (RTG) conditioning that ignores cost, and regression that averages out historical behavior. It introduces PRO-Bid, which combines Constraint-Decoupled Pareto Representation (CDPR) with Counterfactual Regret Optimization (CRO) to enable constraint-aware sequence modeling and active policy improvement toward the Pareto frontier. CDPR decouples the budgeted problem into separate Return-to-Go $R_t$ and Cost-to-Go $C_t$ streams and emphasizes high-quality trajectories via Pareto-prioritized filtering, while CRO uses a global outcome predictor to identify superior counterfactual actions and guides learning through regret-weighted regression toward better-performing regimes. Offline results on AuctionNet and AuctionNet-Sparse, together with online A/B tests on AliExpress, show that PRO-Bid achieves superior constraint satisfaction and value, with robust performance under data noise and dynamic CPA targets, indicating strong practical impact for large-scale constrained advertising systems.
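The CDPR idea can be illustrated with a minimal Python sketch. It is not the authors' implementation and rests on several assumptions. Each trajectory is given as per-step value and cost lists. $R_t$ and $C_t$ are defined as suffix sums, i.e. what remains from step $t$ onward. Each whole trajectory is summarized as a (value, cost) pair, where "cost" could stand for, say, realized CPA. Finally, a hypothetical geometric `decay` over non-dominated-sorting layers serves as the Pareto-prioritized weight. All function names are illustrative.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (total value: higher is better, cost measure: lower is better)


def to_go(xs: Sequence[float]) -> List[float]:
    """Suffix sums: entry t is the total of xs[t:], i.e. what remains from step t on."""
    out: List[float] = []
    running = 0.0
    for x in reversed(xs):
        running += x
        out.append(running)
    return out[::-1]


def decouple(values: Sequence[float], costs: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Return separate Return-to-Go (R_t) and Cost-to-Go (C_t) streams for one trajectory."""
    return to_go(values), to_go(costs)


def dominates(a: Point, b: Point) -> bool:
    """a dominates b if it has at least as much value and no more cost, strictly better in one."""
    return a[0] >= b[0] and a[1] <= b[1] and (a[0] > b[0] or a[1] < b[1])


def pareto_ranks(points: Sequence[Point]) -> List[int]:
    """Non-dominated sorting: rank 0 is the Pareto frontier, rank 1 the next layer, and so on."""
    ranks = [-1] * len(points)
    remaining = set(range(len(points)))
    rank = 0
    while remaining:
        # Points not dominated by any other remaining point form the current layer.
        front = [i for i in remaining
                 if not any(dominates(points[j], points[i]) for j in remaining if j != i)]
        for i in front:
            ranks[i] = rank
        remaining -= set(front)
        rank += 1
    return ranks


def pareto_weights(points: Sequence[Point], decay: float = 0.5) -> List[float]:
    """Hypothetical weighting: frontier trajectories get 1.0, each deeper layer is scaled by `decay`."""
    return [decay ** r for r in pareto_ranks(points)]
```

For example, `decouple([1.0, 2.0, 3.0], [0.5, 0.5, 1.0])` yields $R_t = [6, 5, 3]$ and $C_t = [2, 1.5, 1]$. Because two steps with the same remaining value but different remaining spend now carry different $C_t$, the model can tell them apart. A geometric decay is only one simple way to prioritize the frontier. A hard filter (keeping rank 0 only) is the limiting case.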
Abstract
Auto-bidding systems aim to maximize marketing value while satisfying strict efficiency constraints such as Target Cost-Per-Action (CPA). Although Decision Transformers provide powerful sequence modeling capabilities, applying them to this constrained setting raises two challenges: 1) standard Return-to-Go conditioning neglects the cost dimension, causing state aliasing (states with the same remaining return but different remaining spend look identical to the model) that prevents precise resource pacing; and 2) standard regression forces the policy to mimic average historical behavior, limiting its capacity to push performance toward the constraint boundary. To address these challenges, we propose PRO-Bid, a constraint-aware generative auto-bidding framework based on two synergistic mechanisms: 1) Constraint-Decoupled Pareto Representation (CDPR) decomposes global constraints into recursive cost and value contexts to restore resource awareness, while reweighting trajectories according to the Pareto frontier to focus on high-efficiency data; and 2) Counterfactual Regret Optimization (CRO) enables active improvement by using a global outcome predictor to identify superior counterfactual actions. By treating these high-utility outcomes as weighted regression targets, the model moves beyond historical averages toward the optimal constraint boundary. Extensive experiments on two public benchmarks and online A/B tests demonstrate that PRO-Bid achieves superior constraint satisfaction and value acquisition compared to state-of-the-art baselines.
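The following sketch illustrates, under explicit assumptions, how CRO's regret-weighted regression could operate for a single scalar action. It assumes a hypothetical `predictor(state, action)` standing in for the global outcome predictor and a finite list of candidate actions. It also assumes an exponential, clipped regret weight with illustrative parameters `beta` and `max_weight`. The paper's actual predictor, candidate generation, and weighting scheme may differ.

```python
import math
from typing import Callable, Sequence, Tuple

# Hypothetical stand-in for the global outcome predictor: (state, action) -> predicted outcome.
Predictor = Callable[[float, float], float]


def counterfactual_target(state: float, logged_action: float,
                          candidates: Sequence[float],
                          predictor: Predictor) -> Tuple[float, float]:
    """Return (target_action, regret): the best candidate under the predictor and how much
    its predicted outcome exceeds that of the logged (historical) action."""
    best = max(candidates, key=lambda a: predictor(state, a))
    regret = predictor(state, best) - predictor(state, logged_action)
    if regret <= 0.0:
        # No candidate beats history: fall back to imitating the logged action.
        return logged_action, 0.0
    return best, regret


def regret_weight(regret: float, beta: float = 1.0, max_weight: float = 10.0) -> float:
    """Assumed weighting: exponential in the regret, clipped so large regrets stay bounded."""
    return min(math.exp(beta * regret), max_weight)


def regret_weighted_loss(pred_actions: Sequence[float], targets: Sequence[float],
                         weights: Sequence[float]) -> float:
    """Weighted mean squared error between policy outputs and (counterfactual) targets."""
    total = sum(w * (p - t) ** 2 for p, t, w in zip(pred_actions, targets, weights))
    return total / sum(weights)
```

With a toy predictor `lambda s, a: -(a - 2.0 * s) ** 2`, state `1.0`, logged action `1.0`, and candidates `[0.5, 1.5, 2.0, 2.5]`, the target becomes `2.0` with regret `1.0`. The sample then receives weight $e \approx 2.72$. When no candidate improves on the logged action, the target falls back to that action with weight 1, so the loss reduces to ordinary behavior cloning for that sample.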
