Applying Opponent Modeling for Automatic Bidding in Online Repeated Auctions
Yudong Hu, Congying Han, Tiande Guo, Hao Xiao
TL;DR
The paper addresses automatic bidding in online repeated auctions by modeling the bidders and the seller as learners in a multiagent reinforcement learning framework. It introduces Bid Net as a representation of bidder strategies and the pseudo-gradient (PG) algorithm, which lets bidders anticipate how the other agents' strategy updates, including the seller's mechanism updates, affect outcomes. A bidder using PG learns the best response to static opponents, and when all bidders adopt PG the system converges to the equilibrium of the game induced by the auction. The seller uses Myerson Net to learn the revenue-maximizing mechanism. Empirical results show that Bid Net outperforms linear shading, that PG closely approximates the Nash equilibrium, and that PG achieves higher bidder utility across varied environments against both static and dynamic opponents. The work offers a principled approach to automatic bidding in strategic, evolving auction settings and suggests avenues for future research on automatic bidding for strategic bidders.
Abstract
Online auction scenarios, such as bidding searches on advertising platforms, often require bidders to participate repeatedly in auctions for identical or similar items. Most previous studies have considered only the process by which the seller learns the prior-dependent optimal mechanism in a repeated auction. In this paper, we instead define a multiagent reinforcement learning environment in which strategic bidders and the seller learn their strategies simultaneously, and we design an automatic bidding algorithm that updates the bidders' strategies through online interactions. We propose Bid Net to replace the linear shading function as the representation of a strategic bidder's strategy, which effectively improves the utility of the strategies that bidders learn. We apply and revise opponent modeling methods to design the PG (pseudo-gradient) algorithm, which allows bidders to learn optimal bidding strategies by predicting the strategy transitions of the other agents. We prove that a bidder using the PG algorithm can learn the best response to static opponents, and that when all bidders adopt the PG algorithm, the system converges to the equilibrium of the game induced by the auction. In experiments with diverse environmental settings and varying opponent strategies, the PG algorithm maximizes the utility of bidders. We hope that this article will inspire research on automatic bidding strategies for strategic bidders.
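To make the linear shading baseline concrete, the following is a minimal toy sketch and not the paper's environment or algorithm. It assumes a single bidder whose values lie on a uniform grid over [0, 1], a bid of `alpha * v` for each value `v`, and a seller that sets a reserve price maximizing empirical revenue on the bids it observes. The function names (`linear_shading_bids`, `monopoly_reserve`, `bidder_utility`) and the single-bidder posted-price setup are illustrative assumptions.

```python
"""Toy sketch: linear bid shading against a seller that learns a reserve price."""


def linear_shading_bids(values, alpha):
    """Linear shading: every value v is reported as the bid alpha * v."""
    return [alpha * v for v in values]


def monopoly_reserve(bids):
    """Return the observed bid level p that maximizes empirical revenue p * P(bid >= p)."""
    ordered = sorted(bids)
    n = len(ordered)
    best_price, best_revenue = 0.0, 0.0
    for i, price in enumerate(ordered):
        # All bids from index i onward are >= price.
        revenue = price * (n - i) / n
        if revenue > best_revenue:
            best_price, best_revenue = price, revenue
    return best_price


def bidder_utility(values, alpha):
    """Average per-round utility of a single bidder who shades by alpha.

    Assumption (toy model): the seller sets the reserve from the shaded bids,
    and the bidder wins and pays the reserve whenever the bid clears it.
    """
    bids = linear_shading_bids(values, alpha)
    reserve = monopoly_reserve(bids)
    total = 0.0
    for value, bid in zip(values, bids):
        if bid >= reserve:
            total += value - reserve
    return total / len(values)


if __name__ == "__main__":
    n = 1000
    values = [(i + 0.5) / n for i in range(n)]  # midpoint grid on [0, 1]
    for alpha in (1.0, 0.75, 0.5):
        print(f"alpha={alpha:.2f}  utility={bidder_utility(values, alpha):.4f}")
```

On this grid, truthful bidding (`alpha = 1.0`) earns about 0.125 per round, while shading with `alpha = 0.5` earns about 0.25, because the learned reserve falls along with the bids. The toy has no competing bidders, so a lower `alpha` always helps here; it illustrates only the incentive to shade when the mechanism is learned from bids, not an equilibrium strategy.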
