Applying Opponent Modeling for Automatic Bidding in Online Repeated Auctions

Yudong Hu, Congying Han, Tiande Guo, Hao Xiao

TL;DR

The paper addresses automatic bidding in online repeated auctions by modeling the bidders and the seller as simultaneous learners in a multiagent reinforcement learning framework. It introduces Bid Net, a network representation of the bidder's strategy that replaces the linear shading function, and the pseudo-gradient (PG) algorithm, an opponent-modeling method that lets a bidder predict how the other agents' strategies will change. The seller uses Myerson Net to learn the revenue-maximizing mechanism. The authors prove that a bidder using PG learns the best response to static opponents, and that the system converges to the equilibrium of the game induced by the auction when all bidders adopt PG. In experiments, Bid Net outperforms linear shading, and PG closely approximates the Nash equilibrium while remaining robust against both static and dynamic opponents.

Abstract

Online auction scenarios, such as bidding searches on advertising platforms, often require bidders to participate repeatedly in auctions for identical or similar items. Most previous studies have only considered the process by which the seller learns the prior-dependent optimal mechanism in a repeated auction. However, in this paper, we define a multiagent reinforcement learning environment in which strategic bidders and the seller learn their strategies simultaneously and design an automatic bidding algorithm that updates the strategy of bidders through online interactions. We propose Bid Net to replace the linear shading function as a representation of the strategic bidders' strategy, which effectively improves the utility of strategy learned by bidders. We apply and revise the opponent modeling methods to design the PG (pseudo-gradient) algorithm, which allows bidders to learn optimal bidding strategies with predictions of the other agents' strategy transition. We prove that when a bidder uses the PG algorithm, it can learn the best response to static opponents. When all bidders adopt the PG algorithm, the system will converge to the equilibrium of the game induced by the auction. In experiments with diverse environmental settings and varying opponent strategies, the PG algorithm maximizes the utility of bidders. We hope that this article will inspire research on automatic bidding strategies for strategic bidders.

Paper Structure

This paper contains 14 sections, 4 theorems, 25 equations, 6 figures, 1 table, 2 algorithms.

Key Result

Theorem 1

Myerson Net satisfies the IC constraints in single-item auctions. For a Myerson Net $M$, composing each virtual value function with a strictly increasing function $Q$, so that $G'_i = Q \circ G_i$ and $G_i(b_i) = 0 \iff Q \circ G_i(b_i) = 0$, yields another mechanism $M'$. Then $M'$ and $M$ have the sa…
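The theorem statement is cut off in the source, so the sketch below does not restate its conclusion. It only illustrates, under stated assumptions, why a transform $Q$ of this kind leaves a virtual-value auction's outcome unchanged: a strictly increasing $Q$ preserves the ranking of virtual values, and the zero-preservation condition keeps the reserve point where it was. Everything specific here is an assumption made for illustration: the uniform prior on $[0, 1]$ (virtual value $2b - 1$), the choice $Q(x) = \tanh(3x)$, the tie-breaking rule, and the helper names `winner`, `threshold_payment`, and `run`.

```python
import math

def virtual_value(b):
    # Myerson virtual value for a value drawn from U[0, 1] (assumed prior).
    return 2.0 * b - 1.0

def q(x):
    # Hypothetical strictly increasing transform with q(0) = 0.
    return math.tanh(3.0 * x)

def transformed_virtual_value(b):
    # G' = Q o G, the transformed virtual value function.
    return q(virtual_value(b))

def winner(bids, g):
    # The highest virtual value wins, provided it is strictly positive.
    # Ties go to the lowest index (an arbitrary choice for this sketch).
    scores = [g(b) for b in bids]
    best = max(range(len(bids)), key=lambda i: scores[i])
    return best if scores[best] > 0.0 else None

def threshold_payment(bids, g, i, iters=60):
    # Smallest bid with which bidder i still wins, located by bisection.
    lo, hi = 0.0, bids[i]  # a bid of 0 loses, the actual bid wins
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        trial = list(bids)
        trial[i] = mid
        if winner(trial, g) == i:
            hi = mid
        else:
            lo = mid
    return hi

def run(bids, g):
    # Return (winner index or None, payment of the winner).
    w = winner(bids, g)
    pay = threshold_payment(bids, g, w) if w is not None else 0.0
    return w, pay

if __name__ == "__main__":
    for bids in ([0.8, 0.6], [0.7, 0.2], [0.4, 0.3]):
        print(bids, run(bids, virtual_value), run(bids, transformed_virtual_value))
```

In this toy setting both mechanisms pick the same winner and charge the same threshold payment up to bisection precision, including the case where no bid clears the reserve of 0.5.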

Figures (6)

  • Figure 1: The learned virtual value function after 200 iterations of Myerson-Net in single-item, two-bidder auction. The line labeled truth indicates that it is derived from Myerson's Lemma, and the line labeled Myerson-Net is from the output of the network. The revenue curve represents the revenue of the mechanism in the learning process.
  • Figure 2: The network for strategic bidder (Bid Net), which takes the value of the bidder as input. NeuralSort is a differentiable sorting operator that can output an approximate sorted sequence while preserving the gradient.
  • Figure 3: Gradient propagation direction of the repeated auction induced MARL system. The red line represents a direct gradient, which comes from the revenue and utility. The blue line represents the indirect gradient, which comes from the impact of the bidder's strategy on other players.
  • Figure 4: The utility of a strategic bidder in scenarios where another bidder consistently employs the truthful bidding strategy, while the seller's strategy is derived from the Myerson Net. The solid red line represents the utility of the strategic bidder, while the solid yellow line represents the revenue of the seller. The dashed line labeled "Truthful Myerson" represents the theoretical utility and revenue when the strategic bidder adheres to the truthful bidding strategy. The dashed line labeled "Thresholded Myerson" illustrates the theoretical utility and revenue when the strategic bidder employs the optimal bidding strategy.
  • Figure 5: The strategy parameters $\alpha_i$ of strategic bidders during the learning process. Both bidders are designated strategic bidders and employ the same algorithm. The seller strategy is derived from Myerson Net. The line labeled "truth" represents the truthful strategy $\alpha_i = 1$, and the line labeled "equilibrium" represents the equilibrium strategy of the induced game $\alpha_i = \frac{5}{14}$.
  • ...and 1 more figure
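The Figure 2 caption describes NeuralSort as a differentiable sorting operator that outputs an approximately sorted sequence while preserving gradients. A minimal pure-Python sketch of the usual NeuralSort relaxation follows. It builds a row-stochastic matrix whose $i$-th row is a softmax over $\big((n + 1 - 2i)\,s - A_s \mathbf{1}\big)/\tau$, where $A_s$ holds the pairwise absolute differences $|s_j - s_k|$. How Bid Net actually wires this operator into its layers is not shown in this summary, so the function names `neural_sort` and `soft_sorted`, the temperature values, and the example scores are illustrative assumptions rather than the authors' implementation.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def neural_sort(scores, tau=1.0):
    # Relaxed permutation matrix: row i (1-based) puts most of its mass
    # on the position of the i-th largest score; smaller tau is sharper.
    n = len(scores)
    # Row sums of the pairwise absolute-difference matrix |s_j - s_k|.
    abs_row_sums = [sum(abs(sj - sk) for sk in scores) for sj in scores]
    rows = []
    for i in range(1, n + 1):
        coeff = n + 1 - 2 * i
        logits = [(coeff * s - a) / tau for s, a in zip(scores, abs_row_sums)]
        rows.append(softmax(logits))
    return rows

def soft_sorted(scores, tau=1.0):
    # Apply the relaxed permutation to the scores (descending order).
    perm = neural_sort(scores, tau)
    return [sum(p * s for p, s in zip(row, scores)) for row in perm]

if __name__ == "__main__":
    bids = [0.3, 0.9, 0.1]
    print(soft_sorted(bids, tau=0.01))  # close to a hard descending sort
    print(soft_sorted(bids, tau=1.0))   # smoother, values blend together
```

With a small temperature the output approaches the hard descending sort, while a larger temperature trades sorting accuracy for smoother gradients. That trade-off is the usual reason for using a relaxed sort inside a network trained by gradient descent.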

Theorems & Definitions (6)

  • Definition 1: Myerson Net
  • Theorem 1
  • Definition 2: Induced game of Myerson mechanism $M$
  • Theorem 2
  • Theorem 3
  • Theorem 4