Joint Value Estimation and Bidding in Repeated First-Price Auctions

Yuxiao Wen, Yanjun Han, Zhengyuan Zhou

Abstract

We study regret minimization in repeated first-price auctions (FPAs), where a bidder observes only the realized outcome after each auction -- win or loss. This setup reflects practical scenarios in online display advertising where the actual value of an impression depends on the difference between two potential outcomes, such as clicks or conversion rates, when the auction is won versus lost. We incorporate causal inference into this framework and analyze the challenging case where only the treatment effect admits a simple dependence on observable features. Under this framework, we propose algorithms that jointly estimate private values and optimize bidding strategies under two different feedback types on the highest other bid (HOB): the full-information feedback where the HOB is always revealed, and the binary feedback where the bidder only observes the win-loss indicator. Under both cases, our algorithms are shown to achieve near-optimal regret bounds. Notably, our framework enjoys a unique feature that the treatments are actively chosen, and hence eliminates the need for the overlap condition commonly required in causal inference.

Paper Structure

This paper contains 60 sections, 54 theorems, 205 equations, 6 figures, 1 table, 9 algorithms.

Key Result

Theorem 2.5

Suppose the bidder implements Abstraction ass:est_oracle_bern. Under Assumptions assump:linear--ass:Gt, there is an algorithm $\pi_{\mathrm{lte}}$ (with the knowledge of the confidence bound $\delta_t$ at time $t$ and the knowledge of $L$) that achieves the expected regret where $\Delta := 1 + \sum_{t=1}^T \delta_t^2$.

Figures (6)

  • Figure 1: The learner repeatedly bids for advertisement slots and observes the (random) outcome sales after winning or losing the auctions.
  • Figure 2: The better choice of two UCBs. We restrict our bid selection to the interval $[b_{\mathrm{left}},\,b_{\mathrm{right}}]$ returned by \ref{alg:ucb_selection}, which contains the hindsight optimal bid $b_t^*$ marked by $\times$. Then we select the UCB with the tighter width $w_{t,i}$ over this bid interval by recognizing which endpoint ($0$ or $1$) the CDF interval $[\widehat{G}_t(b_{\mathrm{left}}),\,\widehat{G}_t(b_{\mathrm{right}})]$ is closer to. In this example, the CDF interval is closer to $1$, so the UCB index $i=1$ will be returned from \ref{alg:ucb_selection}, and we will use UCB $u_{t,1}$ (from width $w_{t,1}$) for further bid selection and elimination in \ref{alg:master_alg_te}.
  • Figure 3: Enforcing monotonicity by width. The procedure begins with the first (leftmost) interval and then iterates through the intervals $j=1,\dots,J$. For each interval, it sets the estimated value to be the maximum of the lower confidence bound and the value of the previous interval. Its validity is guaranteed when the true CDF $G$ indeed lies within the shaded regions.
  • Figure 4: Plot (a) presents the periodic pattern of the means of the baseline and the winning outcomes, and the latter's variance, respectively. Plot (b) presents the trajectories of the bid difference $b_t - b_t^*$ of the four algorithms compared to the optimal bid, smoothed over a sliding window of width $500$ for visualization.
  • Figure 5: Plot (a) presents the pseudo regret of each of the four algorithms over a horizon $T=30{,}000$, averaged over five independent runs. The shaded region stands for one standard error. Plot (b) presents the corresponding cumulative reward and that achieved by the optimal oracle $(b_t^*)_t$, averaged over five runs. It zooms in around the initial time $t=0$ and marks when the reward becomes positive (i.e., breaks even).
  • ...and 1 more figure
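
The monotonicity step described in the caption of Figure 3 can be illustrated with a minimal sketch. It assumes the lower confidence bounds for the $J$ intervals are already available as a list ordered from left to right; the function name `enforce_monotone` and the sample numbers are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def enforce_monotone(lower_bounds: Sequence[float]) -> List[float]:
    """Running maximum over per-interval lower confidence bounds.

    Each interval's estimate is the larger of its own lower confidence
    bound and the estimate of the previous interval, so the resulting
    sequence is non-decreasing, as a CDF estimate should be.
    """
    estimates: List[float] = []
    previous = float("-inf")  # the first interval simply keeps its own bound
    for lcb in lower_bounds:
        previous = max(lcb, previous)
        estimates.append(previous)
    return estimates


# Hypothetical lower confidence bounds for J = 5 intervals (illustrative only).
lcbs = [0.10, 0.05, 0.30, 0.25, 0.60]
print(enforce_monotone(lcbs))  # [0.1, 0.1, 0.3, 0.3, 0.6]
```

Because each estimate is a maximum of lower bounds at or to the left of the interval, it stays below the true CDF whenever every lower bound does, which is the validity condition the caption refers to.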

Theorems & Definitions (82)

  • Theorem 2.5: Upper bound I
  • Lemma 2.7: Linear HOB Estimation
  • Theorem 2.8: Lower bound
  • Theorem 2.9: Upper bound II
  • Theorem 2.10: Upper bound III
  • Lemma 3.1
  • Lemma 3.2
  • Lemma 3.3: Good CDF interval
  • Lemma 3.4: Small width for selected UCB
  • Lemma 3.5: Lemma 14 of auer2002using
  • ...and 72 more