Learning Safe Strategies for Value Maximizing Buyers in Uniform Price Auctions

Negin Golrezaei, Sourav Sahoo

TL;DR

This work addresses bidding in repeated uniform price auctions for a value-maximizing buyer subject to per-round return-on-investment (RoI) constraints. It introduces safe bidding strategies that guarantee RoI irrespective of rivals' bids and identifies a finite, valuation-curve-driven subclass with a nested structure, which enables efficient learning. A DAG-based offline reduction and a weight-pushing online learner achieve sublinear regret under both full-information and bandit feedback, with lower bounds establishing near-optimality; the analysis extends to richer benchmark classes via a richness ratio $\alpha$. Simulations on semi-synthetic EU ETS data show that empirical richness ratios far exceed the worst-case bounds, supporting the approach's robustness and its potential applicability to extended models such as adaptive adversaries and time-varying valuations.

Abstract

We study the bidding problem in repeated uniform price multi-unit auctions from the perspective of a value-maximizing buyer. The buyer aims to maximize their cumulative value over $T$ rounds while adhering to per-round return-on-investment (RoI) constraints in a strategic (or adversarial) environment. Using an $m$-uniform bidding format, the buyer submits $m$ bid-quantity pairs $(b_i, q_i)$ to demand $q_i$ units at bid $b_i$, with $m \ll M$ in practice, where $M$ denotes the maximum demand of the buyer. We introduce the notion of safe bidding strategies as those that satisfy the RoI constraints irrespective of competing bids. Despite the stringent requirement, we show that these strategies satisfy a mild no-overbidding condition, depend only on the valuation curve of the bidder, and the bidder can focus on a finite subset without loss of generality. Though the subset size is $O(M^m)$, we design a polynomial-time learning algorithm that achieves sublinear regret, both in full-information and bandit settings, relative to the hindsight-optimal safe strategy. We assess the robustness of safe strategies against the hindsight-optimal strategy from a richer class. We define the richness ratio $\alpha\in (0,1]$ as the minimum ratio of the value of the optimal safe strategy to that of the optimal strategy from a richer class and construct hard instances showing the tightness of $\alpha$. Our algorithm achieves $\alpha$-approximate sublinear regret against these stronger benchmarks. Simulations on semi-synthetic auction data show that empirical richness ratios significantly outperform the theoretical worst-case bounds. The proposed safe strategies and learning algorithm extend naturally to more nuanced buyer and competitor models.

Paper Structure

This paper contains 64 sections, 35 theorems, 201 equations, 5 figures, 4 tables, 5 algorithms.

Key Result

Theorem 3.1

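For any $m\in\mathbb{N}$, no overbidding is allowed in $\mathscr{S}_{m}$. So, the collection of all $m$-uniform safe strategies is

$$\mathscr{S}_{m} = \left\{\mathbf{b}=\langle(b_1, q_1),\dots,(b_m, q_m)\rangle : b_j \leq w_{Q_j},\ \forall j\in[m]\right\},$$

where $Q_j=\sum_{\ell\leq j}q_\ell$ and $\mathbf{w}$ is the average cumulative valuation vector, $w_{j}=\frac{1}{j}\sum_{\ell\leq j}v_{\ell}$.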
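The no-overbidding condition can be illustrated with a minimal sketch. It assumes the condition reads $b_j \leq w_{Q_j}$, with $w_j=\frac{1}{j}\sum_{\ell\leq j}v_\ell$ and $Q_j=\sum_{\ell\leq j}q_\ell$ as in the Figure 1 caption. The valuation vector and the function names below are hypothetical and are not taken from the paper.

```python
from itertools import accumulate


def average_cumulative_valuations(v):
    """Return w as a list with w[j-1] = (v_1 + ... + v_j) / j."""
    return [total / j for j, total in enumerate(accumulate(v), start=1)]


def has_no_overbid(v, strategy, tol=1e-12):
    """Check b_j <= w_{Q_j} for every bid-quantity pair.

    `strategy` is a list of (bid, quantity) pairs, highest bid first.
    Assumed reading of the condition; see the lead-in above.
    """
    w = average_cumulative_valuations(v)
    cumulative = 0  # running Q_j
    for bid, quantity in strategy:
        cumulative += quantity
        if cumulative > len(v):
            raise ValueError("strategy demands more units than v covers")
        if bid > w[cumulative - 1] + tol:
            return False
    return True


# Hypothetical valuation curve with M = 4 units.
v = [10, 8, 6, 4]
w = average_cumulative_valuations(v)          # averages of the first j values
ok = has_no_overbid(v, [(9, 2), (7, 2)])      # bids equal w_2 and w_4
bad = has_no_overbid(v, [(9.5, 2), (7, 2)])   # first bid exceeds w_2
```

Under this reading, the check depends only on the bidder's own valuation curve, never on competing bids, which matches the abstract's description of safe strategies.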

Figures (5)

  • Figure 1: Nested structure of the bidding strategies in $\mathscr{S}_{3}^\star$. Consider the strategy $\mathbf{b}=\langle(w_3, 3), (w_7, 4), (w_9, 2)\rangle\in\mathscr{S}_{3}^\star$, where we note that $Q_1= 3$, $Q_2=3+4=7$ and $Q_3 = 3+4+2=9$. The $j^{th}$ highest bid (i.e., $b_j$) is the average of the first $Q_j=\sum_{\ell\leq j}q_\ell$ entries of the valuation vector, i.e., $b_j = w_{Q_j}$, where $w_{j}=\frac{1}{j}\sum_{\ell\leq j}v_{\ell}$.
  • Figure 2: The solid line represents the average cumulative valuation curve, $\mathbf{w}$, and the dotted line represents the valuation curve, $\mathbf{v}$. The left (resp. right) panel illustrates underbidding (resp. overbidding) for a $2$-uniform bidding strategy. Note that the notions of underbidding and overbidding in Definition 3 are defined with respect to $\mathbf{w}$ and not $\mathbf{v}$. The plots of $\mathbf{v}$ and $\mathbf{w}$ are drawn as linear for illustrative purposes only.
  • Figure 3: In this DAG, $M=3$ and $m=2$. The red path refers to $\mathbf{b}=\langle (w_1, 1), (w_3, 2)\rangle\in\mathscr{U}_{2}^\star$. The values in red are the corresponding $Q_j$'s. Similarly, the blue path refers to $\mathbf{b}=\langle(w_3, 3)\rangle\in\mathscr{U}_{2}^\star$.
  • Figure 4: The left (resp. right) figure refers to (the lower bound on) $\alpha_{\mathscr{F}^{\mathcal{H}_{-}}_{m},\mathscr{U}_{m}^\star}(\mathcal{H}_{-}, \mathbf{v})$ (resp. $\alpha_{\mathscr{U}_{m}^\star, \mathscr{U}_{1}^\star}(\mathcal{H}_{-}, \mathbf{v})$) as a function of $m$. The shaded area corresponds to one standard deviation.
  • Figure 5: The left figure shows the relative gain obtained as a function of $m$. Here, relative gain is the ratio of the value obtained by the bidder over $T$ rounds when RoI constraints are enforced over $T_0$ rounds to that when RoI constraints are enforced in each round, i.e., $T_0=1$. The box plot in the right figure shows the distribution of per-round relative feasibility $\frac{V(\mathbf{b}^t; \boldsymbol{\beta}_{-}^{t})}{P(\mathbf{b}^t; \boldsymbol{\beta}_{-}^{t})} - 1.$
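
The nested structure in the Figure 1 caption and the path view in the Figure 3 caption can be sketched together. The sketch assumes that each strategy in the finite class is determined by an increasing sequence of cumulative quantities $Q_1<\dots<Q_k\leq M$ with $k\leq m$, and that each such sequence corresponds to one path in the DAG. The DAG construction itself is not given in this summary, so that correspondence is an assumption. The valuation vectors and function names are hypothetical.

```python
from itertools import accumulate, combinations


def average_cumulative_valuations(v):
    """Return w as a list with w[j-1] = (v_1 + ... + v_j) / j."""
    return [total / j for j, total in enumerate(accumulate(v), start=1)]


def strategy_from_cumulative(w, cumulative_quantities):
    """Build <(w_{Q_1}, q_1), ..., (w_{Q_k}, q_k)> from Q_1 < ... < Q_k."""
    strategy, previous = [], 0
    for Q in cumulative_quantities:
        strategy.append((w[Q - 1], Q - previous))  # bid w_Q, quantity Q - Q_prev
        previous = Q
    return strategy


def enumerate_nested_strategies(v, m):
    """Yield one strategy per increasing sequence of at most m cumulative quantities."""
    w = average_cumulative_valuations(v)
    M = len(v)
    for k in range(1, m + 1):
        for Qs in combinations(range(1, M + 1), k):
            yield strategy_from_cumulative(w, Qs)


# Figure 1 shape: cumulative quantities (3, 7, 9) on a hypothetical 9-unit curve.
v9 = [9, 8, 7, 6, 5, 4, 3, 2, 1]
fig1 = strategy_from_cumulative(average_cumulative_valuations(v9), (3, 7, 9))

# Figure 3 shape: M = 3, m = 2 on a hypothetical 3-unit curve.
v3 = [6, 4, 2]
all_strategies = list(enumerate_nested_strategies(v3, m=2))
red_path = strategy_from_cumulative(average_cumulative_valuations(v3), (1, 3))
blue_path = strategy_from_cumulative(average_cumulative_valuations(v3), (3,))
```

Brute-force enumeration like this grows as $O(M^m)$, which is the subset size quoted in the abstract; the paper's polynomial-time learner avoids enumerating the class, and this sketch does not reproduce that algorithm.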

Theorems & Definitions (62)

  • Definition 1: $m$-Uniform Bidding
  • Example 1
  • Remark 2.1: RoI Constraints
  • Definition 2: Safe Strategies
  • Definition 3: Underbid and Overbid
  • Theorem 3.1
  • Theorem 3.2
  • Lemma 4.1
  • proof
  • Theorem 4.2
  • ...and 52 more