Learning in Repeated Multi-Unit Pay-As-Bid Auctions

Rigel Galgana, Negin Golrezaei

TL;DR

This work shows that a utility decoupling trick yields a polynomial-time algorithm for the offline problem of learning how to bid in repeated multi-unit pay-as-bid (PAB) auctions, and designs efficient algorithms for the online problem under both full information and bandit feedback.

Abstract

Motivated by Carbon Emissions Trading Schemes, Treasury Auctions, Procurement Auctions, and Wholesale Electricity Markets, which all involve the auctioning of multiple homogeneous units, we consider the problem of learning how to bid in repeated multi-unit pay-as-bid auctions. In each of these auctions, a large number of (identical) items are to be allocated to the largest submitted bids, where the price of each of the winning bids is equal to the bid itself. In this work, we study the problem of optimizing bidding strategies from the perspective of a single bidder. Effective bidding in pay-as-bid (PAB) auctions is complex due to the combinatorial nature of the action space. We show that a utility decoupling trick enables a polynomial-time algorithm to solve the offline problem where competing bids are known in advance. Leveraging this structure, we design efficient algorithms for the online problem under both full information and bandit feedback settings that achieve an upper bound on regret of $O(M \sqrt{T \log T})$ and $O(M T^{\frac{2}{3}} \sqrt{\log T})$ respectively, where $M$ is the number of units demanded by the bidder and $T$ is the total number of auctions. We accompany these results with a regret lower bound of $\Omega(M\sqrt{T})$ for the full information setting and $\Omega(M^{2/3}T^{2/3})$ for the bandit setting. We also present additional findings on the characterization of PAB equilibria. While the Nash equilibria of PAB auctions possess nice properties such as winning bid uniformity and high welfare and revenue, these properties are not guaranteed under no-regret learning dynamics. Nevertheless, our simulations suggest that they hold anyway, regardless of whether a Nash equilibrium exists. Compared to its uniform price counterpart, the PAB dynamics converge faster and achieve higher revenue, making PAB appealing whenever revenue holds significant social value.
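To make the auction rule in the abstract concrete, here is a minimal sketch of a single pay-as-bid round from one bidder's point of view. The function name `pab_round`, its parameters, and the tie-breaking rule (ties resolved in the bidder's favor) are illustrative assumptions, not taken from the paper; `values` is assumed to hold the bidder's marginal values for the first, second, ... unit won.

```python
def pab_round(my_bids, competing_bids, values, num_units):
    """Simulate one pay-as-bid round (illustrative sketch).

    The `num_units` largest bids win, and every winning bid pays itself.
    Assumption: ties are broken in favor of this bidder.
    Returns (units_won, payment, utility) for this bidder.
    """
    mine = sorted(my_bids, reverse=True)
    # Owner tag 0 = this bidder, 1 = competitors. Sorting on (-bid, owner)
    # orders bids from largest to smallest and puts this bidder first on ties.
    pool = [(-b, 0) for b in mine] + [(-b, 1) for b in competing_bids]
    pool.sort()
    winners = pool[:num_units]
    units_won = sum(1 for _, owner in winners if owner == 0)
    # The bidder's winning bids are necessarily their `units_won` largest bids.
    payment = sum(mine[:units_won])
    utility = sum(values[:units_won]) - payment
    return units_won, payment, utility


if __name__ == "__main__":
    # Three units for sale; this bidder bids 0.6 and 0.4 against 0.7, 0.5, 0.3.
    print(pab_round([0.6, 0.4], [0.7, 0.5, 0.3], [0.9, 0.8], 3))
```

In this example only the 0.6 bid is among the three largest, so the bidder wins one unit and pays 0.6 for it. Raising the second bid above 0.5 would win a second unit but at a higher price, which is the trade-off the learning algorithms have to resolve.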

Paper Structure (49 sections, 19 theorems, 100 equations, 15 figures, 5 tables, 5 algorithms)

Key Result

Lemma 1

Any PNE (if one exists) requires that the highest bid $b_{(1)}$ is at most $\delta$ above the $M$'th largest bid, denoted by $b_{(M)}$. Here, $\delta$ is the discretization factor.
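The necessary condition in Lemma 1 can be checked mechanically for a given bid profile. The sketch below assumes that the bids of all bidders are pooled into one list before ranking; the function name and the small floating-point tolerance `tol` are illustrative choices, not part of the paper. Passing the check does not imply that the profile is a PNE, since the lemma gives only a necessary condition.

```python
def satisfies_lemma1_condition(all_bids, num_units, delta, tol=1e-12):
    """Check b_(1) - b_(M) <= delta for a pooled bid profile (sketch).

    all_bids:  every submitted bid, pooled across bidders (assumption).
    num_units: M, the rank of the lower bid in the comparison.
    delta:     the discretization factor of the bid grid.
    tol:       slack for floating-point rounding (illustrative).
    """
    if len(all_bids) < num_units:
        raise ValueError("need at least num_units bids")
    ranked = sorted(all_bids, reverse=True)
    highest = ranked[0]                 # b_(1)
    mth_highest = ranked[num_units - 1]  # b_(M)
    return highest - mth_highest <= delta + tol


if __name__ == "__main__":
    # Uniform winning bids pass; widely spread winning bids fail.
    print(satisfies_lemma1_condition([0.9, 0.9, 0.9, 0.9, 0.5, 0.3], 4, 0.1))
    print(satisfies_lemma1_condition([1.0, 0.9, 0.7, 0.6], 3, 0.1))
```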

Figures (15)

  • Figure 1: DP graph for Problem (\ref{eq:offline}): Bid optimization problem cast as a graph problem, for $m = 2$ and $|\mathcal{B}| = 3$, $B_1< B_2< B_3$. Node $(m, b)$, the node in the $m$'th layer with bid value $b$, has weight $W_m^{T+1}(b)$.
  • Figure 2: A schematic illustration of how we partition the set of bids in $\mathcal{B}\subset [0,1]$ to construct alternative hypotheses. Under one of these hypotheses denoted by $(j_1,\ldots,j_M)$, we set the marginal distribution of each $b_{-m}$ to be such that all bids $b \in \mathcal{B}_m$ yield $c_m$ expected utility, except for $b_m^{j_m}$---the $j_m$'th largest bid in $\mathcal{B}_m$---which yields $c_m + \gamma_m$ expected utility where $\gamma_m>0$.
  • Figure 3: We compare the time-averaged aggregate discretized regret across all agents for the PAB auction, under both full information and bandit feedback, for both varying $T$ (left) and $M$ (right). When varying $T$ (resp. $M$), we fix $M = 5$ (resp. $T = 25000$) and derive $|\mathcal{B}|$ and $\eta$ according to Theorems \ref{thm:full} and \ref{thm:decoupled exp - bandit feedback}.
  • Figure 4: Market Dynamics (Non) Last Iterate Convergence. We plot the bid values over the course of the market dynamics induced by our full information decoupled exponential weights algorithm (Algorithm \ref{alg: Decoupled Exponential Weights}) with $N=3,M=4,|\mathcal{B}|=10,T=10^5$. In the left figure, we let $\bm{v}_1 = \bm{v}_2 = \bm{v}_3 = [1-\epsilon, 1-\epsilon]$, so a PNE exists because the condition $c = c_{-n} = \lfloor 1-\epsilon \rfloor_{\delta=0.1} = 0.9$ from Theorem \ref{thm: PNE existence} is satisfied. Moreover, this PNE is characterized by all bidders submitting bids of $0.9$ for the first two units, which is precisely what the market dynamics converge to. In the right figure, we assume that bidder $3$ demands one unit at $1-\epsilon$ instead of two. The values of the other bidders remain the same. Here, the $c = c_{-n}$ condition is not satisfied, as $c = c_{-1} = 1-\epsilon \neq 0 = c_{-2} = c_{-3}$. In fact, one can verify that no PNE exists in this auction, and we observe cyclic bidding behavior from the 2nd and 3rd bidders.
  • Figure 5: Bid convergence over time under the stochastic setting in Section \ref{sec:stochastic} for PAB with full information (left), PAB with bandit feedback (middle), and the uniform price auction (right). For the PAB auctions, the solid (resp. dashed) line denotes the decoupled exponential weights (resp. OMD). For uniform price, the solid and dashed lines are for full information and bandit feedback respectively. The top, middle, and bottom lines denote the time-averaged values of $b_1, b_2$, and $b_3$ respectively.
  • ...and 10 more figures
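The layered graph in the Figure 1 caption suggests a simple dynamic program, sketched below under stated assumptions. Here `weights[m][j]` stands in for the node weight $W_{m+1}^{T+1}(B_{j+1})$ with bids sorted in increasing order, and a path may only move to an equal or smaller bid in the next layer (that is, the bid vector is assumed to be non-increasing across units). The function name and the edge rule are assumptions for illustration; the caption itself only specifies the node weights.

```python
def best_monotone_path(weights):
    """Max-weight path through a layered graph (illustrative sketch).

    weights[m][j]: weight of the node in layer m with the j-th smallest bid.
    Assumption: the bid index may not increase from one layer to the next.
    Returns (total_weight, path), where path[m] is the chosen bid index.
    Runs in O(layers * bids) time using suffix maxima.
    """
    num_layers, num_bids = len(weights), len(weights[0])
    best = list(weights[0])  # best[j]: best path weight ending at bid j
    back = []                # back[m-1][j]: predecessor index of node (m, j)
    for m in range(1, num_layers):
        # Suffix maxima of the previous layer: best predecessor with index >= j.
        suf_val = [0.0] * num_bids
        suf_arg = [0] * num_bids
        run_val, run_arg = float("-inf"), -1
        for j in range(num_bids - 1, -1, -1):
            if best[j] > run_val:
                run_val, run_arg = best[j], j
            suf_val[j], suf_arg[j] = run_val, run_arg
        best = [weights[m][j] + suf_val[j] for j in range(num_bids)]
        back.append(suf_arg)
    # Recover the optimal path by walking the predecessor tables backwards.
    j = max(range(num_bids), key=lambda i: best[i])
    total = best[j]
    path = [j]
    for preds in reversed(back):
        j = preds[j]
        path.append(j)
    path.reverse()
    return total, path


if __name__ == "__main__":
    # Two layers and three bid levels, matching the size used in Figure 1.
    print(best_monotone_path([[1.0, 3.0, 2.0], [4.0, 0.5, 4.5]]))
```

With these made-up weights, picking the best node in each layer independently would give 3.0 + 4.5, but that requires the second bid to exceed the first. The constrained optimum instead pairs the middle bid in layer one with the lowest bid in layer two, for a total of 7.0.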

Theorems & Definitions (26)

  • Definition 1: Pure Nash Equilibria
  • Definition 2: Coarse correlated equilibria and correlated equilibria
  • Lemma 1
  • Lemma 2
  • Theorem 1: Existence of an Approximately Efficient PNE
  • Lemma 3
  • Theorem 2
  • Theorem 3: Decoupled Exponential Weights: Full Information
  • Theorem 4: Decoupled Exponential Weights: Bandit Feedback
  • Theorem 5: Online Mirror Descent: Bandit Feedback
  • ...and 16 more