User Response in Ad Auctions: An MDP Formulation of Long-Term Revenue Optimization

Yang Cai, Zhe Feng, Christopher Liaw, Aranyak Mehta, Grigoris Velegkas

TL;DR

This work introduces an MDP-based framework for ad auctions that integrates user response into long-term revenue optimization. The authors show that the long-term optimum in each state is a Myerson-style auction with a modified virtual value that accounts for future user-state dynamics, enabling tractable analysis in the single-slot case. They provide learning algorithms that operate with sample access to the MDP and bidders’ value distributions, plus a simple mechanism built on second-price auctions with personalized reserves that achieves a constant-factor approximation to the optimal long-term revenue. The results bridge auction theory and reinforcement learning, offering practical, sample-efficient methods for dynamic ad auctions while highlighting open questions on extending to more complex user models and state dependence. Overall, the paper advances long-horizon revenue maximization by formalizing user response via MDPs and deriving both optimal and approximately optimal mechanisms with provable guarantees.
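For reference, here is a minimal sketch of the classic (static) single-slot Myerson auction that the paper's mechanism generalizes. It assumes each bidder's value is drawn from a Uniform[0, high] distribution. That distribution is regular, so the virtual value is $\phi(v) = v - \frac{1 - F(v)}{f(v)} = 2v - \text{high}$. The sketch does not implement the paper's modified virtual value, which also depends on the current CTR state and on the future impact of showing the ad. The names `UniformBidder` and `myerson_single_item` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class UniformBidder:
    """Bidder whose value is drawn from Uniform[0, high] (an illustrative, regular distribution)."""
    high: float

    def virtual_value(self, v: float) -> float:
        # Myerson: phi(v) = v - (1 - F(v)) / f(v); for Uniform[0, high] this is 2v - high.
        return 2.0 * v - self.high

    def inverse_virtual_value(self, x: float) -> float:
        # The value v with phi(v) = x; well defined because phi is increasing (regularity).
        return (x + self.high) / 2.0


def myerson_single_item(
    bidders: List[UniformBidder], bids: List[float]
) -> Tuple[Optional[int], float]:
    """Static Myerson auction for one slot: allocate to the highest positive virtual value
    and charge the winner her threshold bid. Returns (winner index or None, payment)."""
    if not bids:
        return None, 0.0
    phis = [b.virtual_value(v) for b, v in zip(bidders, bids)]
    winner = max(range(len(phis)), key=lambda i: phis[i])
    if phis[winner] <= 0:
        return None, 0.0  # no positive virtual value: the slot is not allocated
    # Lowest winning bid: its virtual value must beat every rival's and 0.
    rival_best = max([0.0] + [p for i, p in enumerate(phis) if i != winner])
    return winner, bidders[winner].inverse_virtual_value(rival_best)
```

Suppose two Uniform[0, 1] bidders bid 0.8 and 0.6. Their virtual values are 0.6 and 0.2, so bidder 0 wins and pays 0.6. This matches a second-price auction with reserve 0.5.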

Abstract

We propose a new Markov Decision Process (MDP) model for ad auctions to capture the user response to the quality of ads, with the objective of maximizing the long-term discounted revenue. By incorporating user response, our model takes into consideration all three parties involved in the auction (advertiser, auctioneer, and user). The state of the user is modeled as a user-specific click-through rate (CTR), with the CTR changing in the next round according to the set of ads shown to the user in the current round. We characterize the optimal mechanism for this MDP as Myerson's auction with a notion of modified virtual value, which relies on the value distribution of the advertiser, the current user state, and the future impact of showing the ad to the user. Leveraging this characterization, we design a sample-efficient and computationally efficient algorithm that requires only sample access to the true MDP and the value distributions of the bidders and outputs an approximately optimal policy. Finally, we propose a simple mechanism built upon second-price auctions with personalized reserve prices and show that it can achieve a constant-factor approximation to the optimal long-term discounted revenue.
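The simple mechanism in the abstract is built on a second-price auction with personalized reserve prices. Below is a minimal single-slot sketch. It assumes the "eager" variant, in which bidders below their own reserve are removed before the auction runs. The reserves are taken as given inputs, because this summary does not say how the paper chooses them. The function name is hypothetical.

```python
from typing import List, Optional, Tuple


def second_price_personalized_reserves(
    bids: List[float], reserves: List[float]
) -> Tuple[Optional[int], float]:
    """Single-slot second-price auction with a per-bidder ("personalized") reserve.

    Eager variant (an assumption): drop every bidder whose bid is below her own
    reserve, then the highest remaining bidder wins and pays the larger of her
    reserve and the highest remaining competing bid.
    Returns (winner index or None, payment).
    """
    if len(bids) != len(reserves):
        raise ValueError("bids and reserves must have the same length")
    eligible = [i for i, (b, r) in enumerate(zip(bids, reserves)) if b >= r]
    if not eligible:
        return None, 0.0  # nobody clears their reserve: the slot stays empty
    winner = max(eligible, key=lambda i: bids[i])
    competing = [bids[i] for i in eligible if i != winner]
    # Threshold payment: the lowest bid at which the winner would still win.
    payment = max([reserves[winner]] + competing)
    return winner, payment
```

Take bids [5, 8, 3] with reserves [4, 9, 1]. The highest bidder fails her reserve, so bidder 0 wins and pays max(4, 3) = 4. Charging the threshold bid keeps truthful bidding a dominant strategy within a single round.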

Paper Structure

This paper contains 21 sections, 18 theorems, 59 equations, and 5 algorithms.

Key Result

Theorem 3.1

Suppose that the advertisers' value distributions are regular and that there are $k$ identical slots. In each state $\text{ctr}$, the optimal incentive-compatible (IC) mechanism allocates the slots to the set of advertisers $W \subseteq [n]$ with $0 < |W| \leq k$ that maximizes the expression in Eq. eqn:multi_slot_vv, provided that this maximum is positive; otherwise, the mechanism does not allocate.
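The allocation rule in Theorem 3.1 can be read as a search over candidate winner sets. The sketch below enumerates every $W \subseteq [n]$ with $0 < |W| \leq k$ and returns the maximizer if its value is positive. Eq. eqn:multi_slot_vv is not reproduced in this summary, so the objective is a caller-supplied function `objective(W)`. Both it and `select_winning_set` are hypothetical names. Brute-force enumeration is only practical for small $n$ and $k$.

```python
from itertools import combinations
from typing import Callable, FrozenSet, Optional, Tuple


def select_winning_set(
    n: int,
    k: int,
    objective: Callable[[FrozenSet[int]], float],
) -> Tuple[Optional[FrozenSet[int]], float]:
    """Return the set W (0 < |W| <= k) that maximizes objective(W), or (None, 0.0)
    if no set has a positive value, in which case the mechanism does not allocate.

    `objective` is a hypothetical stand-in for the multi-slot expression in
    Theorem 3.1. It is evaluated on every candidate set, which costs
    sum_{s=1..k} C(n, s) calls.
    """
    best_set: Optional[FrozenSet[int]] = None
    best_value = float("-inf")
    for size in range(1, min(k, n) + 1):
        for combo in combinations(range(n), size):
            w = frozenset(combo)
            value = objective(w)
            if value > best_value:
                best_set, best_value = w, value
    if best_set is None or best_value <= 0:
        return None, 0.0
    return best_set, best_value
```

Try a toy objective such as `lambda W: sum(scores[i] for i in W) - 0.1 * len(W)` with scores [0.5, -0.2, 0.3]. This is a hypothetical stand-in, not the paper's formula. With it, `select_winning_set(3, 2, ...)` picks {0, 2} with a value of about 0.6.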

Theorems & Definitions (38)

  • Theorem 3.1
  • Remark 3.2
  • Definition 3.3: Modified virtual value
  • Corollary 3.4
  • Lemma 4.1: Performance of Policies in Empirical MDPs
  • Theorem 4.2: From Value Estimation to Policy Estimation (Adapted from singh1994upper)
  • Theorem 4.3: Approximate Bellman Update
  • Theorem 4.4
  • Definition 4.5
  • Lemma 4.6: Adaptation of devanur2016sample
  • ...and 28 more
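Several of the listed results (Lemma 4.1 and Theorems 4.2 and 4.3) concern planning in an MDP estimated from samples. The sketch below shows only the generic pattern. It estimates mean rewards and transition frequencies from (state, action, reward, next state) samples, runs Bellman updates on the resulting empirical MDP, and then acts greedily with respect to the computed values. This is not the paper's algorithm. Treating CTR levels as a small discrete state set and the ad choice as an abstract action are assumptions, and all names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Tuple

State = Hashable   # e.g. a discretized CTR level (assumption)
Action = Hashable  # e.g. which ad or ad set to show (assumption)
Sample = Tuple[State, Action, float, State]  # (s, a, reward, s_next)


def build_empirical_mdp(samples: List[Sample]):
    """Estimate mean rewards R[(s, a)] and transition probabilities P[(s, a)][s']
    from observed (s, a, reward, s_next) samples by simple averaging."""
    next_counts: Dict = defaultdict(lambda: defaultdict(int))
    reward_sum: Dict = defaultdict(float)
    visits: Dict = defaultdict(int)
    for s, a, r, s_next in samples:
        next_counts[(s, a)][s_next] += 1
        reward_sum[(s, a)] += r
        visits[(s, a)] += 1
    P = {sa: {sn: c / visits[sa] for sn, c in nxt.items()} for sa, nxt in next_counts.items()}
    R = {sa: reward_sum[sa] / visits[sa] for sa in visits}
    return P, R


def _q_value(P, R, V, gamma, s, a):
    # One-step lookahead: estimated reward plus discounted expected next-state value.
    return R[(s, a)] + gamma * sum(p * V[sn] for sn, p in P[(s, a)].items())


def value_iteration(P, R, gamma: float, tol: float = 1e-8, max_iters: int = 10_000):
    """Repeated Bellman updates on the empirical MDP; returns (V, greedy policy)."""
    if not 0.0 <= gamma < 1.0:
        raise ValueError("gamma must be in [0, 1)")
    states = {s for (s, _) in R} | {sn for nxt in P.values() for sn in nxt}
    actions_of: Dict[State, List[Action]] = defaultdict(list)
    for (s, a) in R:
        actions_of[s].append(a)
    V = {s: 0.0 for s in states}
    for _ in range(max_iters):
        # States with no sampled actions are treated as absorbing with zero reward (assumption).
        new_V = {
            s: max((_q_value(P, R, V, gamma, s, a) for a in actions_of[s]), default=0.0)
            for s in states
        }
        delta = max((abs(new_V[s] - V[s]) for s in states), default=0.0)
        V = new_V
        if delta < tol:
            break
    policy = {
        s: max(actions_of[s], key=lambda a: _q_value(P, R, V, gamma, s, a))
        for s in states
        if actions_of[s]
    }
    return V, policy
```

As an example, use the samples `("high", "good", 1.0, "high")`, `("high", "bad", 1.4, "low")`, `("low", "good", 0.5, "high")`, and `("low", "bad", 0.8, "low")` with $\gamma = 0.9$. The greedy policy shows the "good" ad in both states even though it earns less immediately. The values come out to approximately $V(\text{high}) = 10$ and $V(\text{low}) = 9.5$.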