Improved Online Learning Algorithms for CTR Prediction in Ad Auctions

Zhe Feng, Christopher Liaw, Zixin Zhou

TL;DR

This work develops an online mechanism based on upper-confidence bounds that achieves a tight $O(\sqrt{T})$ regret in the worst case, and negative regret when the values are static across all the auctions and there is a gap between the highest and second-highest expected value ads.

Abstract

In this work, we investigate the online learning problem of revenue maximization in ad auctions, where the seller needs to learn the click-through rates (CTRs) of each ad candidate and charges the winner on a pay-per-click basis. We focus on two models of the advertisers' strategic behaviors. First, we assume that the advertiser is completely myopic; i.e., in each round, they aim to maximize their utility only for the current round. In this setting, we develop an online mechanism based on upper-confidence bounds that achieves a tight $O(\sqrt{T})$ regret in the worst case, and negative regret when the values are static across all the auctions and there is a gap between the highest and second-highest expected value ads (where an ad's expected value is its value multiplied by its CTR). Next, we assume that the advertiser is non-myopic and cares about their long-term utility. This setting is much more complex since an advertiser is incentivized to influence the mechanism by bidding strategically in earlier rounds. In this setting, we provide an algorithm that achieves negative regret for the static valuation setting (with a positive gap), in sharp contrast with prior work that shows $O(T^{2/3})$ regret when the valuation is generated by an adversary.
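To make the myopic setting concrete, the following is a minimal sketch of a pay-per-click second-price auction that ranks ads by bid times an optimistic CTR estimate. It is not the paper's algorithm: the UCB1-style confidence bonus, the function names (`ucb_index`, `run_auction`, `simulate`), and the assumption that myopic advertisers bid their static values truthfully are all illustrative choices made here.

```python
import math
import random


def ucb_index(clicks, shows, t):
    """Optimistic CTR estimate: empirical mean plus a UCB1-style bonus, capped at 1.

    The bonus sqrt(2 ln t / shows) is an assumed, textbook choice, not the
    confidence radius used in the paper.
    """
    if shows == 0:
        return 1.0  # an ad that was never shown gets the most optimistic CTR
    bonus = math.sqrt(2.0 * math.log(t) / shows)
    return min(1.0, clicks / shows + bonus)


def run_auction(bids, ctr_estimates):
    """Pay-per-click second-price auction on scores = bid * estimated CTR.

    Returns (winner index, price per click). The price is the runner-up's
    score divided by the winner's estimated CTR, so it never exceeds the
    winner's bid.
    """
    scores = [b * c for b, c in zip(bids, ctr_estimates)]
    order = sorted(range(len(bids)), key=lambda i: scores[i], reverse=True)
    winner, runner_up = order[0], order[1]
    price = scores[runner_up] / ctr_estimates[winner]
    return winner, price


def simulate(values, true_ctrs, horizon, seed=0):
    """Run the auction for `horizon` rounds with static values and truthful bids."""
    rng = random.Random(seed)
    n = len(values)
    clicks = [0] * n
    shows = [0] * n
    revenue = 0.0
    for t in range(1, horizon + 1):
        estimates = [ucb_index(clicks[i], shows[i], t) for i in range(n)]
        winner, price = run_auction(values, estimates)
        shows[winner] += 1
        # The winner pays only if the ad is clicked.
        if rng.random() < true_ctrs[winner]:
            clicks[winner] += 1
            revenue += price
    return revenue, shows
```

Calling `simulate([1.0, 0.8], [0.5, 0.3], horizon=10_000)` returns the total revenue and how often each ad was shown. One natural comparison, assumed here rather than taken from the paper's regret definition, is against `horizon` times the second-highest true expected value (value times CTR).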

Paper Structure (27 sections, 20 theorems, 40 equations, 2 algorithms)

Key Result

Theorem 1

If the advertisers are myopic then there is an online algorithm that guarantees the following.

Theorems & Definitions (63)

  • Theorem 1: Informal
  • Definition 2: Stage-IC
  • Lemma 3: Myerson (1981)
  • Definition 4: Pay-per-click Second Price Auctions
  • Definition 5: Global-IC
  • Definition 6: Individual Rationality (IR)
  • Definition 7: Regret
  • Proposition 7
  • Theorem 8
  • Theorem 9
  • ...and 53 more