Improved Online Learning Algorithms for CTR Prediction in Ad Auctions

Zhe Feng; Christopher Liaw; Zixin Zhou

TL;DR

This work develops an online mechanism based on upper-confidence bounds that achieves a tight $O(\sqrt{T})$ regret in the worst case, and negative regret when the values are static across all the auctions and there is a gap between the highest and the second-highest expected value.

Abstract

In this work, we investigate the online learning problem of revenue maximization in ad auctions, where the seller needs to learn the click-through rates (CTRs) of each ad candidate and charges the winner on a pay-per-click basis. We focus on two models of the advertisers' strategic behavior. First, we assume that each advertiser is completely myopic, i.e., in each round they aim to maximize their utility only for the current round. In this setting, we develop an online mechanism based on upper-confidence bounds that achieves a tight $O(\sqrt{T})$ regret in the worst case, and negative regret when the values are static across all the auctions and there is a gap between the highest expected value (i.e., value multiplied by CTR) and the second-highest expected value. Next, we assume that the advertiser is non-myopic and cares about their long-term utility. This setting is much more complex, since an advertiser is incentivized to influence the mechanism by bidding strategically in earlier rounds. In this setting, we provide an algorithm that achieves negative regret for the static valuation setting (with a positive gap), in sharp contrast with prior work that shows $O(T^{2/3})$ regret when the valuations are generated by an adversary.
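To make the myopic setting concrete, here is a minimal sketch of one way a UCB-based pay-per-click second-price auction can be set up. It is not the paper's exact mechanism: the confidence bonus $\sqrt{2 \ln t / n_i}$, the cap of the optimistic CTR at 1, the truthful (myopic) bidders, and all function names (`ucb_index`, `run_auction_round`, `simulate`) are illustrative assumptions. Each round, ads are ranked by bid times optimistic CTR; the winner pays only when clicked, and the per-click price is the smallest bid that would still have won.

```python
import math
import random

def ucb_index(clicks, shows, t, c=2.0):
    """Optimistic CTR estimate: empirical CTR plus a confidence bonus, capped at 1.
    The bonus form sqrt(c * ln t / shows) is an assumption for illustration."""
    if shows == 0:
        return 1.0  # an ad never shown gets the maximal CTR estimate
    bonus = math.sqrt(c * math.log(t) / shows)
    return min(1.0, clicks / shows + bonus)

def run_auction_round(bids, clicks, shows, t):
    """One round of a hypothetical UCB pay-per-click second-price auction.

    Returns (winner, price_per_click). The winner has the largest
    bid * UCB-CTR score; if clicked it pays the threshold bid, i.e. the
    second-highest score divided by its own UCB-CTR (never above its bid).
    """
    ucbs = [ucb_index(clicks[i], shows[i], t) for i in range(len(bids))]
    scores = [b * u for b, u in zip(bids, ucbs)]
    order = sorted(range(len(bids)), key=lambda i: scores[i], reverse=True)
    winner, runner_up = order[0], order[1]
    price_per_click = scores[runner_up] / ucbs[winner]
    return winner, price_per_click

def simulate(values, true_ctrs, T, seed=0):
    """Myopic bidders with static values bid truthfully each round.
    Returns (total revenue, number of impressions per ad)."""
    rng = random.Random(seed)
    n = len(values)
    clicks, shows = [0] * n, [0] * n
    revenue = 0.0
    for t in range(1, T + 1):
        winner, price = run_auction_round(values, clicks, shows, t)
        shows[winner] += 1
        if rng.random() < true_ctrs[winner]:  # click drawn from the unknown CTR
            clicks[winner] += 1
            revenue += price  # pay-per-click: charged only on a click
    return revenue, shows
```

For example, `simulate([1.0, 0.8], [0.5, 0.3], 2000)` runs two ads whose expected values are 0.5 and 0.24; the higher-value ad should receive most impressions once its CTR estimate concentrates. Charging the threshold bid is what makes truthful bidding a best response for a myopic bidder within a round, given the current UCB estimates.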

Paper Structure

This paper contains 27 sections, 20 theorems, 40 equations, and 2 algorithms.

Introduction
Our Results
Myopic setting.
Non-myopic setting.
Techniques
Related Work
Model and Preliminaries
Valuation Generation.
Regret.
A UCB-style Mechanism for Myopic Advertisers
Adversarial Valuation
Fixed Valuation
Remark.
Lower Bound Results
Remark.
...and 12 more sections

Key Result

Theorem 1

If the advertisers are myopic, then there is an online algorithm that guarantees the following.

Theorems & Definitions (63)

Theorem 1: Informal
Definition 2: Stage-IC
Lemma 3: Myerson (1981)
Definition 4: Pay-per-click Second Price Auctions
Definition 5: Global-IC
Definition 6: Individual Rationality (IR)
Definition 7: Regret
Proposition 7
Theorem 8
Theorem 9
...and 53 more
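The notions of regret (Definition 7), negative regret, and the gap used in the abstract can be illustrated with a small sketch. The precise benchmark of Definition 7 is not reproduced on this page, so the code assumes, hypothetically, that the per-round benchmark is the expected revenue of a pay-per-click second-price auction that knows the true CTRs, i.e., the second-highest expected value $v_i \rho_i$. The function names are illustrative only.

```python
def _sorted_expected(values, ctrs):
    """Expected values v_i * rho_i, sorted from highest to lowest."""
    return sorted((v * r for v, r in zip(values, ctrs)), reverse=True)

def benchmark_revenue(values, ctrs):
    """Assumed per-round benchmark: second-highest expected value, i.e. the
    expected revenue of a pay-per-click second-price auction with true CTRs."""
    return _sorted_expected(values, ctrs)[1]

def value_gap(values, ctrs):
    """Gap between the highest and second-highest expected values."""
    expected = _sorted_expected(values, ctrs)
    return expected[0] - expected[1]

def regret(value_seq, ctrs, revenue_seq):
    """Regret = sum_t benchmark_t - sum_t revenue_t over the rounds.
    A negative result means the mechanism out-earned the assumed benchmark."""
    total_benchmark = sum(benchmark_revenue(v, ctrs) for v in value_seq)
    return total_benchmark - sum(revenue_seq)
```

With static values `[1.0, 0.8]` and CTRs `[0.5, 0.3]`, the expected values are 0.5 and 0.24, so the benchmark is 0.24 per round and the gap is 0.26; a mechanism collecting 0.3 per round over three rounds would have regret $0.72 - 0.9 = -0.18$.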
