Optimizing Online Advertising with Multi-Armed Bandits: Mitigating the Cold Start Problem under Auction Dynamics

Anastasiia Soboleva, Andrey Pudovikov, Roman Snetkov, Alina Babenko, Egor Samosvat, Yuriy Dorn

TL;DR

This paper tackles the cold-start problem in online advertising by modeling multi-slot pay-per-click auctions as a PBM-based multi-armed bandit problem and introducing AuctionUCB-PBM, a UCB-like algorithm that accounts for positional visibility. It provides a finite-time upper bound on budget regret and demonstrates both synthetic and real-world validation, showing that the method improves long-term revenue while enabling controlled short-term exploration to avoid performance degradation. A key contribution is the integration of theory with practice, including a safe deployment strategy that blends baseline rankings with bandit exploration using a tail-focused approach and adjustable conservativeness. The work thus offers a practical, theoretically grounded solution for improving cold-start handling in large-scale online advertising platforms with multiple slots.

Abstract

Online advertising platforms often face a common challenge: the cold start problem. Insufficient behavioral data (clicks) makes it difficult to forecast the click-through rate (CTR) of new ads accurately. The CTR of "old" items can also be significantly underestimated, because their early performance influences their long-term behavior on the platform. The cold start problem has far-reaching implications for businesses, including missed long-term revenue opportunities. To mitigate this issue, we developed a UCB-like algorithm in the multi-armed bandit (MAB) setting for the position-based model (PBM), specifically tailored to pay-per-click auction systems. Our proposed algorithm combines theory and practice: we obtain theoretical upper bounds on budget regret and conduct a series of experiments on synthetic and real-world data that confirm the applicability of the method on a real platform. In addition to increasing the platform's long-term profitability, we also propose a mechanism for maintaining short-term profits through controlled exploration and exploitation of items.
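
To make the general idea concrete, the following is a minimal sketch of a UCB-style ranking rule under a position-based model. It is not the paper's AuctionUCB-PBM and does not reproduce its index $U_k(t, \delta)$. It assumes that slot visibilities are known, that a click on ad k in slot l is Bernoulli with probability visibility[l] * true_ctr[k], and that bids are fixed and positive. The function names, the exploration bonus, and all parameters are hypothetical choices made for illustration.

```python
import math
import random


def pbm_ucb_ranking(clicks, exposure, bids, t, n_slots):
    """Pick ads for the slots by bid * (optimistic CTR estimate).

    clicks[k]   : clicks observed so far for ad k
    exposure[k] : sum of slot visibilities over all rounds ad k was shown
    The bonus is a simple UCB1-style heuristic (an assumption, not the paper's).
    """
    scores = []
    for k in range(len(bids)):
        if exposure[k] == 0:
            # Cold-start ads have no data yet, so they are tried first.
            scores.append(float("inf"))
            continue
        mean = clicks[k] / exposure[k]  # visibility-corrected CTR estimate
        bonus = math.sqrt(2.0 * math.log(max(t, 2)) / exposure[k])
        scores.append(bids[k] * min(1.0, mean + bonus))
    order = sorted(range(len(bids)), key=lambda k: scores[k], reverse=True)
    return order[:n_slots]


def simulate(true_ctr, bids, visibility, horizon, seed=0):
    """Run the ranking rule against simulated PBM clicks."""
    rng = random.Random(seed)
    clicks = [0.0] * len(true_ctr)
    exposure = [0.0] * len(true_ctr)
    for t in range(1, horizon + 1):
        shown = pbm_ucb_ranking(clicks, exposure, bids, t, len(visibility))
        for slot, k in enumerate(shown):
            exposure[k] += visibility[slot]
            # A click requires the slot to be seen and the ad to be clicked.
            if rng.random() < visibility[slot] * true_ctr[k]:
                clicks[k] += 1.0
    return clicks, exposure


if __name__ == "__main__":
    clicks, exposure = simulate(
        true_ctr=[0.05, 0.20, 0.10],
        bids=[1.0, 1.0, 1.0],
        visibility=[1.0, 0.5],
        horizon=5000,
    )
    print([round(c / e, 3) if e else None for c, e in zip(clicks, exposure)])
```

Dividing clicks by accumulated visibility rather than by the raw number of impressions keeps the CTR estimate comparable across slots, which is the reason a position-aware model is needed in a multi-slot auction.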

Paper Structure

This paper contains 18 sections, 28 equations, 7 figures, 1 algorithm.

Figures (7)

  • Figure 1: Evaluation of AuctionUCB-PBM on synthetic data generated with combination 1(c): fixed $price=1$ and CTRs drawn from the real distribution. Upper panel: average regret, $Regret/t$, of our algorithm. Lower panel: instantaneous regret per round.
  • Figure 2: Evaluation of AuctionUCB-PBM on synthetic data generated with combination 1(c): fixed $price=1$ and CTRs drawn from the real distribution. Upper panel: distribution of the absolute error in eCPI estimation. Lower panel: distribution of the absolute relative error in CTR (eCPI).
  • Figure 3: Evaluation of AuctionUCB-PBM on real data. Upper panel: time-averaged cumulative regret, comparing the baseline strategy with modified $U_k(t, \delta)$ and the solution from the proposed AuctionUCB-PBM algorithm. Lower panel: instantaneous regret and its smoothed version.
  • Figure 4: Evaluation of AuctionUCB-PBM on real data. Upper panel: distribution of the absolute error in eCPI. Lower panel: dependence of the relative absolute error in CTR (or eCPI) on the number of opportunities.
  • Figure 5: Evaluation of AuctionUCB-PBM on synthetic data. Comparison of $Regret/t$ with fixed $price=1$.
  • ...and 2 more figures