Optimizing Online Advertising with Multi-Armed Bandits: Mitigating the Cold Start Problem under Auction Dynamics
Anastasiia Soboleva, Andrey Pudovikov, Roman Snetkov, Alina Babenko, Egor Samosvat, Yuriy Dorn
TL;DR
This paper tackles the cold-start problem in online advertising by modeling multi-slot pay-per-click auctions as a multi-armed bandit problem under the position-based model (PBM) and introducing AuctionUCB-PBM, a UCB-like algorithm that accounts for positional visibility. It derives a finite-time upper bound on budget regret and validates the method on synthetic and real-world data, showing that it improves long-term revenue while keeping short-term exploration controlled to avoid performance degradation. A key contribution is the integration of theory with practice, including a safe deployment strategy that blends baseline rankings with bandit exploration using a tail-focused approach and adjustable conservativeness. The work thus offers a practical, theoretically grounded approach to cold-start handling in large-scale, multi-slot online advertising platforms.
Abstract
Online advertising platforms often face a common challenge: the cold start problem. Insufficient behavioral data (clicks) makes it difficult to forecast the click-through rate (CTR) of new ads accurately. The CTR of "old" items can also be significantly underestimated, because their early performance influences their long-term behavior on the platform. The cold start problem has far-reaching implications for businesses, including missed long-term revenue opportunities. To mitigate this issue, we developed a UCB-like algorithm in the multi-armed bandit (MAB) setting for the position-based model (PBM), specifically tailored to pay-per-click auction systems. Our proposed algorithm combines theory and practice: we obtain theoretical upper bounds on budget regret and conduct a series of experiments on synthetic and real-world data that confirm the applicability of the method on a real platform. In addition to increasing the platform's long-term profitability, we also propose a mechanism for maintaining short-term profits through controlled exploration and exploitation of items.
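To make the idea of a UCB-like ranking under a position-based model concrete, the following is a minimal illustrative sketch and not the AuctionUCB-PBM algorithm itself. It assumes a textbook-style confidence bonus, known per-slot visibility weights, and ranking by bid times optimistic CTR; the function names (`ucb_scores`, `rank_ads`, `update`), the bonus constant, and the update rule are all assumptions made for illustration.

```python
import math


def ucb_scores(clicks, exposure, bids, t):
    """Return bid * optimistic CTR per ad (hypothetical scoring rule).

    clicks[i]   : clicks observed for ad i
    exposure[i] : visibility-weighted number of impressions for ad i
    bids[i]     : per-click bid of ad i
    t           : current round, used in the confidence bonus
    """
    scores = []
    for c, n, b in zip(clicks, exposure, bids):
        if n <= 0:
            # Never-shown ads get top priority: this is the cold-start case.
            scores.append(math.inf)
            continue
        ctr_hat = c / n
        # Generic UCB bonus (assumed form), shrinking as exposure grows.
        bonus = math.sqrt(2.0 * math.log(max(t, 2)) / n)
        scores.append(b * min(1.0, ctr_hat + bonus))
    return scores


def rank_ads(scores, num_slots):
    """Fill the slots with the highest-scoring ads, best slot first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:num_slots]


def update(clicks, exposure, shown, clicked, visibility):
    """Credit each shown ad with the visibility of its slot (PBM assumption)."""
    for pos, ad in enumerate(shown):
        exposure[ad] += visibility[pos]
        if ad in clicked:
            clicks[ad] += 1.0


# Toy usage: ad 0 is well explored, ad 1 is brand new, ad 2 has few impressions.
clicks = [50.0, 0.0, 5.0]
exposure = [1000.0, 0.0, 10.0]
bids = [1.0, 1.0, 1.0]
shown = rank_ads(ucb_scores(clicks, exposure, bids, t=1000), num_slots=2)
update(clicks, exposure, shown, clicked={2}, visibility=[1.0, 0.5])
```

In this toy run the new ad and the lightly explored ad take the two slots ahead of the well-explored one, which is the exploratory behavior the cold-start setting calls for. A deployed system would additionally need the controlled-exploration safeguards mentioned in the abstract, which this sketch omits.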
