Bandits for Sponsored Search Auctions under Unknown Valuation Model: Case Study in E-Commerce Advertising
Danil Provodin, Jérémie Joudioux, Eduard Duryev
TL;DR
The paper tackles bidding in sponsored search auctions under an unknown valuation model and black-box auction dynamics by casting the problem as adversarial bandits with batched and delayed feedback. It introduces BatchEXP3, a batched-delayed extension of EXP3, and demonstrates its deployment in a real-world Zalando setting with a two-stream bidding system and click-attribution integration. Empirical live-test results show profitability improvements driven primarily by cost reductions during exploration, while revenue remains relatively stable, highlighting practical benefits and the importance of handling sparse, delayed signals. The authors also provide practical insights and discuss challenges such as reward scaling, sparse feedback, and hyperparameter tuning, outlining directions for future work to better align theory with production realities.
Abstract
This paper presents a bidding system for sponsored search auctions under an unknown valuation model. This formulation assumes that the bidder's value is unknown, evolves arbitrarily, and is observed only upon winning an auction. Unlike previous studies, we impose no assumptions on the nature of the feedback and consider the problem of bidding in sponsored search auctions in its full generality. Our system is based on a bandit framework that is resilient to the black-box auction structure and to delayed, batched feedback. To validate our proposed solution, we conducted a case study at Zalando, a leading fashion e-commerce company. We outline the development process and describe the promising outcomes of our bandit-based approach to increasing profitability in sponsored search auctions. We discuss in detail the technical challenges overcome during implementation, shedding light on the mechanisms that led to increased profitability.
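To make the batched, delayed bandit setting concrete, the following is a minimal illustrative sketch of an EXP3-style learner whose sampling distribution is frozen while a batch of bids is placed and updated only once that batch's feedback arrives. It is not the paper's BatchEXP3 algorithm. The class name `BatchedExp3`, the parameter `gamma`, the assumption that rewards are scaled to [0, 1], and the choice to average the importance-weighted estimates over the batch are all assumptions made here for illustration.

```python
import math
import random


class BatchedExp3:
    """Illustrative EXP3 variant: weights stay fixed within a batch and are
    updated once the (possibly delayed) feedback for that batch arrives.
    Hypothetical sketch, not the authors' BatchEXP3."""

    def __init__(self, n_arms, gamma=0.1, seed=None):
        self.n_arms = n_arms          # e.g. number of discrete bid levels
        self.gamma = gamma            # exploration rate in (0, 1]
        self.log_weights = [0.0] * n_arms
        self.rng = random.Random(seed)

    def probabilities(self):
        # Subtract the max log-weight before exponentiating for stability.
        m = max(self.log_weights)
        w = [math.exp(lw - m) for lw in self.log_weights]
        total = sum(w)
        return [(1 - self.gamma) * wi / total + self.gamma / self.n_arms
                for wi in w]

    def select_batch(self, batch_size):
        # One distribution is used for the whole batch; it is returned so the
        # later update can use the probabilities that were actually played.
        probs = self.probabilities()
        arms = self.rng.choices(range(self.n_arms), weights=probs, k=batch_size)
        return arms, probs

    def update(self, arms, rewards, probs):
        # Importance-weighted reward estimates, averaged over the batch
        # (assumed design choice; rewards are assumed to lie in [0, 1]).
        estimates = [0.0] * self.n_arms
        for arm, reward in zip(arms, rewards):
            estimates[arm] += reward / probs[arm]
        batch_size = len(arms)
        for a in range(self.n_arms):
            self.log_weights[a] += (
                self.gamma * estimates[a] / (self.n_arms * batch_size)
            )


if __name__ == "__main__":
    # Toy environment (made up for this sketch): only arm 2 pays off.
    bandit = BatchedExp3(n_arms=3, gamma=0.2, seed=0)
    for _ in range(50):
        arms, probs = bandit.select_batch(batch_size=20)
        rewards = [1.0 if a == 2 else 0.0 for a in arms]
        bandit.update(arms, rewards, probs)
    print(bandit.probabilities())
```

Passing `probs` back into `update` matters under delay: by the time feedback arrives the learner may already hold a different distribution, and the importance weights must use the probabilities under which the batch was actually sampled.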
