A pragmatic policy learning approach to account for users' fatigue in repeated auctions
Benjamin Heymann, Rémi Chan--Renous-Legoubin, Alexandre Gilotte
TL;DR
The paper tackles the problem that real-time bidding in repeated online auctions often optimizes only immediate payoff, neglecting long-term value reductions due to user fatigue. It introduces the cost of impatience, develops marginal analysis tools with inverse propensity score estimators, and proposes a fatigue-aware policy-learning approach that reallocates spend across user clusters to maximize value at a fixed budget. By combining offline counterfactual estimation with linearized IPS for variance control, it demonstrates offline improvements and confirms online gains (notably about a 0.7% value increase with roughly a 1% cost reduction). The work provides a practical, reinforcement-learning–inspired methodology for scalable, fatigue-aware bidding in RTB, with potential applicability to other sequential decision tasks.
Abstract
Online advertising banners are sold in real time through auctions. Typically, the more banners a user is shown, the smaller the marginal value of the next banner for this user. This fact can be detected by basic ML models, which can be used to predict how previously won auctions decrease the current opportunity value. However, learning is not enough to produce a bid that correctly accounts for how winning the current auction impacts future values. Indeed, a policy that uses this prediction to maximize the expected payoff of the current auction could be dubbed impatient, because such a policy does not fully account for the repeated nature of the auctions. From this perspective, it seems that most bidders in the literature are impatient. Unsurprisingly, impatience induces a cost. We provide two empirical arguments for the importance of this cost of impatience: first, an offline counterfactual analysis and, second, a notable improvement in business metrics obtained by mitigating the cost of impatience with policy learning.
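To make the notion of an impatient bidder concrete, the following is a minimal toy sketch, not the authors' model or method. It assumes a second-price auction with a uniformly distributed competing price, a fixed immediate value per win, and a fixed hypothetical `fatigue_cost` standing for the loss in future value caused by each win. The function name `simulate` and all parameter values are illustrative assumptions.

```python
import random


def simulate(bid_shading, n_auctions=100_000, value=1.0, fatigue_cost=0.3, seed=0):
    """Average long-run payoff per auction in a toy second-price auction.

    Hypothetical model (an assumption, not taken from the paper): each win is
    worth `value` immediately, but reduces the value of future opportunities
    by `fatigue_cost`.
    """
    rng = random.Random(seed)
    bid = value - bid_shading
    total = 0.0
    for _ in range(n_auctions):
        price = rng.random()  # highest competing bid, uniform on [0, 1)
        if bid > price:
            # We win, pay the competing price, and incur the future fatigue cost.
            total += value - price - fatigue_cost
    return total / n_auctions


# Impatient bidder: bids the immediate value and ignores fatigue.
impatient = simulate(bid_shading=0.0)
# Patient bidder: shades the bid by the (assumed known) fatigue cost.
patient = simulate(bid_shading=0.3)
# Gap between the two long-run payoffs in this toy setting.
cost_of_impatience = patient - impatient
```

Under these assumptions the impatient bidder wins auctions whose price exceeds the true long-run value of winning, so its average payoff (about 0.2 in expectation) falls below that of the patient bidder (about 0.245). In practice the fatigue cost is neither constant nor known, which is why the paper relies on counterfactual estimation and policy learning rather than a fixed shading rule.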
