Online Learning under Budget and ROI Constraints via Weak Adaptivity
Matteo Castiglioni, Andrea Celli, Christian Kroer
TL;DR
The paper tackles online learning with long-horizon budget and ROI constraints under both stochastic and adversarial inputs. It removes two impractical assumptions of prior primal-dual methods: that the Slater parameter is known in advance, and that a strictly feasible action exists at every round. It introduces a dual-balancing framework that embeds weakly adaptive regret minimizers into the primal-dual template, which keeps the dual multipliers bounded and the ROI violations sublinear without strict feasibility guarantees, and yields best-of-both-worlds guarantees. The authors prove $\tilde{O}(\sqrt{T})$ regret in the stochastic setting and a constant-factor $\alpha/(\alpha+1)$ competitive ratio in the adversarial setting, with the budget constraint always satisfied and ROI violations vanishing. They also relax the safe-policy requirement so that a safe action need only be available frequently, over intervals, rather than at every round, and they demonstrate applicability to bidding in practical mechanisms, including non-truthful first-price auctions. Overall, the framework provides a robust approach to constrained online decision making that relies on minimal assumptions, bridges stochastic and adversarial analyses, and enables effective online bidding strategies in realistic auction environments.
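For orientation, primal-dual templates for this kind of problem typically work with a per-round Lagrangian. The following is a sketch under assumed notation, not taken from the paper: $f_t(x)$ is the reward of action $x$ at round $t$, $c_t(x)$ its cost, $\rho = B/T$ the per-round share of a total budget $B$, and the ROI constraint is written as "total reward at least total cost". The multiplier $\lambda$ prices the budget and $\mu$ prices ROI:

$$
\mathcal{L}_t(x, \lambda, \mu) \;=\; f_t(x) \;-\; \lambda \big( c_t(x) - \rho \big) \;+\; \mu \big( f_t(x) - c_t(x) \big), \qquad \lambda, \mu \ge 0 .
$$

The primal player maximizes $\mathcal{L}_t$ over actions while the dual player minimizes it over $\lambda, \mu \ge 0$; the guarantees summarized above hinge on keeping $\lambda$ and $\mu$ bounded without knowing the Slater parameter.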
Abstract
We study online learning problems in which a decision maker has to make a sequence of costly decisions, with the goal of maximizing their expected reward while adhering to budget and return-on-investment (ROI) constraints. Existing primal-dual algorithms designed for constrained online learning problems under adversarial inputs rely on two fundamental assumptions. First, the decision maker must know beforehand the value of parameters related to the degree of strict feasibility of the problem (i.e., Slater parameters). Second, a strictly feasible solution to the offline optimization problem must exist at each round. Both requirements are unrealistic for practical applications such as bidding in online ad auctions. In this paper, we show how such assumptions can be circumvented by endowing standard primal-dual templates with weakly adaptive regret minimizers. This results in a "dual-balancing" framework which ensures that dual variables stay sufficiently small, even in the absence of knowledge about Slater's parameter. We prove the first best-of-both-worlds no-regret guarantees which hold in the absence of the two aforementioned assumptions, under stochastic and adversarial inputs. Finally, we show how to instantiate the framework to bid optimally in various mechanisms of practical relevance, such as first- and second-price auctions.
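To make the primal-dual template concrete, here is a minimal Python sketch under loudly stated assumptions. It is a generic template, not the paper's dual-balancing algorithm: the primal player runs Hedge over a hypothetical grid of bids, and the dual player runs projected online gradient descent (OGD) with a constant step size on a budget multiplier `lam` and an ROI multiplier `mu`. Constant-step OGD on a bounded interval of diameter $D$, with gradients bounded by $G$, is a standard example of a weakly adaptive regret minimizer: its regret on any window of length $\tau$ is at most $D^2/(2\eta) + \eta G^2 \tau / 2$. The auction model (repeated second-price auctions with full feedback), the caps `lam_max` and `mu_max`, the step sizes, the hard budget safeguard, and the function names `primal_dual_bidding` and `project` are all illustrative assumptions.

```python
import math
import random


def project(x, lo, hi):
    """Clip a scalar onto the interval [lo, hi]."""
    return max(lo, min(hi, x))


def primal_dual_bidding(values, competing, bids, budget,
                        lam_max=10.0, mu_max=10.0,
                        eta_primal=0.05, eta_dual=0.05, seed=0):
    """Hypothetical generic primal-dual bidding loop (budget + ROI).

    Assumed setting: repeated second-price auctions with full feedback.
      values[t]    -- the bidder's value v_t in [0, 1]
      competing[t] -- the highest competing bid d_t in [0, 1]
    A bid b wins iff b >= d_t; a win yields reward v_t and cost d_t.
    Returns (total reward, total spend).
    """
    T = len(values)
    if T == 0:
        return 0.0, 0.0
    rng = random.Random(seed)
    rho = budget / T                  # per-round budget rate
    log_w = [0.0] * len(bids)         # Hedge log-weights over the bid grid
    lam, mu = 0.0, 0.0                # multipliers for budget and ROI
    spent, total_reward = 0.0, 0.0

    for v, d in zip(values, competing):
        # Primal: sample a bid from the Hedge distribution (stable softmax).
        m = max(log_w)
        probs = [math.exp(lw - m) for lw in log_w]
        b = rng.choices(bids, weights=probs)[0]

        # Hard safeguard: stop bidding once a unit payment could overspend.
        if budget - spent < 1.0:
            b = 0.0

        f, c = (v, d) if b >= d else (0.0, 0.0)
        spent += c
        total_reward += f

        # Full feedback: credit every bid with its Lagrangian payoff
        # f - lam * (c - rho) + mu * (f - c) under the current multipliers.
        for i, bi in enumerate(bids):
            fi, ci = (v, d) if bi >= d else (0.0, 0.0)
            log_w[i] += eta_primal * (fi - lam * (ci - rho) + mu * (fi - ci))

        # Dual: constant-step projected OGD, minimizing the Lagrangian.
        # Overspending (c > rho) raises lam; a negative margin (f < c) raises mu.
        lam = project(lam + eta_dual * (c - rho), 0.0, lam_max)
        mu = project(mu - eta_dual * (f - c), 0.0, mu_max)

    return total_reward, spent
```

For example, calling `primal_dual_bidding(values, competing, [i / 10 for i in range(11)], budget=50.0)` on 1,000 rounds of uniformly random values and competing bids returns the collected reward and a total spend that never exceeds 50, because the safeguard only allows a win while at least one unit of budget remains. The constant dual step is deliberate: a decaying step such as $1/\sqrt{t}$ becomes too small late in the horizon to react to a new interval, which is what weak adaptivity is meant to prevent.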
