Online Learning under Budget and ROI Constraints via Weak Adaptivity

Matteo Castiglioni; Andrea Celli; Christian Kroer

TL;DR

The paper studies online learning with long-term budget and ROI constraints under both stochastic and adversarial inputs. It removes two impractical assumptions of earlier primal-dual methods: that the Slater parameter is known in advance, and that a strictly feasible action exists at every round. It introduces a dual-balancing framework that embeds weakly adaptive regret minimizers into the primal-dual template, which keeps the dual multipliers bounded and the ROI violations sublinear, and yields best-of-both-worlds guarantees. The authors prove $\tilde{O}(\sqrt{T})$ regret in the stochastic setting and a constant-factor $\alpha/(\alpha+1)$ competitive ratio in the adversarial setting, with the budget constraint satisfied and ROI violations vanishing. They also relax the safe-policy requirement to one of frequent safety over intervals, and show how the framework applies to bidding in practical mechanisms, including non-truthful first-price auctions. Overall, the framework offers an approach to constrained online decision making that needs few assumptions and covers both stochastic and adversarial analyses.

Abstract

We study online learning problems in which a decision maker has to make a sequence of costly decisions, with the goal of maximizing their expected reward while adhering to budget and return-on-investment (ROI) constraints. Existing primal-dual algorithms designed for constrained online learning problems under adversarial inputs rely on two fundamental assumptions. First, the decision maker must know beforehand the value of parameters related to the degree of strict feasibility of the problem (i.e., Slater parameters). Second, a strictly feasible solution to the offline optimization problem must exist at each round. Both requirements are unrealistic for practical applications such as bidding in online ad auctions. In this paper, we show how such assumptions can be circumvented by endowing standard primal-dual templates with weakly adaptive regret minimizers. This results in a "dual-balancing" framework which ensures that dual variables stay sufficiently small, even in the absence of knowledge about Slater's parameter. We prove the first best-of-both-worlds no-regret guarantees which hold in the absence of the two aforementioned assumptions, under stochastic and adversarial inputs. Finally, we show how to instantiate the framework to optimally bid in various mechanisms of practical relevance, such as first- and second-price auctions.
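
As a rough illustration of the primal-dual template the abstract refers to, one generic way to write a per-round Lagrangian payoff is sketched below. The symbols $f_t$ (reward), $c_t$ (cost), $\rho$ (per-round budget), and the multipliers $\lambda, \mu \ge 0$ are assumptions chosen for this sketch, as is the form $c_t(x) \le f_t(x)$ of the ROI constraint; the paper's exact definitions may differ.

```latex
\[
  L_t(x, \lambda, \mu)
  \;=\; f_t(x)
  \;-\; \lambda \bigl( c_t(x) - \rho \bigr)
  \;-\; \mu \bigl( c_t(x) - f_t(x) \bigr),
  \qquad \lambda, \mu \ge 0 .
\]
```

In a template of this kind, the primal player picks $x_t$ to make $L_t$ large while the dual player adjusts $(\lambda, \mu)$ to penalize constraint violations. Keeping those multipliers small without knowing Slater's parameter is the role the abstract assigns to the dual-balancing framework.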

Paper Structure (24 sections, 30 theorems, 69 equations, 1 table, 3 algorithms)

Introduction
Contributions
Related works
Preliminaries
Baselines
Adaptivity in Primal-Dual Frameworks
A Standard Primal-Dual Template
When Standard Primal-Dual Algorithms Fail
New Requirements: Weak Adaptivity
A Weakly Adaptive Dual Regret Minimizer
Bounding the Lagrange Multipliers
Regret and Violations Guarantees
Relaxing the Safe-Policy Assumption
Bidding in Repeated Non-Truthful Auctions
Further Related Works
...and 9 more sections

Key Result

Lemma 4.1

Let $\lambda_1=\mu_1=0$. Then, OGD guarantees that, for any interval $\mathcal{I}=[t_1,t_2]$, it holds where learning rates are set as follows: $\eta_{\texttt{B}} \coloneqq 1/\rho T^{1/2}$, and $\eta_{\texttt{R}} \coloneqq 1/\left(6+T^{1/2}+\mathscr{E}^{\texttt{D},\texttt{B}}_{T}+6\mathscr{E}_{T,\delta}^{\mathcal{I}}+16\mathscr{E}^{\texttt{P}}_{T,\delta}\right)$.
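
To make the OGD update in the lemma more concrete, here is a minimal Python sketch of projected online gradient descent on two nonnegative multipliers, started at $\lambda_1=\mu_1=0$ with separate step sizes for the budget and ROI multipliers. The class name `DualOGD`, the `step` method, and the violation signals `cost - rho` and `cost - reward` are assumptions made for illustration; they are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class DualOGD:
    """Projected online gradient descent on two nonnegative multipliers.

    Hypothetical helper for illustration only; not the paper's algorithm.
    """

    eta_b: float      # step size for the budget multiplier (lambda)
    eta_r: float      # step size for the ROI multiplier (mu)
    lam: float = 0.0  # lambda_1 = 0, as in the lemma statement
    mu: float = 0.0   # mu_1 = 0, as in the lemma statement

    def step(self, reward: float, cost: float, rho: float) -> Tuple[float, float]:
        """Apply one OGD update and return the new (lambda, mu)."""
        # Assumed violation signals: positive when the round spends more
        # than the per-round budget rho, or more than the reward it earns.
        budget_violation = cost - rho
        roi_violation = cost - reward
        # Gradient step followed by projection onto [0, infinity).
        self.lam = max(0.0, self.lam + self.eta_b * budget_violation)
        self.mu = max(0.0, self.mu + self.eta_r * roi_violation)
        return self.lam, self.mu
```

The step sizes are passed in rather than computed, because the lemma's $\eta_{\texttt{R}}$ depends on error terms that are not reproduced here. A call such as `DualOGD(eta_b=0.2, eta_r=0.1).step(reward=0.2, cost=0.8, rho=0.5)` raises both multipliers, while a later round with low cost and high reward pushes them back toward zero.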

Theorems & Definitions (48)

Lemma 4.1
Lemma 4.1
Definition 5.1: $\delta$-safe policy
Theorem 5.2
Lemma 5.2
Lemma 5.2
Definition 6.1
Lemma 6.1
Lemma 6.1
Definition 6.2: $(\delta,q,\texttt{OPT})$-optimal policy
...and 38 more
