An Optimistic Algorithm for Online Convex Optimization with Adversarial Constraints

Jordan Lekeufack, Michael I. Jordan

TL;DR

This work extends Online Convex Optimization to a constrained, adversarial setting in which the learner has access to predictions of the losses and constraints. It introduces an optimistic COCO meta-algorithm that couples a surrogate Lagrangian with a Lyapunov potential, achieving regret $O(\sqrt{E_T(f)})$ and cumulative constraint violation (CCV) $\tilde{O}(\sqrt{E_T(g^+)})$ while remaining projection-based. The framework also delivers dynamic regret bounds tied to the path length and extends to the experts setting and to adversarial contextual bandits with risk constraints, yielding prediction-dependent guarantees in each case. In the worst case, where predictions are poor, the rates reduce to the known $O(\sqrt{T})$ benchmarks; with high-quality predictions the guarantees are substantially better. These results offer practical, scalable strategies for safe online learning in adversarial but predictable environments.

Abstract

We study Online Convex Optimization (OCO) with adversarial constraints, where an online algorithm must make sequential decisions to minimize both convex loss functions and cumulative constraint violations. We focus on a setting where the algorithm has access to predictions of the loss and constraint functions. Our results show that we can improve the current best bounds of $O(\sqrt{T})$ regret and $\tilde{O}(\sqrt{T})$ cumulative constraint violations to $O(\sqrt{E_T(f)})$ and $\tilde{O}(\sqrt{E_T(g^+)})$, respectively, where $E_T(f)$ and $E_T(g^+)$ represent the cumulative prediction errors of the loss and constraint functions. In the worst case, where $E_T(f) = O(T)$ and $E_T(g^+) = O(T)$ (assuming bounded gradients of the loss and constraint functions), our rates match the prior $O(\sqrt{T})$ results. However, when the loss and constraint predictions are accurate, our approach yields significantly smaller regret and cumulative constraint violations. Finally, we apply this to the setting of adversarial contextual bandits with sequential risk constraints, obtaining optimistic bounds of $O(\sqrt{E_T(f)}\, T^{1/3})$ regret and $O(\sqrt{E_T(g^+)}\, T^{1/3})$ constraint violation, which improve on existing results when prediction quality is sufficiently high.
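
To make the quantities in the abstract concrete, the following writes out the standard definitions of regret and cumulative constraint violation, together with the worst-case calculation behind the $O(\sqrt{T})$ fallback. The form of $E_T(f)$ used here (squared gradient prediction error, with $\hat f_t$ denoting the predicted loss) and the bound $G$ on both the true and predicted gradients are assumptions chosen for illustration; they follow common usage in the optimistic OCO literature and are not quoted from the paper.

$$
\text{Regret}_T(u) = \sum_{t=1}^T f_t(x_t) - \sum_{t=1}^T f_t(u), \qquad \text{CCV}_T = \sum_{t=1}^T g_t^+(x_t), \qquad g_t^+ = \max(g_t, 0).
$$

$$
E_T(f) = \sum_{t=1}^T \|\nabla f_t(x_t) - \nabla \hat f_t(x_t)\|^2 \;\le\; \sum_{t=1}^T (2G)^2 = 4G^2 T \quad\Longrightarrow\quad \sqrt{E_T(f)} \le 2G\sqrt{T}.
$$

Under these assumptions, perfect gradient predictions give $E_T(f) = 0$, while arbitrary bounded predictions still recover the $O(\sqrt{T})$ rate.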

Paper Structure

This paper contains 24 sections, 18 theorems, 119 equations, 1 table, 5 algorithms.

Key Result

Lemma 5

For any OCO algorithm ${\cal A}$, if $\Phi$ is a Lyapunov potential function, then for any $t\geq 1$ and any $u\in{\cal X}$ the regret-decomposition inequality holds (its display is missing from this extract), where $S_t = \sum_{\tau=1}^t g^+_\tau(x_\tau)\,(\Phi'(Q_{\tau+1}) - \Phi'(Q_\tau))$ and $\text{Regret}_t^{\cal A}(u;\; {\cal L}_{1\dots t})$ is the regret of algorithm ${\cal A}$ run on the sequence of losses ${\cal L}_1, \dots, {\cal L}_t$.
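
The interplay between the queue $Q_\tau$, the potential derivative $\Phi'$, and the base OCO algorithm can be illustrated with a small one-dimensional sketch. This is not the paper's algorithm: it assumes a quadratic potential $\Phi(Q) = Q^2$, the queue update $Q_{\tau+1} = Q_\tau + g^+_\tau(x_\tau)$, a surrogate gradient of the form $V \nabla f_\tau + \Phi'(Q_{\tau+1}) \nabla g^+_\tau$, a two-step optimistic gradient descent as the base algorithm, and the previous round's gradient as the prediction. The names `project`, `optimistic_coco_sketch`, `eta`, and `V` are hypothetical.

```python
def project(x, lo=-1.0, hi=1.0):
    """Euclidean projection onto the interval [lo, hi]."""
    return max(lo, min(hi, x))


def optimistic_coco_sketch(loss_grads, cons, cons_grads, eta=0.1, V=1.0):
    """Toy 1-D loop; each argument is a list of callables, one per round.

    loss_grads[t](x): gradient of f_t at x
    cons[t](x):       value of g_t at x
    cons_grads[t](x): gradient of g_t at x
    """
    y = 0.0      # secondary iterate of optimistic gradient descent
    Q = 0.0      # queue: running sum of clipped violations (assumed update)
    hint = 0.0   # predicted surrogate gradient for the upcoming round
    xs, ccv = [], 0.0
    for f_grad, g, g_grad in zip(loss_grads, cons, cons_grads):
        # Optimistic step: move along the predicted gradient, then project.
        x = project(y - eta * hint)
        xs.append(x)

        # Observe the round: clipped constraint g^+ = max(g, 0) and a subgradient.
        violated = g(x) > 0
        g_plus = g(x) if violated else 0.0
        g_plus_grad = g_grad(x) if violated else 0.0

        # Queue update, then surrogate gradient with Phi'(Q) = 2Q (assumption).
        Q += g_plus
        ccv += g_plus
        grad = V * f_grad(x) + 2.0 * Q * g_plus_grad

        # Correction step using the observed surrogate gradient.
        y = project(y - eta * grad)
        # Prediction for the next round: reuse the last observed gradient.
        hint = grad
    return xs, ccv, Q


# Toy instance: f_t(x) = (x - 1)^2 with constraint g_t(x) = x - 0.5 <= 0.
T = 50
xs, ccv, Q = optimistic_coco_sketch(
    loss_grads=[lambda x: 2.0 * (x - 1.0)] * T,
    cons=[lambda x: x - 0.5] * T,
    cons_grads=[lambda x: 1.0] * T,
)
```

In this toy instance the loss pulls the iterate toward $x = 1$ while the growing multiplier $\Phi'(Q)$ pushes it back toward the feasible region $x \le 0.5$; a different potential or hint sequence would change the trade-off.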

Theorems & Definitions (22)

  • Lemma 5: Regret decomposition
  • Theorem 7: Optimistic COCO regret and CCV guarantees
  • Remark 8
  • Remark 9
  • Theorem 10: Optimistic Adagrad, adapted from Rakhlin and Sridharan (2013), Corollary 2
  • Corollary 11: Optimistic Adagrad COCO
  • Remark 12
  • Theorem 14: Dynamic Regret guarantees in OCO (Jadbabaie et al., 2015)
  • Definition 15
  • Corollary 16: Dynamic Regret in COCO
  • ...and 12 more