Tight Bounds for Online Convex Optimization with Adversarial Constraints
Abhishek Sinha, Rahul Vaze
TL;DR
This work tackles constrained online convex optimization (COCO) under adaptive adversaries, aiming to minimize regret while controlling cumulative constraint violation (CCV). It combines a Lyapunov drift framework with AdaGrad run on carefully constructed surrogate costs that balance objective loss against constraint penalties, achieving $O(\sqrt{T})$ regret and $\tilde{O}(\sqrt{T})$ CCV without restrictive assumptions. The key ideas are the surrogate cost $\hat{f}_t = V\tilde{f}_t + \Phi'(Q(t))\tilde{g}_t$, an exponential Lyapunov function $\Phi(x)=e^{\lambda x}-1$, and a horizon-free, parameter-free design that yields explicit bounds for both regret and CCV, as well as a small movement cost of $\tilde{O}(\sqrt{T})$. The results advance the COCO literature by matching the $\Omega(\sqrt{T})$ lower bound for CCV under general adversaries up to logarithmic factors, with efficient updates and potential applicability to related constrained online learning problems.
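To make the surrogate-cost idea concrete, the following is a minimal Python sketch, not the authors' reference implementation. It assumes that $Q(t)$ accumulates the clipped, scaled violations $\tilde{g}_t(x_t)$, that $\tilde{f}_t = \beta f_t$ and $\tilde{g}_t = \beta \max(g_t, 0)$ for a scaling constant $\beta$, that the decision set is a Euclidean ball, and that AdaGrad is used in its scalar (norm) form. The function name `coco_sketch`, its arguments, and the default $\lambda = 1/(2\sqrt{T})$ are illustrative choices, not taken from the text above.

```python
import numpy as np

def project_ball(x, radius):
    """Euclidean projection onto the ball of the given radius."""
    norm = np.linalg.norm(x)
    return x if norm <= radius else x * (radius / norm)

def coco_sketch(grad_f, g, grad_g, T, dim, radius=1.0, V=1.0, lam=None, beta=1.0):
    """Hypothetical sketch: AdaGrad-norm steps on the surrogate cost
    f_hat_t = V * f_tilde_t + Phi'(Q(t)) * g_tilde_t, Phi(x) = exp(lam * x) - 1."""
    lam = 1.0 / (2.0 * np.sqrt(T)) if lam is None else lam  # assumed default
    diameter = 2.0 * radius
    x = np.zeros(dim)
    Q = 0.0            # assumed: running sum of scaled, clipped violations
    sq_grad_sum = 0.0  # running sum of squared surrogate-gradient norms
    iterates, ccv = [], 0.0
    for t in range(T):
        iterates.append(x.copy())
        # The cost and constraint for round t are revealed after x is played.
        violation = max(g(t, x), 0.0)
        ccv += violation
        Q += beta * violation
        phi_prime = lam * np.exp(lam * Q)  # derivative of exp(lam * x) - 1 at Q
        # Subgradient of max(g_t, 0): zero wherever the constraint is satisfied.
        sub_g = grad_g(t, x) if g(t, x) > 0 else np.zeros(dim)
        grad = beta * (V * grad_f(t, x) + phi_prime * sub_g)
        sq_grad_sum += float(grad @ grad)
        if sq_grad_sum > 0:
            eta = diameter / np.sqrt(2.0 * sq_grad_sum)  # AdaGrad-norm step size
            x = project_ball(x - eta * grad, radius)
    return np.array(iterates), ccv

# Toy usage (illustrative): quadratic cost pulled toward c, constraint x[0] <= 0.5.
c = np.array([1.0, 0.0])
xs, ccv = coco_sketch(
    grad_f=lambda t, x: 2.0 * (x - c),
    g=lambda t, x: x[0] - 0.5,
    grad_g=lambda t, x: np.array([1.0, 0.0]),
    T=200, dim=2,
)
```

In this sketch the weight $\Phi'(Q(t))$ grows exponentially with the accumulated violation, so the constraint gradient increasingly dominates the surrogate gradient once violations pile up; that is the balancing mechanism the summary describes.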
Abstract
A well-studied generalization of the standard online convex optimization (OCO) problem is constrained online convex optimization (COCO). In COCO, on every round, a convex cost function and a convex constraint function are revealed to the learner after the action for that round is chosen. The objective is to design an online policy that simultaneously achieves a small regret while ensuring small cumulative constraint violation (CCV) against an adaptive adversary. A long-standing open question in COCO is whether an online policy can simultaneously achieve $O(\sqrt{T})$ regret and $O(\sqrt{T})$ CCV without any restrictive assumptions. For the first time, we answer this in the affirmative and show that an online policy can simultaneously achieve $O(\sqrt{T})$ regret and $\tilde{O}(\sqrt{T})$ CCV. We establish this result by effectively combining the adaptive regret bound of the AdaGrad algorithm with Lyapunov optimization, a classic tool from control theory. Surprisingly, the analysis is short and elegant.
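For reference, the two performance metrics can be written out as follows. This is a sketch under the definitions commonly used in this line of work, which the abstract does not spell out. The symbols $x_t$ (the action in round $t$), $f_t$ and $g_t$ (the cost and constraint functions), $\mathcal{X}$ (the decision set), and $\mathcal{X}^\star$ (the set of actions feasible in every round) are notational assumptions.

$$
\mathrm{Regret}_T = \sum_{t=1}^{T} f_t(x_t) - \min_{x^\star \in \mathcal{X}^\star} \sum_{t=1}^{T} f_t(x^\star),
\qquad
\mathrm{CCV}_T = \sum_{t=1}^{T} \max\bigl(g_t(x_t),\, 0\bigr),
$$

$$
\mathcal{X}^\star = \{\, x \in \mathcal{X} : g_t(x) \le 0 \ \text{for all } t = 1, \dots, T \,\}.
$$

Under these definitions, a violation in one round cannot be offset by strict feasibility in another, since each term of $\mathrm{CCV}_T$ is clipped at zero.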
