
Safe Online Convex Optimization with Multi-Point Feedback

Spencer Hutchinson, Mahnoosh Alizadeh

TL;DR

This work studies safe online convex optimization with an unknown constraint under zero-order, multi-point feedback. It introduces MP-ROGD, a projection-free algorithm that combines forward-difference gradient estimation with optimistic and pessimistic action sets to guarantee zero constraint violation while achieving $\mathcal{O}(d\sqrt{T})$ regret when the constraint function is smooth and strongly convex. The analysis bounds the gradient-estimation error, establishes containment properties of the action sets, and derives the regret bound, with a proof sketch and auxiliary lemmas. Numerical experiments compare MP-ROGD against baselines that have either full constraint information or first-order constraint feedback. These comparisons illustrate the cost of not knowing the constraint and of having only zero-order information, which matters for safe learning and control under bandit feedback.
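To make the optimistic/pessimistic idea concrete, here is a minimal sketch of one plausible construction; it is an assumption, and the paper's exact set definitions may differ. Suppose the constraint $g$ is $L$-smooth and $\mu$-strongly convex, the value $g(x)$ is observed exactly, and a gradient estimate satisfies $\|\hat{\nabla} g(x) - \nabla g(x)\| \le \varepsilon$. Smoothness then gives an upper bound on $g(y)$, and strong convexity gives a lower bound. Points whose upper bound is nonpositive are certified safe (pessimistic set), and points whose lower bound is nonpositive include every safe point (optimistic set). The function names are hypothetical.

```python
import numpy as np

def constraint_bounds(g_x, grad_est, err, L, mu, x, y):
    """Lower and upper bounds on g(y) built around the query point x.

    Assumes g is L-smooth and mu-strongly convex, g_x = g(x) is observed
    exactly, and ||grad_est - grad g(x)|| <= err (hypothetical setup).
    """
    s = y - x
    n = np.linalg.norm(s)
    linear = g_x + grad_est @ s
    upper = linear + err * n + 0.5 * L * n**2   # from L-smoothness
    lower = linear - err * n + 0.5 * mu * n**2  # from mu-strong convexity
    return lower, upper

def in_pessimistic_set(*args):
    # upper <= 0 certifies g(y) <= 0, so y is guaranteed to be safe
    return constraint_bounds(*args)[1] <= 0

def in_optimistic_set(*args):
    # lower <= 0 is necessary for g(y) <= 0, so this set contains all safe points
    return constraint_bounds(*args)[0] <= 0
```

For example, with $g(y) = \|y\|^2 - 1$ (so $L = \mu = 2$), base point $x = 0$, a zero gradient estimate, and $\varepsilon = 0.1$, the point $(0.99, 0)$ is safe and lies in the optimistic set but is not certified by the pessimistic set, while $(1.2, 0)$ is excluded from both.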

Abstract

Motivated by the stringent safety requirements that are often present in real-world applications, we study a safe online convex optimization setting where the player needs to simultaneously achieve sublinear regret and zero constraint violation while only using zero-order information. In particular, we consider a multi-point feedback setting, where the player chooses $d + 1$ points in each round (where $d$ is the problem dimension) and then receives the value of the constraint function and cost function at each of these points. To address this problem, we propose an algorithm that leverages forward-difference gradient estimation as well as optimistic and pessimistic action sets to achieve $\mathcal{O}(d \sqrt{T})$ regret and zero constraint violation under the assumption that the constraint function is smooth and strongly convex. We then perform a numerical study to investigate the impacts of the unknown constraint and zero-order feedback on empirical performance.
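Choosing $d + 1$ points per round matches a forward-difference gradient estimator: one evaluation at the base point and one step along each coordinate direction. Below is a minimal Python sketch, assuming a fixed step size `delta` and exact (noise-free) function values; the function name is hypothetical, and the paper's estimator may differ in details such as how the step size is chosen.

```python
import numpy as np

def forward_difference_gradient(f, x, delta):
    """Estimate grad f(x) from d + 1 evaluations: f(x) and f(x + delta * e_i)."""
    x = np.asarray(x, dtype=float)
    d = x.size
    f_x = f(x)                      # 1 evaluation at the base point
    grad = np.empty(d)
    for i in range(d):              # d further evaluations, one per coordinate
        e_i = np.zeros(d)
        e_i[i] = 1.0
        grad[i] = (f(x + delta * e_i) - f_x) / delta
    return f_x, grad
```

For $f(x) = \|x\|^2$ at $x = (1, 2)$ with `delta = 1e-3`, each estimated coordinate equals $2x_i + \delta$, so the estimate is about $(2.001, 4.001)$ and uses exactly three evaluations.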

Paper Structure

This paper contains 25 sections, 16 theorems, 38 equations, and 1 figure.

Key Result

Proposition 1 (Properties of gradient estimators)

Let the assumptions on the cost functions, the constraint function, and smoothness hold. Then, for every $t \in [T]$, it holds that … Furthermore, it holds that …
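Proposition 1 concerns the accuracy of the gradient estimators. A standard bound of this type for a forward-difference estimator with step size $\delta$ (an assumed symbol here; this is not necessarily the exact statement of Proposition 1) follows from smoothness. If $f$ is $L$-smooth, then for all $x, y$,

$$\left| f(y) - f(x) - \nabla f(x)^\top (y - x) \right| \le \frac{L}{2}\|y - x\|^2 .$$

Setting $y = x + \delta e_i$ and dividing by $\delta$ gives a per-coordinate error bound:

$$\left| \frac{f(x + \delta e_i) - f(x)}{\delta} - \partial_i f(x) \right| \le \frac{L\delta}{2}, \qquad i = 1, \dots, d .$$

For the estimator $\hat{\nabla} f(x) = \sum_{i=1}^{d} \frac{f(x + \delta e_i) - f(x)}{\delta}\, e_i$, summing the squared coordinate errors yields

$$\left\| \hat{\nabla} f(x) - \nabla f(x) \right\| \le \frac{\sqrt{d}\, L\, \delta}{2} .$$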

Figures (1)

  • Figure 1: Average regret of MP-ROGD and benchmark algorithms in (a) a setting with linear cost functions and a quadratic constraint function and (b) a setting with quadratic cost functions and a quadratic constraint function. The benchmarks are MP-OGD (agarwal2010optimal), which has full constraint information, and ROGD (accsub), which receives first-order constraint feedback.
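As a toy illustration of how an average-regret curve like those in Figure 1 can be computed, the sketch below runs projected online gradient descent with forward-difference gradients on linear costs $f_t(x) = c_t^\top x$ over a known unit ball. This is a simplified stand-in and not the exact MP-OGD of agarwal2010optimal, nor the paper's experimental setup. The step size `eta`, the unit-ball feasible set, and the function names are all hypothetical. Average regret at round $t$ is taken as $\frac{1}{t}\sum_{s \le t} \big(f_s(x_s) - f_s(x^\star)\big)$, with $x^\star$ the best fixed point in hindsight.

```python
import numpy as np

def project_to_ball(x, radius=1.0):
    # Euclidean projection onto {x : ||x|| <= radius}
    n = np.linalg.norm(x)
    return x if n <= radius else x * (radius / n)

def average_regret_ogd(costs, eta, delta=1e-4):
    """Projected OGD on linear costs c_t @ x over the unit ball (toy setup)."""
    T, d = costs.shape
    x = np.zeros(d)
    losses = np.empty(T)
    for t, c in enumerate(costs):
        losses[t] = c @ x
        # forward-difference estimate from d + 1 evaluations of the linear cost
        grad = np.array([(c @ (x + delta * e) - losses[t]) / delta for e in np.eye(d)])
        x = project_to_ball(x - eta * grad)
    total = costs.sum(axis=0)
    x_star = -total / np.linalg.norm(total)  # minimizer of total @ x over the unit ball
    regret = np.cumsum(losses - costs @ x_star)
    return regret / np.arange(1, T + 1)       # average regret after each round
```

With a constant cost vector $c_t = (1, 0)$ and `eta = 0.1`, the iterate reaches the optimum $(-1, 0)$ after ten rounds, so cumulative regret settles at $5.5$ and average regret decays like $5.5 / t$.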

Theorems & Definitions (26)

  • Definition 1: Smooth function
  • Definition 2: Strongly-convex function
  • Proposition 1: Properties of gradient estimators
  • Proposition 2
  • Proposition 3: Validity
  • Proposition 4: Safety guarantee
  • Theorem 1
  • Lemma 1
  • Lemma 2
  • Lemma 3
  • ...and 16 more
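For reference, the standard forms of the two definitions listed above are as follows; the paper's exact notation and constants may differ. A differentiable function $g$ is $L$-smooth if

$$\|\nabla g(x) - \nabla g(y)\| \le L\,\|x - y\| \quad \text{for all } x, y,$$

and it is $\mu$-strongly convex if

$$g(y) \ge g(x) + \nabla g(x)^\top (y - x) + \frac{\mu}{2}\|y - x\|^2 \quad \text{for all } x, y .$$

Together they sandwich $g$ between two quadratics around any point $x$:

$$\frac{\mu}{2}\|y - x\|^2 \;\le\; g(y) - g(x) - \nabla g(x)^\top (y - x) \;\le\; \frac{L}{2}\|y - x\|^2 .$$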