Safe Online Convex Optimization with Multi-Point Feedback

Spencer Hutchinson; Mahnoosh Alizadeh

TL;DR

This work studies safe online convex optimization with an unknown constraint under zero-order, multi-point feedback. It introduces MP-ROGD, a projection-free algorithm that uses forward-difference gradient estimation and optimistic/pessimistic action sets to ensure zero constraint violations while achieving $O(d\sqrt{T})$ regret when the constraint is smooth and strongly convex. The analysis provides gradient-estimation error bounds, set-containment properties, and a regret bound, supported by a proof sketch and auxiliary lemmas, alongside numerical experiments. Empirical results compare MP-ROGD to baselines with full constraint information and first-order feedback, illustrating the trade-offs between constraint knowledge and zero-order information and highlighting the method's practical relevance for safe learning and control under bandit feedback.

Abstract

Motivated by the stringent safety requirements that are often present in real-world applications, we study a safe online convex optimization setting where the player needs to simultaneously achieve sublinear regret and zero constraint violation while only using zero-order information. In particular, we consider a multi-point feedback setting, where the player chooses $d + 1$ points in each round (where $d$ is the problem dimension) and then receives the value of the constraint function and cost function at each of these points. To address this problem, we propose an algorithm that leverages forward-difference gradient estimation as well as optimistic and pessimistic action sets to achieve $\mathcal{O}(d \sqrt{T})$ regret and zero constraint violation under the assumption that the constraint function is smooth and strongly convex. We then perform a numerical study to investigate the impacts of the unknown constraint and zero-order feedback on empirical performance.
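The multi-point feedback described above can be illustrated with a minimal sketch of coordinate-wise forward-difference gradient estimation from $d + 1$ function evaluations. This is the textbook estimator and not necessarily the paper's exact construction: the function name `forward_difference_gradient`, the step size `delta`, and the quadratic test function are assumptions made for illustration, and the sketch does not model the safety requirement that every queried point satisfy the unknown constraint.

```python
import numpy as np


def forward_difference_gradient(f, x, delta):
    """Estimate the gradient of f at x from d + 1 evaluations.

    Queries f at x and at x + delta * e_i for each coordinate direction e_i.
    (Hypothetical helper for illustration; not the paper's implementation.)
    """
    x = np.asarray(x, dtype=float)
    d = x.size
    f_x = f(x)  # one evaluation at the base point
    grad = np.empty(d)
    for i in range(d):
        e_i = np.zeros(d)
        e_i[i] = 1.0
        # one extra evaluation per coordinate, d in total
        grad[i] = (f(x + delta * e_i) - f_x) / delta
    return grad


def quadratic(x):
    # Assumed test function: smooth and strongly convex, with gradient x.
    return 0.5 * float(x @ x)


x0 = np.array([1.0, -2.0, 0.5])
estimate = forward_difference_gradient(quadratic, x0, delta=1e-4)
```

For this quadratic, each coordinate of the estimate exceeds the true gradient by `delta / 2`, which shows why a smoothness assumption lets the estimation error be controlled through the step size.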

Paper Structure (25 sections, 16 theorems, 38 equations, 1 figure)


Introduction
Related Work
Preliminaries
Notation and Definitions
Problem Setup
Assumptions
Proposed Algorithm
Gradient Estimation
Optimistic and Pessimistic Action Sets
Validity and Safety Guarantee
Regret Analysis
Proof sketch:
Supporting Lemmas
Numerical Experiments
Impact of unknown constraints
...and 10 more sections

Key Result

proposition 1: Properties of gradient estimators

Let Assumptions ass:cost_funcs, ass:const and ass:smooth hold. Then, for every $t \in [T]$, it holds that [...]. Furthermore, it holds that [...].

Figures (1)

Figure 1: Average regret of MP-ROGD and benchmark algorithms in a setting with linear cost functions and a quadratic constraint function (a) and a setting with quadratic cost functions and quadratic constraint (b). The benchmark algorithms are MP-OGD (agarwal2010optimal) with full constraint information and ROGD (accsub) with first-order constraint feedback.

Theorems & Definitions (26)

definition 1: Smooth function
definition 2: Strongly-convex function
proposition 1: Properties of gradient estimators
proposition 2
proposition 3: Validity
proposition 4: Safety guarantee
theorem 1
lemma 1
lemma 2
lemma 3
...and 16 more
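
For reference, definitions 1 and 2 are commonly stated in the following standard forms. The constants $M$ and $\mu$ are assumed symbols chosen here for illustration, and the paper's own statements may use different notation or norms.

```latex
% Standard textbook forms; M and \mu are assumed symbols, not taken from the paper.
% Smooth function (Lipschitz-continuous gradient):
\|\nabla f(x) - \nabla f(y)\| \le M \|x - y\| \quad \text{for all } x, y.
% Strongly convex function:
f(y) \ge f(x) + \nabla f(x)^\top (y - x) + \frac{\mu}{2} \|y - x\|^2 \quad \text{for all } x, y.
```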
