Online Learning with Unknown Constraints

Karthik Sridharan; Seung Won Wilson Yoo

TL;DR

This work studies online learning under an unknown safety constraint: the learner's actions must satisfy the constraint at every round while minimizing regret with respect to the best safe action. It develops a meta-algorithm that combines an online regression oracle (to estimate the unknown constraint) with an online learning oracle (to optimize decisions), and introduces a per-step complexity term $V_\kappa$ that captures the trade-off between learning the constraint and reducing loss. The regret bound combines the regression oracle's regret, the eluder dimension of the constraint class, and the online learning oracle's regret, and a lower bound shows that the complexity measure is necessary. The authors instantiate the results for linear and generalized linear constraints, achieving $O(\sqrt{T})$ regret. They further extend the framework to multiple constraints and to vector and scalar feedback, providing concrete mapping constructions and showing how the bounds scale with problem parameters such as dimensionality, Lipschitz constants, and initial safe sets. Overall, the paper provides a general, theoretically grounded algorithmic template for safe online learning together with concrete limits for unknown constraints, with practical implications for safely deploying learning agents in uncertain environments.

Abstract

We consider the problem of online learning where the sequence of actions played by the learner must adhere to an unknown safety constraint at every round. The goal is to minimize regret with respect to the best safe action in hindsight while simultaneously satisfying the safety constraint with high probability on each round. We provide a general meta-algorithm that leverages an online regression oracle to estimate the unknown safety constraint, and converts the predictions of an online learning oracle to predictions that adhere to the unknown safety constraint. On the theoretical side, our algorithm's regret can be bounded by the regret of the online regression and online learning oracles, the eluder dimension of the model class containing the unknown safety constraint, and a novel complexity measure that captures the difficulty of safe learning. We complement our result with an asymptotic lower bound that shows that the aforementioned complexity measure is necessary. When the constraints are linear, we instantiate our result to provide a concrete algorithm with $\sqrt{T}$ regret using a scaling transformation that balances optimistic exploration with pessimistic constraint satisfaction.
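
To make the linear case more concrete, the following is a minimal sketch of one way a scaling step can enforce pessimistic constraint satisfaction. It is not the paper's algorithm. It assumes a single linear constraint of the form $\langle \theta^*, a \rangle \le b$ with $b > 0$, so that the origin is a known safe action; it uses ridge regression as a stand-in for the online regression oracle and uniformly random proposals as a stand-in for the online learning oracle. The names `SafeScaler` and `run`, the confidence multiplier `beta`, and all numeric values are hypothetical choices made for illustration only.

```python
import math
import random


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def matvec(M, v):
    return [dot(row, v) for row in M]


class SafeScaler:
    """Hypothetical helper: ridge estimate of a linear constraint <theta, a> <= b,
    plus pessimistic scaling of proposed actions toward the safe origin."""

    def __init__(self, dim, b, beta=2.0, lam=1.0):
        assert b > 0, "assumption: the origin is strictly safe"
        self.b, self.beta = b, beta
        # Inverse of the regularized design matrix V = lam * I + sum of a a^T.
        self.V_inv = [[(1.0 / lam if i == j else 0.0) for j in range(dim)]
                      for i in range(dim)]
        self.u = [0.0] * dim  # running sum of y * a

    def upper_bound(self, a):
        """Pessimistic (upper-confidence) estimate of <theta*, a>."""
        Va = matvec(self.V_inv, a)
        theta_hat = matvec(self.V_inv, self.u)
        return dot(theta_hat, a) + self.beta * math.sqrt(max(dot(a, Va), 0.0))

    def scale(self, a):
        """Shrink a toward the origin until its pessimistic estimate is <= b.

        The upper bound is positively homogeneous in a, so the largest
        admissible scaling factor has the closed form b / upper_bound(a).
        """
        ucb = self.upper_bound(a)
        gamma = 1.0 if ucb <= self.b else self.b / ucb
        return [gamma * x for x in a]

    def update(self, a, y):
        """Sherman-Morrison rank-one update after observing feedback y for a."""
        Va = matvec(self.V_inv, a)
        denom = 1.0 + dot(a, Va)
        n = len(a)
        for i in range(n):
            for j in range(n):
                self.V_inv[i][j] -= Va[i] * Va[j] / denom
        self.u = [ui + y * ai for ui, ai in zip(self.u, a)]


def run(T=200, seed=0, noise=0.01):
    """Count rounds on which the played action violates the true constraint."""
    rng = random.Random(seed)
    theta_star, b = [1.0, 0.5], 0.5  # hypothetical ground truth, hidden from the learner
    scaler = SafeScaler(dim=2, b=b, beta=2.0)
    violations = 0
    for _ in range(T):
        # Stand-in for the online learning oracle's (possibly unsafe) proposal.
        proposal = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
        a = scaler.scale(proposal)
        y = dot(theta_star, a) + noise * rng.gauss(0, 1)  # noisy constraint feedback
        scaler.update(a, y)
        violations += dot(theta_star, a) > b + 1e-9
    return violations
```

In this sketch the pessimism lives entirely in `upper_bound`: an action is played only after it has been shrunk enough that even the upper-confidence estimate of the constraint value is at most `b`. As feedback accumulates the confidence width shrinks, so proposals are scaled back less over time. Safety here depends on `beta` being large enough for the confidence bound to hold, which is assumed rather than derived.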

Paper Structure (31 sections, 34 theorems, 84 equations, 3 algorithms)

This paper contains 31 sections, 34 theorems, 84 equations, 3 algorithms.

Introduction
Key Contributions
Related Works
Setup and Preliminary
Additional Notation
Online Regression Oracles and Signal Functions
Online Learning Oracles
Eluder Dimension
Main Results
Algorithm and Upper Bound
Optimal Mapping and Adapting to $\kappa$
Long Term Constraint Versus No Violations with High Probability
Lower Bound
Examples
Finite Action Spaces
...and 16 more sections

Key Result

Proposition 3.1

There exists an algorithm satisfying the online learning oracle assumption with …

Theorems & Definitions (59)

Proposition 3.1
Definition 3.1
Proposition 4.1
Proposition 4.2
Theorem 4.3
Remark
Lemma 4.4
Lemma 4.5
Proposition 4.6
Theorem 4.7
...and 49 more
