Sample-Optimal Zero-Violation Safety For Continuous Control

Ritabrata Ray; Yorie Nakahira; Soummya Kar

Abstract

In this paper, we study the problem of ensuring safety with a few shots of samples for partially unknown systems. We first characterize a fundamental limit: the conditions under which producing safe actions is not possible due to insufficient information or samples. Then, we develop a technique that can generate provably safe actions and recovery behaviors using a minimum number of samples. In the performance analysis, we also establish Nagumo's theorem-like results with relaxed assumptions, which are potentially useful in other contexts. Finally, we discuss how the proposed method can be integrated into a policy gradient algorithm to assure safety and stability with a handful of samples, without requiring stabilizing initial policies or generative models to probe safe actions.
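For readers unfamiliar with the Nagumo-type condition mentioned in the abstract, the sketch below numerically checks the classical version: a set $\mathcal{S} = \{x : \phi(x) \ge 0\}$ is forward invariant when $\nabla \phi(x) \cdot \dot{x} \ge 0$ at every boundary point. This is not the paper's algorithm. The unit-disk safe set, the drift `f`, the identity input map, and both policies are hypothetical choices made only for illustration, and `f` is treated as known here even though the paper assumes it is unknown.

```python
import math

# Hypothetical safe set: the unit disk, S = {x : phi(x) >= 0}.
def phi(x):
    return 1.0 - (x[0] ** 2 + x[1] ** 2)

def grad_phi(x):
    return (-2.0 * x[0], -2.0 * x[1])

# Hypothetical drift that pushes the state outward (assumed known here
# purely for illustration; the paper treats f as unknown).
def f(x):
    return (x[0], x[1])

# Control-affine dynamics x_dot = f(x) + g(x) u with g(x) = identity (assumed).
def closed_loop(x, policy):
    u = policy(x)
    fx = f(x)
    return (fx[0] + u[0], fx[1] + u[1])

def nagumo_margin(policy, n_points=360):
    """Smallest value of grad_phi(x) . x_dot over sampled boundary points.

    A non-negative margin on the samples is consistent with the classical
    Nagumo condition; it is a numerical check, not a proof.
    """
    worst = math.inf
    for k in range(n_points):
        theta = 2.0 * math.pi * k / n_points
        x = (math.cos(theta), math.sin(theta))  # point on the boundary of S
        gp = grad_phi(x)
        xd = closed_loop(x, policy)
        worst = min(worst, gp[0] * xd[0] + gp[1] * xd[1])
    return worst

def zero_policy(x):
    return (0.0, 0.0)

def inward_policy(x):
    return (-2.0 * x[0], -2.0 * x[1])

if __name__ == "__main__":
    print("zero policy margin:  ", nagumo_margin(zero_policy))
    print("inward policy margin:", nagumo_margin(inward_policy))
```

With the zero policy the margin is negative, so the drift carries the state out of the disk; with the inward-pushing policy the margin is positive, so the sampled boundary condition holds.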

Paper Structure (13 sections, 10 theorems, 58 equations, 1 figure, 2 algorithms)

Introduction
Model and Problem Statement
Safe control algorithm
Application to safe exploration for RL algorithms
Numerical Study
Conclusion and Future Work
Proof of Theorem \ref{thm: main result}
Proof of Theorem \ref{thm: sample case optimality}
Proof of Theorem \ref{thm: Gradient Unbiased Estimate}
Bicycle dynamics example illustrating the applicability of Assumptions \ref{Assumption: SVD} and \ref{Assumption: Singular value bounds}
Implementation Details of Numerical Simulations from Section \ref{sec: simulations}
Comparison with Adaptive Safe Control Algorithms
Comparison with RL Algorithms

Key Result

Theorem III.1

Assume that there exists a state $x$ on the boundary $\partial \mathcal{S}$ of the safety set $\mathcal{S}$ such that $\nabla \phi(x) \neq 0$. Consider the system \eqref{eq: sys_dynamics} with unknown $f(\cdot)$ and known $g(\cdot)$. Given information $I(t,\delta)$ with $\delta=0$, no policy of the form eq:policy struc

Figures (1)

Figure 1: Comparing (a) forward invariance and (b) forward convergence of our Algorithm \ref{alg: main algorithm} with several safe adaptive control algorithms. Comparing (c) safety rate and (d) convergence rate of Algorithm \ref{alg: REINFORCE with Safety} with other model-free RL algorithms.

Theorems & Definitions (20)

Definition II.1
Theorem III.1
Theorem III.2: Forward Persistence of Algorithm \ref{alg: main algorithm}
Theorem IV.1: Policy-Gradient
Lemma VII.1
Lemma VII.2
Proof (Lemma \ref{lem: chain rule for right hand derivatives})
Lemma VII.3
Lemma VII.4
proof
...and 10 more
