Sample-Optimal Zero-Violation Safety For Continuous Control

Ritabrata Ray, Yorie Nakahira, Soummya Kar

Abstract

In this paper, we study the problem of ensuring safety with only a few samples for partially unknown systems. We first characterize a fundamental limit, namely when producing safe actions is impossible due to insufficient information or samples. We then develop a technique that generates provably safe actions and recovery behaviors using a minimum number of samples. In the performance analysis, we also establish results akin to Nagumo's theorem under relaxed assumptions, which may be useful in other contexts. Finally, we discuss how the proposed method can be integrated into a policy gradient algorithm to ensure safety and stability with a handful of samples, without requiring stabilizing initial policies or generative models to probe safe actions.
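The idea of certifying a safe action from very few samples can be illustrated with a small sketch. It is not the paper's algorithm, and every modeling choice below is an assumption made for illustration: control-affine dynamics $\dot x = f(x) + g(x)u$ with $g$ known and $f$ unknown (suggested by the setting of Theorem III.1), a safety set $\mathcal{S} = \{x : \phi(x) \ge 0\}$, a single noise-free measurement of $\dot x$ at the current state, and a standard barrier-style condition $\dot\phi \ge -\alpha\phi$ swapped in as the safety test. At the boundary ($\phi = 0$) that condition reduces to the Nagumo-type requirement $\dot\phi \ge 0$. The helper names (`estimate_drift`, `safe_action`, and so on) are hypothetical.

```python
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]  # row-major, shape (n, m)


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(ai * bi for ai, bi in zip(a, b))


def mat_vec(G: Matrix, u: Sequence[float]) -> Vector:
    """G @ u for an n-by-m matrix G and an m-vector u."""
    return [dot(row, u) for row in G]


def vec_mat(v: Sequence[float], G: Matrix) -> Vector:
    """v^T G for an n-vector v and an n-by-m matrix G (returns an m-vector)."""
    m = len(G[0])
    return [sum(v[i] * G[i][j] for i in range(len(v))) for j in range(m)]


def estimate_drift(x_dot: Sequence[float], G: Matrix, u_applied: Sequence[float]) -> Vector:
    """One-sample drift estimate f_hat(x) = x_dot - g(x) u.

    Assumes (hypothetically) x_dot = f(x) + g(x) u was measured without noise
    at the current state x while u_applied was in effect.
    """
    gu = mat_vec(G, u_applied)
    return [xd - gui for xd, gui in zip(x_dot, gu)]


def safe_action(u_nom: Sequence[float], grad_phi: Sequence[float], f_hat: Sequence[float],
                G: Matrix, phi_x: float, alpha: float = 1.0) -> Vector:
    """Minimal-norm change to u_nom so that
        grad_phi . (f_hat + G u) >= -alpha * phi_x.
    With phi_x = 0 this is the Nagumo-type boundary condition phi_dot >= 0.
    """
    a = vec_mat(grad_phi, G)                   # sensitivity of phi_dot to u
    b = dot(grad_phi, f_hat) + alpha * phi_x   # drift part of the constraint
    slack = dot(a, u_nom) + b
    if slack >= 0.0:
        return list(u_nom)                     # nominal action already satisfies the condition
    aa = dot(a, a)
    if aa == 0.0:
        raise ValueError("the input cannot influence phi here; no corrective action exists")
    # Euclidean projection of u_nom onto the half-space {u : a.u + b >= 0}
    return [ui - slack * ai / aa for ui, ai in zip(u_nom, a)]


# Hypothetical usage: unit-disk safety set phi(x) = 1 - |x|^2 and g(x) = I.
x = [0.9, 0.0]
phi_x = 1.0 - dot(x, x)                  # about 0.19, so x is safe but near the boundary
grad_phi = [-2.0 * xi for xi in x]       # gradient of phi at x
G = [[1.0, 0.0], [0.0, 1.0]]
f_hat = estimate_drift([1.0, 0.0], G, [0.0, 0.0])  # sampled drift pushes outward
u = safe_action([0.0, 0.0], grad_phi, f_hat, G, phi_x)
```

The projection step is the closed-form solution of a one-constraint quadratic program, chosen because it keeps the sketch dependency-free; with several constraints one would typically solve a small QP instead.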

Paper Structure (13 sections, 10 theorems, 58 equations, 1 figure, 2 algorithms)

Key Result

Theorem III.1

Assume that there exists a state $x$ on the boundary $\partial \mathcal{S}$ of the safety set $\mathcal{S}$ such that $\nabla \phi(x) \neq 0$. Consider the system in eq: sys_dynamics with unknown $f(\cdot)$ and known $g(\cdot)$. Given information $I(t,\delta)$ with $\delta=0$, no policy of the form eq:policy struc …
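The statement above is truncated, but the role of the assumption $\nabla\phi(x) \neq 0$ can be seen in a short argument. This is an intuition under stated assumptions, not the paper's proof: it assumes control-affine dynamics $\dot x = f(x) + g(x)u$, a safety set $\mathcal{S} = \{x : \phi(x) \ge 0\}$ with $\phi$ continuously differentiable, and that the missing conclusion says no such policy can keep $\mathcal{S}$ forward invariant for every admissible $f$. The symbols $x_0$, $u_0$, and $\pi$ are introduced here for illustration only.

```latex
\begin{aligned}
&\text{Fix } x_0 \in \partial\mathcal{S} \text{ with } \nabla\phi(x_0) \neq 0, \text{ so } \phi(x_0) = 0,
 \text{ and let } u_0 = \pi(x_0) \text{ be the action chosen with no sample of } f \ (\delta = 0). \\
&\text{Since } u_0 \text{ cannot depend on } f, \text{ consider the constant drift }
 f(x) \equiv -g(x_0)\,u_0 - \nabla\phi(x_0). \\
&\text{Then } \dot\phi(x_0) = \nabla\phi(x_0)^{\top}\big(f(x_0) + g(x_0)\,u_0\big)
 = -\lVert \nabla\phi(x_0) \rVert^{2} < 0,
\end{aligned}
```

so $\phi$ becomes negative immediately after the trajectory passes through $x_0$, and the state leaves $\mathcal{S}$. If instead $\nabla\phi(x_0) = 0$, the first-order term vanishes and this argument gives no conclusion, which is why the gradient condition appears in the hypothesis.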

Figures (1)

  • Figure 1: Comparing (a) forward invariance and (b) forward convergence of our main algorithm with several safe adaptive control algorithms, and (c) safety rate and (d) convergence rate of the REINFORCE with Safety algorithm with other model-free RL algorithms.

Theorems & Definitions (20)

  • Definition II.1
  • Theorem III.1
  • Theorem III.2: Forward Persistence of the Main Algorithm
  • Theorem IV.1: Policy-Gradient
  • Lemma VII.1
  • Lemma VII.2
  • Proof (Lemma on the chain rule for right-hand derivatives)
  • Lemma VII.3
  • Lemma VII.4
  • Proof
  • ...and 10 more