Simplex-enabled Safe Continual Learning Machine

Hongpeng Cao; Yanbing Mao; Yihao Cai; Lui Sha; Marco Caccamo

TL;DR

Experiments on a cart-pole system and a real quadruped robot demonstrate the distinguishing features of the SeC-learning machine relative to continual learning built on state-of-the-art safe DRL frameworks that incorporate approaches to addressing the Sim2Real gap.

Abstract

This paper proposes the SeC-Learning Machine: Simplex-enabled safe continual learning for safety-critical autonomous systems. The SeC-learning machine is built on Simplex logic (that is, "using simplicity to control complexity") and physics-regulated deep reinforcement learning (Phy-DRL). It consists of three components: the HP (high performance)-Student, the HA (high assurance)-Teacher, and the Coordinator. Specifically, the HP-Student is a pre-trained, high-performance, but not fully verified Phy-DRL agent that continues to learn in a real plant to tune its action policy to be safe. In contrast, the HA-Teacher is a mission-reduced, physics-model-based, and verified design. As a complement to the HP-Student, the HA-Teacher has two missions: backing up safety and correcting unsafe learning. The Coordinator triggers the interaction and the switch between HP-Student and HA-Teacher. Powered by the three interactive components, the SeC-learning machine can i) assure lifetime safety (i.e., a safety guarantee in any continual-learning stage, regardless of HP-Student's success or convergence), ii) address the Sim2Real gap, and iii) learn to tolerate unknown unknowns in real plants. Experiments on a cart-pole system and a real quadruped robot demonstrate the distinguishing features of the SeC-learning machine, compared with continual learning built on state-of-the-art safe DRL frameworks with approaches to addressing the Sim2Real gap.
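The switching role of the Coordinator described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the class name `Coordinator`, the `is_safe` predicate, the scalar actions, and the step-counted dwell period `dwell_steps` are all hypothetical names chosen for illustration. The only assumption taken from this page is the general Simplex idea that an unverified high-performance policy is replaced by a verified one when a safety condition is violated, and that the HA-Teacher stays in control for some dwell time.

```python
from dataclasses import dataclass, field
from typing import Callable, Sequence, Tuple

State = Sequence[float]
Policy = Callable[[State], float]


@dataclass
class Coordinator:
    """Hypothetical Simplex-style switch between two action policies."""

    student: Policy                    # high-performance, not fully verified
    teacher: Policy                    # verified fallback policy
    is_safe: Callable[[State], bool]   # assumed safety check on the state
    dwell_steps: int = 3               # assumed dwell period, in control steps
    _teacher_steps_left: int = field(default=0, init=False)

    def act(self, state: State) -> Tuple[str, float]:
        # Hand control to the teacher when the student is active
        # and the state violates the safety check.
        if self._teacher_steps_left == 0 and not self.is_safe(state):
            self._teacher_steps_left = self.dwell_steps
        # While the dwell period runs, the teacher's action is applied.
        if self._teacher_steps_left > 0:
            self._teacher_steps_left -= 1
            return "teacher", self.teacher(state)
        return "student", self.student(state)


# Toy usage: constant policies and a simple bound on the first state entry.
coord = Coordinator(
    student=lambda s: 1.0,
    teacher=lambda s: -1.0,
    is_safe=lambda s: abs(s[0]) < 0.5,
    dwell_steps=2,
)
trace = [coord.act([x])[0] for x in (0.0, 0.9, 0.0, 0.0)]
```

In this toy run the teacher takes over at the second step, keeps control for the assumed two-step dwell period even though the state is back inside the bound, and then returns control to the student.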

Paper Structure (37 sections, 5 theorems, 65 equations, 14 figures, 2 tables)

This paper contains 37 sections, 5 theorems, 65 equations, 14 figures, 2 tables.

Introduction
Related Work on Safe DRL
Challenges and Open Problems
Contribution: Simplex-enabled Safe Continual Learning Machine
Preliminaries: Safety Definition
Design Overview: SeC-Learning Machine
SeC-Learning Machine: HP-Student Component
HP-Student: Residual Action Policy and Safety-embedded Reward
HP-Student: Controllable Contribution Ratio
HP-Student: Continual Learning
SeC-Learning Machine: Coordinator Component
SeC-Learning Machine: HA-Teacher Component
Experiment
Cart-Pole System
Real Quadruped Robot
...and 22 more sections

Key Result

Theorem 6.2

Consider the HA-Teacher's action policy and the envelope patch $\Omega_{\text{patch}}$, whose matrices $\widehat{\mathbf{F}}$ and $\widehat{\mathbf{P}}$ are computed according to [equation omitted] with the matrices $\widehat{\mathbf{R}}$ and $\widehat{\mathbf{Q}}$ satisfying [equation omitted], where $\beta \in (0,1)$ and $\omega > 0$ are given parameters. Under the stated assumption, the real system controlled by HA-Teacher ...

Figures (14)

Figure 1: SeC-Learning Machine.
Figure 2: Contribution ratio $\gamma$.
Figure 3: System phase behavior.
Figure 4: Episode reward.
Figure 5: Two Episodes. Phase plots, given the same initial condition. The black dot and star denote the initial condition and final location, respectively.
...and 9 more figures

Theorems & Definitions (13)

Definition 2.1
Remark 4.1
Remark 5.1: Parallel Running
Theorem 6.2
Remark 6.3: Suggestion on the dwell time $\tau$ of HA-Teacher
Remark 6.4: Suggestion on backing up safety by envelope patches
Remark 6.5: Fast Computation
Lemma A.1: Schur Complement [zhang2006schur]
Lemma A.2
Lemma A.3
...and 3 more
