Simplex-enabled Safe Continual Learning Machine

Hongpeng Cao, Yanbing Mao, Yihao Cai, Lui Sha, Marco Caccamo

TL;DR

The experiments on a cart-pole system and a real quadruped robot demonstrate the distinguished features of the SeC-learning machine, compared with continual learning built on state-of-the-art safe DRL frameworks with approaches to addressing the Sim2Real gap.

Abstract

This paper proposes the SeC-Learning Machine: Simplex-enabled safe continual learning for safety-critical autonomous systems. The SeC-learning machine is built on Simplex logic (that is, "using simplicity to control complexity") and physics-regulated deep reinforcement learning (Phy-DRL). It comprises three components: the HP (high performance)-Student, the HA (high assurance)-Teacher, and the Coordinator. Specifically, the HP-Student is a pre-trained, high-performance but not fully verified Phy-DRL agent that continues to learn in a real plant to tune its action policy to be safe. In contrast, the HA-Teacher is a mission-reduced, physics-model-based, and verified design. As a complement to the HP-Student, the HA-Teacher has two missions: backing up safety and correcting unsafe learning. The Coordinator triggers the interaction and the switch between HP-Student and HA-Teacher. Powered by these three interacting components, the SeC-learning machine can i) assure lifetime safety (i.e., a safety guarantee in any continual-learning stage, regardless of HP-Student's success or convergence), ii) address the Sim2Real gap, and iii) learn to tolerate unknown unknowns in real plants. The experiments on a cart-pole system and a real quadruped robot demonstrate the distinguished features of the SeC-learning machine, compared with continual learning built on state-of-the-art safe DRL frameworks with approaches to addressing the Sim2Real gap.
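
The switching role of the Coordinator can be illustrated with a minimal sketch. This is not the paper's implementation: the scalar plant, the two toy policies, the safety test `abs(x) < 1.0`, the step size, and the names `Coordinator`, `dwell_steps`, and `simulate` are all assumptions made for illustration. It shows only the Simplex-style handover (the teacher takes over when the state leaves an assumed safe set and holds control for a fixed dwell time), not the verified HA-Teacher design or the correction of the student's learning.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Policy = Callable[[float], float]


@dataclass
class Coordinator:
    """Hypothetical Simplex-style switch between two controllers."""
    student: Policy                      # high-performance, unverified policy
    teacher: Policy                      # simple backup policy
    is_safe: Callable[[float], bool]     # assumed safety-envelope test
    dwell_steps: int = 3                 # assumed teacher dwell time, in steps
    _remaining: int = 0                  # teacher steps still to run

    def act(self, x: float) -> Tuple[str, float]:
        # Hand control to the teacher when the state leaves the safe set.
        if self._remaining == 0 and not self.is_safe(x):
            self._remaining = self.dwell_steps
        if self._remaining > 0:
            self._remaining -= 1
            return "teacher", self.teacher(x)
        return "student", self.student(x)


def simulate(x0: float, steps: int, dt: float = 0.25) -> Tuple[List[str], List[float]]:
    """Run the toy plant x_next = x + dt * u under the coordinator."""
    coord = Coordinator(
        student=lambda x: 1.0,           # toy policy that drifts outward
        teacher=lambda x: -2.0 * x,      # toy policy that pulls toward zero
        is_safe=lambda x: abs(x) < 1.0,  # assumed safe set
    )
    modes, states, x = [], [x0], x0
    for _ in range(steps):
        mode, u = coord.act(x)
        x = x + dt * u
        modes.append(mode)
        states.append(x)
    return modes, states


if __name__ == "__main__":
    modes, states = simulate(x0=0.75, steps=5)
    for mode, x in zip(modes, states[1:]):
        print(mode, x)
```

In this toy run the student pushes the state to the boundary of the assumed safe set, the teacher then holds control for three steps and pulls the state back inside, and control returns to the student afterwards.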

Paper Structure

This paper contains 37 sections, 5 theorems, 65 equations, 14 figures, 2 tables.

Key Result

Theorem 6.2

Consider the HA-Teacher's action policy and the envelope patch $\Omega_{\text{patch}}$, whose matrices $\widehat{\mathbf{F}}$ and $\widehat{\mathbf{P}}$ are computed according to [...] with the matrices $\widehat{\mathbf{R}}$ and $\widehat{\mathbf{Q}}$ satisfying [...], where $\beta \in (0,1)$ and $\omega > 0$ are given parameters. Under the paper's assumption, the system controlled by HA-T...

Figures (14)

  • Figure 1: SeC-Learning Machine.
  • Figure 2: Contribution ratio $\gamma$.
  • Figure 3: System phase behavior.
  • Figure 4: Episode reward.
  • Figure 5: Two Episodes. Phase plots, given the same initial condition. The black dot and star denote the initial condition and final location, respectively.
  • ...and 9 more figures

Theorems & Definitions (13)

  • Definition 2.1
  • Remark 4.1
  • Remark 5.1: Parallel Running
  • Theorem 6.2
  • Remark 6.3: Suggestion from \ref{stat1}: Dwell time $\tau$ of HA-Teacher
  • Remark 6.4: Suggestion from \ref{stat2}: Backing up safety by envelope patches
  • Remark 6.5: Fast Computation
  • Lemma A.1: Schur Complement [zhang2006schur]
  • Lemma A.2
  • Lemma A.3
  • ...and 3 more