Semi-Supervised Safe Visuomotor Policy Synthesis using Barrier Certificates

Manan Tayal; Aditya Singh; Pushpak Jagtap; Shishir Kolathaya

Semi-Supervised Safe Visuomotor Policy Synthesis using Barrier Certificates

Manan Tayal, Aditya Singh, Pushpak Jagtap, Shishir Kolathaya

TL;DR

This work proposes a novel approach to synthesize a semi-supervised safe visuomotor policy using barrier certificates that integrate the strengths of model-free supervised learning and model-based control methods.

Abstract

In modern robotics, addressing the lack of accurate state space information in real-world scenarios has led to a significant focus on utilizing visuomotor observation to provide safety assurances. Although supervised learning methods, such as imitation learning, have demonstrated potential in synthesizing control policies based on visuomotor observations, they require ground truth safety labels for the complete dataset and do not provide formal safety assurances. On the other hand, traditional control-theoretic methods like Control Barrier Functions (CBFs) and Hamilton-Jacobi (HJ) Reachability provide formal safety guarantees but depend on accurate knowledge of system dynamics, which is often unavailable for high-dimensional visuomotor data. To overcome these limitations, we propose a novel approach to synthesize a semi-supervised safe visuomotor policy using barrier certificates that integrate the strengths of model-free supervised learning and model-based control methods. This framework synthesizes a provably safe controller without requiring safety labels for the complete dataset and ensures completeness guarantees for both the barrier certificate and the policy. We validate our approach through distinct case studies: an inverted pendulum system and the obstacle avoidance of an autonomous mobile robot.

Semi-Supervised Safe Visuomotor Policy Synthesis using Barrier Certificates

TL;DR

Abstract

Paper Structure (19 sections, 4 theorems, 21 equations, 3 figures, 1 table, 1 algorithm)

This paper contains 19 sections, 4 theorems, 21 equations, 3 figures, 1 table, 1 algorithm.

Introduction
Preliminaries and Problem Formulation
Notations
System Description
Control Barrier Certificates (CBCs)
Neural CBCs with Completeness Guarantees
Problem Formulation
Semi-Supervised Policy Learning Framework
Safety based Latent Dynamics (SaLaD)
Formal Verification Integration
Controller Synthesis
Training Scheme
Warm Starting the SaLaD Model
Iterative Training with Controller Synthesis
Case Studies
...and 4 more sections

Key Result

Lemma 1

For a discrete-time control system $\mathbb{S}=(X,U,f)$, safe set $X_s \subseteq \mathcal{C}$, and unsafe set $X_u \subseteq X-\mathcal{C}$, the existence of a control barrier certificate, $B$, as defined in Definition definition: CBC definition, under a control policy $\pi:X\rightarrow U$ implies t

Figures (3)

Figure 1: Presents the architecture of the Safety-driven Latent Dynamics (SaLaD) model. Training the SaLaD model the visuomotor observation $\mathcal{O}_i$ is encoded by $E_{\theta}$ into a latent representation $z_i$. Then, SaLaD predicts the next latent states $z_{i+1}$ as well as the Barrier Certificate over the safe and unsafe datasets for each latent state, and we optimize SaLaD using Equation \ref{['eq: salad_loss']}. Next, visuomotor observation $\mathcal{O}_{i+1}$ is encoded using target net $E_{\theta^-}$ and used as latent targets only during training.
Figure 2: This figure demonstrates the results of our framework applied to OpenAI Gym's Pendulum environment. Fig. \ref{['fig:sys_pend']} illustrates the system diagram, where the light green area represents the safe region (used for sampling safe states, $\mathcal{S}$), and the light red area denotes the unsafe region (used for sampling unsafe states, $\mathcal{U}$). Fig. \ref{['fig:plot_latent_pend']} shows the representation of safe, unsafe and buffer states. The figure shows that the safe and unsafe regions are well segregated. Fig. \ref{['fig:plot_real_pend']} visualizes the safe, unsafe, and buffer points in the $\Theta-\dot{\Theta}$ plane. We observe that the boundary of the trained barrier function (black dots) successfully separates the unsafe and safe regions. Fig. \ref{['fig:sspl_pend']} shows the pendulum’s trajectory generated by the learned policy $\pi_{\theta}$, where the size of the arrows indicates the magnitude of the control actions produced by the policy.
Figure 3: Presents the results on the velocity-driven quadruped model on PyBullet. Fig. \ref{['fig:sspl_gr']} shows the top view of the $x-y$ plane, which represents the visuomotor input of the system. Fig. \ref{['fig:traj_sspl_gr']} shows the x-y plane where the light green area represents the safe region (used for sampling safe states, $\mathcal{S}$), and the light red area denotes the unsafe region (used for sampling unsafe states, $\mathcal{U}$). The figure also shows multiple trajectories initiating inside the safe set following the policy $\pi_{\theta}$, synthesized using the proposed framework.

Theorems & Definitions (7)

Definition 1
Lemma 1: jagtap2020compositional
Theorem 1
proof
Lemma 2: pauli2021training
Theorem 2
proof

Semi-Supervised Safe Visuomotor Policy Synthesis using Barrier Certificates

TL;DR

Abstract

Semi-Supervised Safe Visuomotor Policy Synthesis using Barrier Certificates

Authors

TL;DR

Abstract

Table of Contents

Key Result

Figures (3)

Theorems & Definitions (7)