SAFE-GIL: SAFEty Guided Imitation Learning for Robotic Systems

Yusuf Umut Ciftci; Darren Chiu; Zeyuan Feng; Gaurav S. Sukhatme; Somil Bansal

TL;DR

SAFE-GIL deliberately injects adversarial disturbances into the system during data collection to guide the expert toward safety-critical states. It demonstrates a significant reduction in safety failures, particularly in low-data regimes, where the likelihood of learning errors, and therefore of safety violations, is higher.

Abstract

Behavior cloning (BC) is a widely used approach in imitation learning, where a robot learns a control policy by observing an expert supervisor. However, the learned policy can make errors that lead to safety violations, which limits its utility in safety-critical robotics applications. While prior works have tried improving a BC policy via additional real or synthetic action labels, adversarial training, or runtime filtering, none of them explicitly focuses on reducing the BC policy's safety violations at training time. We propose SAFE-GIL, a design-time method to learn safety-aware behavior cloning policies. SAFE-GIL deliberately injects adversarial disturbances into the system during data collection to guide the expert toward safety-critical states. This disturbance injection simulates potential policy errors that the system might encounter at test time. By ensuring that training more closely replicates expert behavior in safety-critical states, our approach results in safer policies despite policy errors at test time. We further develop a reachability-based method to compute this adversarial disturbance. We compare SAFE-GIL with various behavior cloning techniques and online safety-filtering methods in three domains: autonomous ground navigation, aircraft taxiing, and aerial navigation on a quadrotor testbed. Our method demonstrates a significant reduction in safety failures, particularly in low-data regimes, where the likelihood of learning errors, and therefore of safety violations, is higher. See our website here: https://y-u-c.github.io/safegil/
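The data-collection idea described in the abstract can be illustrated with a small toy example. The sketch below is not the authors' implementation: it assumes a 2D point robot with single-integrator dynamics, a hypothetical circular obstacle and goal, a hand-written expert, and a signed distance function standing in for the reachability value function (the paper computes the disturbance with Hamilton-Jacobi reachability, which is not reproduced here). All names and parameters (`safety_value`, `adversarial_disturbance`, `collect_demo`, `d_max`) are illustrative assumptions.

```python
import numpy as np

# Hypothetical toy environment (not from the paper).
OBSTACLE_CENTER = np.array([0.0, 0.0])
OBSTACLE_RADIUS = 1.0
GOAL = np.array([3.0, 0.5])


def safety_value(x):
    """Signed distance to the obstacle; a stand-in for a reachability value function."""
    return np.linalg.norm(x - OBSTACLE_CENTER) - OBSTACLE_RADIUS


def safety_gradient(x, eps=1e-5):
    """Central finite-difference gradient of the safety value."""
    grad = np.zeros_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = eps
        grad[i] = (safety_value(x + step) - safety_value(x - step)) / (2 * eps)
    return grad


def adversarial_disturbance(x, d_max):
    """Bounded disturbance along the direction that decreases the safety value fastest."""
    g = safety_gradient(x)
    norm = np.linalg.norm(g)
    if norm < 1e-9:
        return np.zeros_like(x)
    return -d_max * g / norm


def expert_action(x, u_max=1.0):
    """Toy expert: head to the goal and push away from the obstacle when close."""
    to_goal = GOAL - x
    u = to_goal / max(np.linalg.norm(to_goal), 1e-9)
    margin = safety_value(x)
    if margin < 0.5:
        u = u + 4.0 * (0.5 - margin) * safety_gradient(x)
    n = np.linalg.norm(u)
    return u_max * u / n if n > 1e-9 else u


def collect_demo(x0, d_max, dt=0.05, steps=200):
    """Roll out the expert while injecting the disturbance into the dynamics.

    The dataset stores (state, expert action) pairs. The disturbance only
    changes which states are visited; it never appears in the labels.
    """
    x = np.array(x0, dtype=float)
    states, actions = [], []
    for _ in range(steps):
        u = expert_action(x)
        states.append(x.copy())
        actions.append(u.copy())
        x = x + dt * (u + adversarial_disturbance(x, d_max))
    return np.array(states), np.array(actions)
```

Calling `collect_demo(np.array([-3.0, 0.5]), d_max=0.0)` gives a nominal demonstration, while a positive `d_max` pushes the visited states closer to the obstacle, so the expert's recovery actions near the unsafe region end up in the training set. A behavior cloning policy would then be fit to the recorded pairs by ordinary supervised regression.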

Paper Structure

This paper contains 9 sections, 5 equations, 7 figures, 3 tables, 1 algorithm.

Introduction
Problem Formulation
Background: Hamilton-Jacobi Reachability
SAFE-GIL: SAFEty Guided Imitation Learning
Experiments
Autonomous Navigation Using a State-Based Policy
Autonomous Aircraft Taxiing Using a Vision-Based Policy
Quadrotor Navigation: Hardware experiment
Discussion and Future Work

Figures (7)

Figure 1: Left: Human-controlled quadrotor demonstration trajectories with (SAFE-GIL) and without adversarial safety guidance. With SAFE-GIL, the robot observes more safety-critical states during training (illustrated by red quadrotor icons). Right: BC and SAFE-GIL policy rollouts. The red cross denotes a collision. SAFE-GIL results in a significant improvement in robot safety at test time.
Figure 2: Top row: Computed BRT and disturbance for $\theta = 0$. Middle row: Demonstration trajectories with (orange) and without (blue) disturbance injection. Bottom row: BC and SAFE-GIL policy rollouts. Right column: Mean collision rate (top) and cost of safe trajectories (bottom) vs. number of demonstrations. SAFE-GIL results in a significant safety improvement.
Figure 3: Mean collision rate (Left) and cost of safe rollouts (Right) vs number of demonstrations. Adversarial noise injection leads to a significant safety improvement over random noise.
Figure 4: SAFE-GIL can be combined with other imitation learning approaches to have complementary safety and performance advantages.
Figure 5: Top: Expert demonstration with and without disturbance injection. Middle: BC and SAFE-GIL rollouts from the same initial state. BC fails to keep the aircraft on the runway. Bottom: Mean excursion rate (left) and mean squared distance from the centerline (middle) vs. number of demonstrations. The safety value distribution of the collected demonstrations (right) is shifted toward lower values for SAFE-GIL.
...and 2 more figures
