Handling Long-Term Safety and Uncertainty in Safe Reinforcement Learning

Jonas Günster; Puze Liu; Jan Peters; Davide Tateo

TL;DR

The safe exploration method ATACOM is extended with learnable constraints, with a particular focus on ensuring long-term safety and handling uncertainty. The resulting approach is competitive with or superior to state-of-the-art methods in final performance while maintaining safer behavior during training.

Abstract

Safety is one of the key issues preventing the deployment of reinforcement learning techniques in real-world robots. While most approaches in the Safe Reinforcement Learning area do not require prior knowledge of constraints and robot kinematics and rely solely on data, it is often difficult to deploy them in complex real-world settings. Instead, model-based approaches that incorporate prior knowledge of the constraints and dynamics into the learning framework have proven capable of deploying the learning algorithm directly on the real robot. Unfortunately, while an approximated model of the robot dynamics is often available, the safety constraints are task-specific and hard to obtain: they may be too complicated to encode analytically, too expensive to compute, or it may be difficult to envision a priori the long-term safety requirements. In this paper, we bridge this gap by extending the safe exploration method ATACOM with learnable constraints, with a particular focus on ensuring long-term safety and handling uncertainty. Our approach is competitive with or superior to state-of-the-art methods in final performance while maintaining safer behavior during training.

Paper Structure (34 sections, 19 equations, 18 figures, 6 tables, 3 algorithms)

Introduction
Related Work
Preliminaries
Distributional Reinforcement Learning
Safe Learning on the Constraint Manifold
Long-term Safety under Uncertainty
Feasibility Value Function for Long-Term Safety
Distributional Feasibility Value Iteration
Uncertainty-Aware Constraint using (Conditional) Value-at-Risk
Adaptive constraint threshold estimate
Policy Iteration with Learnable Constraint using ATACOM
Experiments
Cartpole
Navigation
3dof Robot Air Hockey
...and 19 more sections

Figures (18)

Figure 1: Distribution of $V_{F}^{\pi}$ and illustration of mean, VaR (red), and CVaR (green). The shaded area shows the cumulative probability $\alpha$.
Figure 2: Illustration of the feasible set (light blue), the learned constraint at 0-level (red), and threshold $\delta$. The threshold $\delta$ provides a small feasible region (white) to explore within a small cost budget.
Figure 3: Learning Curves for the Cartpole Environment
Figure 4: Learning Curves for the Navigation Environment
Figure 5: Learning Curves for the Air Hockey Environment
...and 13 more figures
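
The Figure 1 caption contrasts the mean, VaR, and CVaR of the $V_{F}^{\pi}$ distribution. The following is a minimal sketch of how these quantities could be estimated from samples. It assumes the distribution is cost-like (larger values are worse) and that the risk measures refer to the upper tail at level $\alpha$; the paper's exact convention may differ. The function name `var_cvar` and its arguments are hypothetical and do not come from the paper.

```python
import numpy as np


def var_cvar(samples, alpha):
    """Empirical VaR and CVaR of cost-like samples (illustrative sketch).

    Assumption: larger values are worse, so VaR is the alpha-quantile and
    CVaR is the mean of the samples at or above that quantile.
    """
    x = np.asarray(samples, dtype=float)
    # Value-at-Risk: the alpha-quantile of the empirical distribution.
    var = float(np.quantile(x, alpha))
    # Conditional Value-at-Risk: average of the tail at or beyond VaR.
    cvar = float(x[x >= var].mean())
    return var, cvar
```

Under this convention CVaR is never smaller than VaR, so a constraint expressed through CVaR is the more conservative of the two at the same level $\alpha$.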

Theorems & Definitions (1)

Definition 1
