Guided Safe Shooting: model based reinforcement learning with safety constraints

Giuseppe Paolo, Jonas Gonzalez-Billandon, Albert Thomas, Balázs Kégl

TL;DR

This paper introduces Guided Safe Shooting (GuSS), a model-based RL approach that can learn to control systems with minimal violations of the safety constraints, and proposes three different safe planners: one based on a simple random shooting strategy and two based on MAP-Elites, a more advanced divergent-search algorithm.
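
To make the MAP-Elites idea concrete, here is a minimal sketch of a MAP-Elites search over action sequences. Several assumptions are not taken from the paper: the learned model is replaced by a hypothetical 2D point mass, the behavior descriptor is the grid cell of a plan's final position, the goal and the unsafe box [3, 5] x [3, 5] are made up, and safety is enforced by simply refusing to archive any plan whose rollout enters the unsafe box. The archive keeps one elite per descriptor cell, which is what makes the search divergent: it retains many different safe plans instead of a single best one.

```python
import numpy as np

GOAL = np.array([8.0, 8.0])  # hypothetical goal position


def rollout(start, plan):
    """Roll a plan through a toy model; return (final_state, return, is_safe).

    Hypothetical stand-in for a learned model: a 2D point mass moved by 0.5 * action.
    """
    s, total = start.copy(), 0.0
    for a in plan:
        s = s + 0.5 * np.clip(a, -1.0, 1.0)
        if np.all((s >= 3.0) & (s <= 5.0)):  # hypothetical unsafe box [3, 5] x [3, 5]
            return s, total, False
        total -= np.linalg.norm(s - GOAL)     # reward: negative distance to the goal
    return s, total, True


def descriptor(final_state, lo=-8.0, hi=8.0, bins=8):
    """Behavior descriptor: index of the grid cell containing the final (x, y) position."""
    idx = ((final_state - lo) / (hi - lo) * bins).astype(int)
    return tuple(int(i) for i in np.clip(idx, 0, bins - 1))


def safe_map_elites(start, horizon=15, n_init=100, n_iter=2000, sigma=0.2, seed=0):
    rng = np.random.default_rng(seed)
    archive = {}  # cell -> (return, plan): one elite per behavior cell

    def try_insert(plan):
        final, ret, safe = rollout(start, plan)
        if not safe:  # assumption: unsafe plans are simply never archived
            return
        cell = descriptor(final)
        if cell not in archive or ret > archive[cell][0]:
            archive[cell] = (ret, plan)

    # Initialize the archive with random action sequences.
    for _ in range(n_init):
        try_insert(rng.uniform(-1.0, 1.0, size=(horizon, 2)))
    # Main loop: pick a random elite, mutate it with Gaussian noise, try to insert the child.
    for _ in range(n_iter):
        if not archive:
            try_insert(rng.uniform(-1.0, 1.0, size=(horizon, 2)))
            continue
        keys = list(archive)
        parent = archive[keys[rng.integers(len(keys))]][1]
        child = np.clip(parent + sigma * rng.normal(size=parent.shape), -1.0, 1.0)
        try_insert(child)

    best_return, best_plan = max(archive.values(), key=lambda e: e[0])
    return best_plan, best_return, archive
```

For example, `plan, ret, archive = safe_map_elites(np.zeros(2))` returns the highest-return safe plan together with the full archive; in a receding-horizon setting one would execute only `plan[0]` and replan at the next step (again an assumption about how the planner is used).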

Abstract

In the last decade, reinforcement learning has successfully solved complex control tasks and decision-making problems, such as the board game Go. Yet there are few success stories when it comes to deploying these algorithms in real-world scenarios. One of the reasons is the lack of guarantees for dealing with and avoiding unsafe states, a fundamental requirement in critical control engineering systems. In this paper, we introduce Guided Safe Shooting (GuSS), a model-based RL approach that can learn to control systems with minimal violations of the safety constraints. The model is learned in an iterated batch fashion on the data collected while operating the system, and is then used to plan the best action to perform at each time step. We propose three different safe planners: one based on a simple random shooting strategy and two based on MAP-Elites, a more advanced divergent-search algorithm. Experiments show that these planners help the learning agent avoid unsafe situations while maximally exploring the state space, a necessary aspect of learning an accurate model of the system. Furthermore, compared to model-free approaches, learning a model allows GuSS to reduce the number of interactions with the real system while still reaching high rewards, a fundamental requirement when handling engineering systems.
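
The iterated batch loop described in the abstract (collect data by operating the system, fit a model, then plan each action with the model) can be sketched as follows. This is a minimal illustration under stated assumptions, not the GuSS implementation: the "real system" is a hypothetical noiseless 2D point mass, the model is a linear least-squares fit, the planner is a plain safe random shooting that discards any sampled action sequence whose predicted rollout enters a hypothetical unsafe box, and the zero-action fallback used when no safe plan is found is also hypothetical.

```python
import numpy as np

GOAL = np.array([8.0, 8.0])  # hypothetical goal position


def real_step(s, a):
    """Hypothetical real system, unknown to the agent: a 2D point mass."""
    return s + 0.5 * np.clip(a, -1.0, 1.0)


def unsafe(states):
    """Hypothetical safety constraint: the box [3, 5] x [3, 5] is unsafe."""
    return np.all((states >= 3.0) & (states <= 5.0), axis=-1)


def fit_model(S, A, S_next):
    """Least-squares linear model s' = [s, a, 1] @ W (a stand-in for the paper's model)."""
    X = np.hstack([S, A, np.ones((len(S), 1))])
    W, *_ = np.linalg.lstsq(X, S_next, rcond=None)
    return W


def safe_random_shooting(W, s, rng, n_plans=300, horizon=10):
    """Return the first action of the best plan whose predicted rollout stays safe."""
    plans = rng.uniform(-1.0, 1.0, size=(n_plans, horizon, 2))
    states = np.tile(s, (n_plans, 1))
    returns = np.zeros(n_plans)
    safe = np.ones(n_plans, dtype=bool)
    for t in range(horizon):
        X = np.hstack([states, plans[:, t], np.ones((n_plans, 1))])
        states = X @ W                       # predicted next states for all plans
        safe &= ~unsafe(states)              # a plan is safe only if every step is safe
        returns -= np.linalg.norm(states - GOAL, axis=1)
    if not safe.any():
        return None
    returns[~safe] = -np.inf                 # never pick an unsafe plan
    return plans[np.argmax(returns), 0]      # execute only the first action, then replan


def iterated_batch(n_iters=5, episode_len=30, seed=0):
    rng = np.random.default_rng(seed)
    S, A, S_next, violations = [], [], [], []
    W = None
    for _ in range(n_iters):
        s, n_unsafe = np.zeros(2), 0
        for _ in range(episode_len):
            if W is None:
                a = rng.uniform(-1.0, 1.0, size=2)  # first batch: random exploration
            else:
                a = safe_random_shooting(W, s, rng)
                if a is None:
                    a = np.zeros(2)                  # hypothetical fallback: stay still
            s_next = real_step(s, a)
            n_unsafe += int(unsafe(s_next))
            S.append(s)
            A.append(a)
            S_next.append(s_next)
            s = s_next
        violations.append(n_unsafe)
        # Refit the model on all data collected so far (iterated batch).
        W = fit_model(np.array(S), np.array(A), np.array(S_next))
    return W, violations
```

Calling `W, violations = iterated_batch()` returns the fitted model and the number of unsafe states visited in each batch. Because this toy system is exactly linear, the fit after the random first batch is essentially exact, so every later batch should report zero violations; with a learned, imperfect model the safety filter is only as reliable as the model's predictions.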

Paper Structure

This paper contains 25 sections, 5 equations, 8 figures, 2 tables, and 6 algorithms.

Figures (8)

  • Figure 1: An illustrative example of planning with a model on the Acrobot environment, where safety has to be considered. The agent controls the torque on the first joint with the goal of getting its end effector as high as possible while avoiding the unsafe zone (red area). Starting from the rest position (left), the agent uses its model to find the best plan (middle), one that maximizes the reward while satisfying the safety constraint, and then executes it on the real system (right).
  • Figure 2: (a) Toy environment: the agent has to navigate from Start to Goal without traversing the unsafe areas in gray. (b) Percentage of safe plans generated at each step. (c) Total amount of safe space explored through safe plans. (d) Average performance of the algorithms. Results show the mean over 10 seeds; shaded areas represent one standard deviation.
  • Figure 3: Mean reward and probability of unsafety (in percent) for the three test environments. Dashed curves indicate model-free baselines and solid curves model-based approaches. The red dashed line indicates the random unsafe probability. All curves represent the mean over 6 random seeds.
  • Figure 4: Pendulum upright task.
  • Figure 5: Acrobot upright task.
  • ...and 3 more figures
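
The quantities plotted in Figure 2(b) and 2(c) can be approximated on a toy problem as below. This is a sketch under assumptions: the point-mass dynamics, the unsafe box, and in particular the grid-cell coverage used as a proxy for "safe space explored" are hypothetical choices, not the paper's exact definitions.

```python
import numpy as np


def rollout_batch(start, plans, step=0.5):
    """Visited states of a hypothetical point mass, shape (n_plans, horizon, 2)."""
    return start + step * np.cumsum(np.clip(plans, -1.0, 1.0), axis=1)


def unsafe(states, lo=3.0, hi=5.0):
    """Hypothetical unsafe region: the box [lo, hi] x [lo, hi]."""
    return np.all((states >= lo) & (states <= hi), axis=-1)


def safe_plan_percentage(states):
    """Percentage of plans that never enter the unsafe region (cf. Figure 2b)."""
    plan_is_safe = ~unsafe(states).any(axis=1)
    return 100.0 * plan_is_safe.mean(), plan_is_safe


def explored_safe_area(states, plan_is_safe, cell=0.5):
    """Area of the grid cells visited by safe plans (one possible proxy for Figure 2c)."""
    visited = states[plan_is_safe].reshape(-1, 2)
    cells = {tuple(c) for c in np.floor(visited / cell).astype(int)}
    return len(cells) * cell ** 2


# One plan heads straight into the unsafe box, the other moves away from it.
plans = np.stack([np.ones((8, 2)), -np.ones((8, 2))])
states = rollout_batch(np.zeros(2), plans)
pct, mask = safe_plan_percentage(states)
print(pct, explored_safe_area(states, mask))  # prints 50.0 2.0
```

Counting visited cells is only one way to measure exploration; a finer grid gives a closer estimate of the covered area at the cost of more cells.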