Safe Reinforcement Learning of Robot Trajectories in the Presence of Moving Obstacles

Jonas Kiemel, Ludovic Righetti, Torsten Kröger, Tamim Asfour

TL;DR

An approach for learning collision-free robot trajectories in the presence of moving obstacles using model-free reinforcement learning. The approach generates safe trajectories in real time and is shown to be effective in both deterministic and stochastic environments.

Abstract

In this paper, we present an approach for learning collision-free robot trajectories in the presence of moving obstacles. As a first step, we train a backup policy to generate evasive movements from arbitrary initial robot states using model-free reinforcement learning. When learning policies for other tasks, the backup policy can be used to estimate the potential risk of a collision and to offer an alternative action if the estimated risk is considered too high. No matter which action is selected, our action space ensures that the kinematic limits of the robot joints are not violated. We analyze and evaluate two different methods for estimating the risk of a collision. A physics simulation performed in the background is computationally expensive but provides the best results in deterministic environments. If a data-based risk estimator is used instead, the computational effort is significantly reduced, but an additional source of error is introduced. For evaluation, we successfully learn a reaching task and a basketball task while keeping the risk of collisions low. The results demonstrate the effectiveness of our approach for deterministic and stochastic environments, including a human-robot scenario and a ball environment, where no state can be considered permanently safe. By conducting experiments with a real robot, we show that our approach can generate safe trajectories in real time.
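
The risk-dependent choice between the task action and the backup action described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `select_action`, `task_policy`, `backup_policy`, `estimate_risk`, and the default `risk_threshold` are hypothetical, and the risk estimator is treated as a black box that could be backed either by a background physics simulation or by a data-based model.

```python
from typing import Callable, Sequence, Tuple

Action = Sequence[float]


def select_action(
    state: object,
    task_policy: Callable[[object], Action],
    backup_policy: Callable[[object], Action],
    estimate_risk: Callable[[object, Action], float],
    risk_threshold: float = 0.5,  # hypothetical value, not taken from the paper
) -> Tuple[Action, bool]:
    """Return the action to execute and whether the backup policy was used."""
    # The task policy proposes an action for the current state.
    task_action = task_policy(state)
    # The risk estimator rates how likely a collision is if this action is
    # executed and the backup policy takes over afterwards.
    risk = estimate_risk(state, task_action)
    if risk > risk_threshold:
        # Estimated risk is too high: fall back to the evasive backup action.
        return backup_policy(state), True
    return task_action, False
```

A caller would invoke `select_action` once per control step and execute the returned action. The boolean flag makes it easy to log how often the backup policy intervenes during training.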

Paper Structure

This paper contains 28 sections, 3 equations, 7 figures, 9 tables.

Figures (7)

  • Figure 1: Our approach shown for the Ball environment, where balls are thrown towards the robot from random directions.
  • Figure 2: Collision avoidance by ensuring the existence of a safe backup trajectory. See the section on the basic principle for details.
  • Figure 3: Potential failure causes when performing a background simulation to detect safety violations.
  • Figure 4: Action mapping (a) and distance reward (b).
  • Figure 5: The figure illustrates the action generation with the task network, four different ways (A, B1, B2a, and B2b) to estimate the corresponding risk, and the risk-dependent action adjustment using the backup network.
  • ...and 2 more figures
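
The idea behind an action space that cannot violate kinematic joint limits (the abstract's claim, and the action mapping named in Figure 4) can be illustrated with a simplified single-joint sketch. The details of the paper's mapping are not given in this section, so everything below is an assumption: the function names, the restriction to velocity and acceleration limits (position and jerk limits are omitted), and the linear mapping of a normalized action in [-1, 1] onto the feasible acceleration range.

```python
from typing import Tuple


def feasible_acceleration_range(
    vel: float, vel_max: float, acc_max: float, dt: float
) -> Tuple[float, float]:
    """Accelerations that keep |velocity| <= vel_max after one time step.

    Assumes |vel| <= vel_max on entry, so the range always contains 0.
    """
    lower = max(-acc_max, (-vel_max - vel) / dt)
    upper = min(acc_max, (vel_max - vel) / dt)
    return lower, upper


def map_action_to_acceleration(
    action: float, vel: float, vel_max: float, acc_max: float, dt: float
) -> float:
    """Map a normalized action in [-1, 1] onto the feasible acceleration range."""
    # Clip so that out-of-range policy outputs cannot leave the feasible range.
    action = max(-1.0, min(1.0, action))
    lower, upper = feasible_acceleration_range(vel, vel_max, acc_max, dt)
    # -1 selects the lower bound, +1 the upper bound, linear in between.
    return lower + 0.5 * (action + 1.0) * (upper - lower)
```

Because every normalized action is mapped into a range that is feasible by construction, both the task policy and the backup policy can output arbitrary values without exceeding the modeled limits.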