Human-Robot Gym: Benchmarking Reinforcement Learning in Human-Robot Collaboration

Jakob Thumm, Felix Trost, Matthias Althoff

TL;DR

The diverse tasks offered by human-robot gym create a challenging benchmark for state-of-the-art RL methods. By leveraging expert knowledge in the form of an action imitation reward, the RL agent can outperform the expert while overfitting only negligibly to the training data.

Abstract

Deep reinforcement learning (RL) has shown promising results in robot motion planning, including first attempts in human-robot collaboration (HRC). However, a fair comparison of RL approaches in HRC under the constraint of guaranteed safety is yet to be made. We therefore present human-robot gym, a benchmark suite for safe RL in HRC. Our benchmark suite provides eight challenging, realistic HRC tasks in a modular simulation framework. Most importantly, human-robot gym includes a safety shield that provably guarantees human safety. We are thereby the first to provide a benchmark suite to train RL agents that adhere to the safety specifications of real-world HRC. This bridges a critical gap between theoretical RL research and its real-world deployment. Our evaluation of six tasks led to three key results: (a) the diverse nature of the tasks offered by human-robot gym creates a challenging benchmark for state-of-the-art RL methods, (b) agents trained with expert knowledge in the form of an action-based reward can outperform the expert, and (c) our agents overfit only negligibly to the training data.

Paper Structure

This paper contains 17 sections, 2 equations, 4 figures, and 2 tables.

Figures (4)

  • Figure 1: Human-robot gym presents eight challenging HRC tasks.
  • Figure 2: A typical workflow of an RL cycle in human-robot gym. Optional elements are depicted with dashed borders, and the inner loop of the environment step is executed $L$ times, e.g., $L=25$. In this example, the agent returns an action in Cartesian space corresponding to a desired end effector position, which is converted to a desired joint position using inverse kinematics. Our collision prevention alters the action if the desired joint position results in a self-collision or a collision with the static environment. The shield calculates the next safe joint positions, which the joint position controller converts into joint torques that are then executed in simulation.
  • Figure 3: Evaluation performance during training of RL agents on human-robot gym. The plots show the mean evaluation performance during training, together with the 95% confidence interval of the mean obtained by bootstrapping over five random seeds.
  • Figure 4: Ablation study for overfitting to motion data in the training process.
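
The control flow described in the Figure 2 caption can be sketched roughly as follows. This is a minimal illustration of the ordering of the stages only, not the human-robot gym API: every function name here (`environment_step`, `toy_shield`, and so on) is hypothetical, the inverse kinematics is an identity map, the collision fallback of holding the current position is an assumption (the caption only says the action is altered), and the "shield" is a simple per-step motion limit rather than the provably safe shield of the paper.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def environment_step(
    action: Sequence[float],
    joint_pos: Vector,
    inverse_kinematics: Callable[[Sequence[float]], Vector],
    is_collision_free: Callable[[Vector], bool],
    shield: Callable[[Vector, Vector], Vector],
    controller: Callable[[Vector, Vector], Vector],
    simulate: Callable[[Vector, Vector], Vector],
    inner_steps: int = 25,  # L in the caption, e.g. L = 25
) -> Vector:
    """One environment step: IK, collision prevention, then L shielded control steps."""
    # Cartesian action -> desired joint position.
    desired = inverse_kinematics(action)
    # Collision prevention alters the action; holding position is an ASSUMED fallback.
    if not is_collision_free(desired):
        desired = list(joint_pos)
    for _ in range(inner_steps):
        safe_target = shield(joint_pos, desired)      # next safe joint positions
        torques = controller(joint_pos, safe_target)  # joint position controller
        joint_pos = simulate(joint_pos, torques)      # execute in simulation
    return joint_pos


# --- Toy stand-ins (all hypothetical, for illustration only) ---
def toy_ik(action: Sequence[float]) -> Vector:
    return list(action)  # identity: pretend Cartesian and joint space coincide


def toy_collision_free(q: Vector) -> bool:
    return all(abs(x) <= 2.0 for x in q)  # pretend joint limits stand in for collisions


def toy_shield(q: Vector, target: Vector, max_delta: float = 0.05) -> Vector:
    # Limit the motion per inner step; a crude placeholder for a real safety shield.
    return [qi + max(-max_delta, min(max_delta, ti - qi)) for qi, ti in zip(q, target)]


def toy_controller(q: Vector, target: Vector, kp: float = 1.0) -> Vector:
    return [kp * (ti - qi) for qi, ti in zip(q, target)]  # proportional "torque"


def toy_simulate(q: Vector, torques: Vector, dt: float = 1.0) -> Vector:
    return [qi + dt * ui for qi, ui in zip(q, torques)]  # first-order toy dynamics


stages = (toy_ik, toy_collision_free, toy_shield, toy_controller, toy_simulate)
reached = environment_step([1.0, -0.5], [0.0, 0.0], *stages)
held = environment_step([3.0, 0.0], [0.0, 0.0], *stages)
print(reached, held)
```

Passing the stages in as callables mirrors the modular structure the caption describes: an optional element can be swapped for a pass-through function without changing the loop.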
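
The Figure 3 caption reports a 95% confidence interval of the mean obtained by bootstrapping over five seeds. A minimal sketch of one common variant, the percentile bootstrap, is given below. The caption does not say which bootstrap variant or how many resamples the authors used, so both are assumptions here, and the per-seed scores are made-up placeholder values rather than results from the paper.

```python
import random
import statistics


def bootstrap_mean_ci(values, n_resamples=5000, confidence=0.95, seed=0):
    """Percentile bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement n_resamples times and record each resample's mean.
    means = sorted(
        statistics.fmean(rng.choices(values, k=n)) for _ in range(n_resamples)
    )
    alpha = 1.0 - confidence
    # Read the interval bounds off the empirical quantiles of the resampled means.
    lower = means[int((alpha / 2) * (n_resamples - 1))]
    upper = means[int((1 - alpha / 2) * (n_resamples - 1))]
    return statistics.fmean(values), lower, upper


# Placeholder evaluation scores for five random seeds (NOT values from the paper).
seed_scores = [0.62, 0.71, 0.58, 0.66, 0.69]
mean, lower, upper = bootstrap_mean_ci(seed_scores)
print(f"mean={mean:.3f}, 95% CI=[{lower:.3f}, {upper:.3f}]")
```

With only five seeds the resampled means can take few distinct values, so the resulting interval is coarse and should be read as a rough indication of seed-to-seed variation.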