Reinforcement Learning for Solving Robotic Reaching Tasks in the Neurorobotics Platform

Marton Szep; Leander Lauenburg; Kevin Farkas; Xiyan Su; Chuanlong Zang

TL;DR

The paper tackles robotic reaching with reinforcement learning on the Neurorobotics Platform, addressing safety and data efficiency by comparing model-free agents (DDPG, TD3, SAC) under curriculum learning. It demonstrates that curriculum-guided TD3 with dense rewards and HER, especially in a four-joint configuration, achieves high precision (≈2.4 cm) and high success rates (≈92% at a 5 cm threshold), while the six-joint configuration is harder to learn. The study also analyzes learning from image data, showing that top-down 2D localization aids learning, that both manual and CNN-based extraction of the target position remain imperfect, and that autoencoder latent representations underperform. Overall, the work provides a comprehensive assessment of model-free RL for a reaching task in neurorobotics, highlights practical design choices (reward shaping, HER, action-space constraints, curriculum), and points to future improvements in vision-based control and representation learning.
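The distinction between sparse and dense rewards and the distance-based success criterion mentioned above can be illustrated with a minimal sketch. The function names, the 0/-1 sparse convention, and the negative-distance dense reward are assumptions chosen for illustration; the summary does not state the exact reward definitions used in the paper.

```python
import numpy as np


def distance(end_effector, target):
    """Euclidean distance between end effector and target, in metres."""
    diff = np.asarray(end_effector, dtype=float) - np.asarray(target, dtype=float)
    return float(np.linalg.norm(diff))


def is_success(end_effector, target, threshold=0.05):
    """An episode counts as successful if the distance is below the threshold."""
    return distance(end_effector, target) < threshold


def sparse_reward(end_effector, target, threshold=0.05):
    """Assumed convention: 0 on success, -1 otherwise."""
    return 0.0 if is_success(end_effector, target, threshold) else -1.0


def dense_reward(end_effector, target):
    """Assumed convention: negative distance, so closer is better."""
    return -distance(end_effector, target)


# Hypothetical positions (metres), for illustration only.
target = [0.50, 0.00, 0.10]
near = [0.50, 0.03, 0.10]   # 3 cm away from the target
far = [0.50, 0.20, 0.10]    # 20 cm away from the target

print(sparse_reward(near, target), dense_reward(near, target))
print(sparse_reward(far, target), dense_reward(far, target))
```

The sparse variant only signals whether the threshold was met, whereas the dense variant gives the agent a gradient towards the target at every distance.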

Abstract

In recent years, reinforcement learning (RL) has shown great potential for solving tasks in well-defined environments like games or robotics. This paper aims to solve the robotic reaching task in a simulation run on the Neurorobotics Platform (NRP). The target position is initialized randomly and the robot has 6 degrees of freedom. We compare the performance of various state-of-the-art model-free algorithms. At first, the agent is trained on ground truth data from the simulation to reach the target position in only one continuous movement. Later, the complexity of the task is increased by using image data from the simulation environment as input. Experimental results show that training efficiency and results can be improved with an appropriate dynamic training schedule function for curriculum learning.
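One possible form of a dynamic training schedule for curriculum learning is sketched below. This is not the schedule used in the paper, whose definition is not given in this summary. It assumes the curriculum tightens the success threshold once the recent success rate is high enough. The class name `ThresholdCurriculum`, the promotion rate, and the window size are hypothetical; the three thresholds are borrowed from the values reported for the DDPG evaluation (0.15 m, 0.10 m, 0.05 m).

```python
from collections import deque


class ThresholdCurriculum:
    """Hypothetical schedule: tighten the success threshold stage by stage."""

    def __init__(self, thresholds=(0.15, 0.10, 0.05), promote_at=0.8, window=100):
        self.thresholds = tuple(thresholds)   # success thresholds in metres
        self.promote_at = promote_at          # success rate needed to advance
        self.window = window                  # number of recent episodes tracked
        self.stage = 0
        self.results = deque(maxlen=window)

    @property
    def threshold(self):
        return self.thresholds[self.stage]

    def record(self, success):
        """Store one episode outcome and return the threshold for the next one."""
        self.results.append(bool(success))
        window_full = len(self.results) == self.window
        rate = sum(self.results) / len(self.results)
        last_stage = self.stage == len(self.thresholds) - 1
        if window_full and rate >= self.promote_at and not last_stage:
            self.stage += 1
            self.results.clear()  # start measuring afresh at the harder stage
        return self.threshold


# Usage: ten successful episodes in a row move the agent to the next stage.
curriculum = ThresholdCurriculum(window=10)
for _ in range(10):
    curriculum.record(True)
print(curriculum.threshold)
```

Starting with a loose threshold lets the agent collect successful episodes early, which is the usual motivation for this kind of schedule.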

Paper Structure (19 sections, 4 equations, 7 figures, 3 tables, 1 algorithm)

Introduction
Related Work
Background and Methodology
Reinforcement Learning
Algorithms
Deep Deterministic Policy Gradient (DDPG)
Twin Delayed DDPG (TD3)
Soft Actor Critic (SAC)
Reaching Task
Experiments
Environmental Setup
Learning from ground truth data
Custom implementations
Stable Baselines supported implementations
Learning from images
...and 4 more sections

Figures (7)

Figure 1: Reaching episode using the HoLLiE robot arm. The blue cylinder represents the target. The position of the cylinder on the table is initialised randomly in each episode. (a) The initial pose of the arm. (b) Robot posture after the agent sets the joint angles such that the distance between the cylinder and the end effector is below a required threshold. In this case, the episode is considered successful.
Figure 2: Arrangement of the core components of the designed setup. The top and left 3D boxes represent the frontend and backend of the dockerized NRP framework. The right 3D box represents the Docker container containing the RL algorithms. Both containers are connected via a communication channel using gRPC. Control signals to the robot are carried to the underlying simulation via a Python interface (experiment_api), which decodes them to ROS commands.
Figure 3: Three DDPG models were trained and evaluated for thresholds of 0.15 m, 0.10 m, and 0.05 m. A sharp drop in performance can be observed between the models trained for 0.10 m and 0.05 m. This behavior is particularly pronounced in DDPG, which is why its extension TD3 tries to tackle the issue.
Figure 4: Best-performing TD3 and SAC models. The points in the scatter plots are the target positions, colored according to the achieved proximity of the end effector. The empty semicircle at the top of the plots is where the robot's base is mounted to the table. A target position in this area would cause the robot to collide with itself. The TD3 model was trained with dense rewards and HER, while the SAC model was trained with sparse rewards.
Figure 5: Manual extraction of the cylinder location. (a) Masked and thresholded images. A red circle marks the remaining white blob. (b) Projection of the red circle into the original images. The images in (a) and (b) do not correspond.
...and 2 more figures
