Towards Vision-Based Deep Reinforcement Learning for Robotic Motion Control

Fangyi Zhang, Jürgen Leitner, Michael Milford, Ben Upcroft, Peter Corke

TL;DR

The paper tackles learning vision-based robotic motion control from raw pixel inputs without prior kinematic knowledge, focusing on a 3-joint planar arm performing target reaching via a Deep Q-Network. It combines a custom 2D simulator, a DQN learner, and ROS interfaces to a Baxter robot, and evaluates through simulation and real-world tests. Real-world transfer succeeds only when using synthetic imagery that matches the simulator, highlighting a domain gap with camera images. The results underscore the viability of vision-only DRL for manipulation in simulation and emphasize the need for robust domain adaptation and reward design for real-world applicability.

Abstract

This paper introduces a machine learning based system for controlling a robotic manipulator with visual perception only. The capability to autonomously learn robot controllers solely from raw-pixel images and without any prior knowledge of configuration is shown for the first time. We build upon the success of recent deep reinforcement learning and develop a system for learning target reaching with a three-joint robot manipulator using external visual observation. A Deep Q Network (DQN) was demonstrated to perform target reaching after training in simulation. Transferring the network to real hardware and real observation in a naive approach failed, but experiments show that the network works when replacing camera images with synthetic images.

Paper Structure

This paper contains 16 sections, 7 figures, 3 tables, 1 algorithm.

Figures (7)

  • Figure 1: Baxter's arm being controlled by a trained deep Q Network (DQN). Synthetic images (on the right) are fed into the DQN to overcome some of the real-world issues encountered, i.e., the differences between training and testing settings.
  • Figure 2: Schematic of the DQN layers for end-to-end learning and their respective outputs. Four input images are reshaped (Rs) and then fed into the DQN as grey-scale images (converted from RGB). The DQN consists of three convolutional layers, each followed by a rectifier layer (Rf), then a reshaping layer (Rs) and two fully connected layers (again with a rectifier layer in between). The normalized outputs of each layer are visualized. (Note: The outputs of the last four layers are shown as matrices instead of vectors.)
  • Figure 3: System overview
  • Figure 4: The 2D target reaching simulator, providing visual inputs to the DQN learner. It was implemented from scratch; no simulation platform was used.
  • Figure 5: Screenshots highlighting the different training scenarios for the agents.
  • ...and 2 more figures
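
The layer sequence in the Figure 2 caption (four grey-scale input images, three convolutional layers with rectifiers, a reshaping layer, two fully connected layers) can be traced as a sequence of tensor shapes. The sketch below is illustrative only: the caption gives no layer sizes, so the input resolution, kernel sizes, strides, channel counts, hidden width, and number of actions are all assumptions borrowed from the commonly used Atari DQN configuration, and the function names are hypothetical.

```python
# Shape walk-through for a DQN laid out as in the Figure 2 caption.
# ASSUMPTION: every numeric hyperparameter here (84x84 input, kernel sizes,
# strides, channel counts, 512 hidden units, 9 actions) is a placeholder
# taken from the widely used Atari DQN setup, not from this summary.


def conv_out(size, kernel, stride):
    """Side length after a convolution with no padding."""
    return (size - kernel) // stride + 1


def dqn_shapes(input_side=84, frames=4, n_actions=9, hidden=512):
    """Return (layer name, output shape) pairs for the assumed network."""
    # (output channels, kernel size, stride) for the three conv layers
    conv_layers = [(32, 8, 4), (64, 4, 2), (64, 3, 1)]

    shapes = [("input", (frames, input_side, input_side))]
    side, channels = input_side, frames
    for i, (out_channels, kernel, stride) in enumerate(conv_layers, start=1):
        side = conv_out(side, kernel, stride)
        channels = out_channels
        shapes.append((f"conv{i}+rectifier", (channels, side, side)))

    # Reshaping layer: flatten the feature maps into one vector
    flat = channels * side * side
    shapes.append(("reshape", (flat,)))
    # Two fully connected layers with a rectifier in between
    shapes.append(("fc1+rectifier", (hidden,)))
    shapes.append(("fc2 (Q-values)", (n_actions,)))
    return shapes


if __name__ == "__main__":
    for name, shape in dqn_shapes():
        print(f"{name:>16}: {shape}")
```

Under these assumed sizes the final layer emits one Q-value per discrete action, and the controller would pick the action with the largest value. Changing `input_side` or `n_actions` shows how the flattened feature size and the output width depend on those choices.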