Towards Vision-Based Deep Reinforcement Learning for Robotic Motion Control
Fangyi Zhang, Jürgen Leitner, Michael Milford, Ben Upcroft, Peter Corke
TL;DR
The paper tackles learning vision-based robotic motion control from raw pixel inputs without prior kinematic knowledge, focusing on a 3-joint planar arm that performs target reaching via a Deep Q-Network (DQN). It combines a custom 2D simulator, a DQN learner, and ROS interfaces to a Baxter robot, and evaluates the approach in simulation and in real-world tests. Real-world transfer succeeds only when the network is fed synthetic imagery that matches the simulator, which exposes a domain gap between simulated and camera images. The results support the viability of vision-only deep reinforcement learning for manipulation in simulation and point to the need for robust domain adaptation and careful reward design before the method can be applied in the real world.
Abstract
This paper introduces a machine-learning-based system for controlling a robotic manipulator using visual perception only. It shows, for the first time, that robot controllers can be learned autonomously from raw-pixel images alone, without any prior knowledge of configuration. Building on the recent success of deep reinforcement learning, we develop a system that learns target reaching with a three-joint robot manipulator from external visual observation. After training in simulation, a Deep Q Network (DQN) was shown to perform target reaching. Naively transferring the network to real hardware and real camera observations failed, but experiments show that the network works when the camera images are replaced with synthetic images.
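
To make the DQN formulation above more concrete, the following is a minimal sketch of two standard ingredients of Q-learning: epsilon-greedy action selection and the one-step temporal-difference target r + gamma * max_a' Q(s', a'). It is not the authors' implementation. The discrete action set (each of the three joints can step backward, hold, or step forward), the function names, and the value of gamma are assumptions made purely for illustration, and the Q-values are passed in as plain arrays rather than produced by a convolutional network.

```python
import numpy as np

# Hypothetical action set: (joint index, move), where move is -1, 0, or +1.
# This discretisation is an assumption for illustration only.
N_JOINTS = 3
MOVES = (-1, 0, 1)
ACTIONS = [(joint, move) for joint in range(N_JOINTS) for move in MOVES]


def epsilon_greedy(q_values, epsilon, rng):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))


def td_targets(rewards, next_q, done, gamma=0.99):
    """One-step Q-learning targets for a batch of transitions.

    rewards: shape (batch,)
    next_q:  shape (batch, n_actions), Q-values of the next states
    done:    shape (batch,), True where the episode ended
    """
    rewards = np.asarray(rewards, dtype=float)
    next_q = np.asarray(next_q, dtype=float)
    done = np.asarray(done, dtype=bool)
    # Terminal transitions do not bootstrap from the next state.
    return rewards + gamma * next_q.max(axis=1) * (~done)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    q = rng.normal(size=len(ACTIONS))  # stand-in for a network's output
    action = ACTIONS[epsilon_greedy(q, epsilon=0.1, rng=rng)]
    print("chosen (joint, move):", action)
```

In a full system, the Q-values would come from a network that takes the raw image as input, and the targets would be used as regression labels for the Q-value of the action actually taken.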
