Learning Manipulation Tasks in Dynamic and Shared 3D Spaces

Hariharan Arunachalam; Marc Hanheide; Sariah Mghames

TL;DR

The paper tackles efficient automated segregation of multi-categorical items in a dynamic, shared 3D workspace by learning dual-robot collaboration for pick-and-place tasks. It proposes the MLDEnv framework, a two-stage pipeline that combines PointNet-based 3D perception with Proximal Policy Optimization (PPO) for policy learning, enabling two fixed-base manipulators to operate safely amid static and dynamic obstacles, including human coworkers. Perception builds obstacle embeddings with an LSTM to handle variable obstacle counts, while the policy optimizes joint-velocity commands under a reward that balances progress toward goals with safety, captured by $d_{eg}$, $d_i$, $D_{min}$, and the sphere $S$ around joints. Experiments in Gazebo with UR10 robots show high segmentation accuracy and successful learning of cooperative manipulation, with an emphasis on deployment considerations for online adaptation and safety; future work includes imitation learning and probabilistic obstacle pose estimation to improve real-world performance.
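The trade-off between goal progress and safety described above can be illustrated with a small sketch. It is not the paper's reward function, which is not reproduced on this page. It assumes that $d_{eg}$ is the end-effector-to-goal distance, that $d_i$ is the distance from obstacle $i$ to the nearest robot joint, and that $D_{min}$ is the radius of the safety sphere $S$ around each joint. The function name, the penalty shape, and all numeric defaults are hypothetical.

```python
import math


def shaped_reward(ee_pos, goal_pos, joint_positions, obstacle_positions,
                  d_min=0.3, collision_penalty=10.0,
                  goal_tolerance=0.05, goal_bonus=10.0):
    """Illustrative reward: progress toward the goal minus a safety penalty.

    All names and default values are assumptions made for this sketch.
    """
    # d_eg: assumed to be the end-effector-to-goal distance.
    d_eg = math.dist(ee_pos, goal_pos)
    reward = -d_eg  # being closer to the goal gives a higher reward

    # Bonus once the end effector is within tolerance of the goal.
    if d_eg < goal_tolerance:
        reward += goal_bonus

    for obstacle in obstacle_positions:
        # d_i: assumed distance from obstacle i to the nearest joint.
        d_i = min(math.dist(joint, obstacle) for joint in joint_positions)
        # Penalize only when the obstacle enters the sphere of radius d_min,
        # scaling linearly with how deep the intrusion is.
        if d_i < d_min:
            reward -= collision_penalty * (d_min - d_i) / d_min

    return reward


if __name__ == "__main__":
    joints = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.5)]
    print(shaped_reward((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), joints, []))
    print(shaped_reward((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), joints,
                        [(0.15, 0.0, 0.0)]))
```

With this shaping, an obstacle outside every joint sphere leaves the reward unchanged, so the agent is only discouraged from moves that bring a joint within the assumed safety radius.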

Abstract

Automating segregation is needed in every sector that faces high volumes of material handling, repetitive and exhausting operations, and hazardous exposure. Automated pick-and-place operations can be learned efficiently by introducing collaborative autonomous systems (e.g., manipulators) into the workplace alongside human operators. In this paper, we propose a deep reinforcement learning strategy for learning the place task: moving multi-categorical items from a workspace shared by dual manipulators to multiple goal destinations, assuming the pick has already been completed. The learning strategy leverages, first, a stochastic actor-critic framework to train an agent's policy network and, second, a dynamic 3D Gym environment in which both static and dynamic obstacles (e.g., human factors and the robot mate) constitute the state space of a Markov decision process. Learning is conducted in the Gazebo simulator, and experiments show an increase in the cumulative reward for the agent located farther from human factors. Future investigations will aim to improve task performance for both agents simultaneously.

Paper Structure (12 sections, 2 equations, 7 figures)

Introduction
Approach
Problem Definition
Scenario
Perception
Policy learning
Reward Function
Policy
Experiments
PointNet Evaluation
Training a Reinforcement Learning Agent
Conclusion and Future Directions

Figures (7)

Figure 1: Gazebo simulation environment used to train a reinforcement learning agent for the autonomous collaborative segregation task. The environment contains two UR10 robots, bolts and nuts on a shared work table, two destination boxes, a mobile agent, and a Kinect sensor.
Figure 2: Segmentation results for a point cloud from the trained PointNet.
Figure 3: MLDEnv pipeline
Figure 4: Sample data for PointNet training: box, human, table, robot.
Figure 5: Evaluation accuracy of PointNet on custom dataset
...and 2 more figures
