Learning Manipulation Tasks in Dynamic and Shared 3D Spaces

Hariharan Arunachalam, Marc Hanheide, Sariah Mghames

TL;DR

The paper tackles efficient automated segregation of multi-categorical items in a dynamic, shared 3D workspace by learning dual-robot collaboration for pick-and-place tasks. It proposes the MLDEnv framework, a two-stage pipeline that combines PointNet-based 3D perception with Proximal Policy Optimization (PPO) for policy learning, enabling two fixed-base manipulators to operate safely amid static and dynamic obstacles, including human coworkers. Perception builds obstacle embeddings with an LSTM to handle variable obstacle counts, while the policy optimizes joint-velocity commands under a reward that balances progress toward goals with safety, captured by $d_{eg}$, $d_i$, $D_{min}$, and the sphere $S$ around joints. Experiments in Gazebo with UR10 robots show high segmentation accuracy and successful learning of cooperative manipulation, with an emphasis on deployment considerations for online adaptation and safety; future work includes imitation learning and probabilistic obstacle pose estimation to improve real-world performance.
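To make the progress-versus-safety trade-off concrete, here is a minimal sketch of a shaped reward. It is not the paper's reward function: this summary names the symbols but does not give the formula. The sketch assumes $d_{eg}$ is the end-effector-to-goal distance, each $d_i$ is the distance from a joint's safety sphere $S$ to obstacle $i$, and $D_{min}$ is a minimum allowed clearance. The function name, weights, goal tolerance, and bonus value are all hypothetical.

```python
def shaped_reward(d_eg, obstacle_distances, d_min,
                  w_goal=1.0, w_safe=1.0, goal_tol=0.05, goal_bonus=10.0):
    """Hypothetical reward: penalize goal distance and unsafe proximity.

    d_eg               -- assumed: end-effector-to-goal distance (metres)
    obstacle_distances -- assumed: one clearance d_i per obstacle; any length
    d_min              -- assumed: minimum safe clearance D_min (metres)
    """
    # Progress term: the closer the end effector is to the goal, the better.
    reward = -w_goal * d_eg

    # Safety term: each obstacle inside the clearance adds a penalty that
    # grows linearly from 0 (at d_min) to w_safe (at contact).
    for d_i in obstacle_distances:
        if d_i < d_min:
            reward -= w_safe * (d_min - d_i) / d_min

    # Sparse bonus once the goal tolerance is reached (illustrative value).
    if d_eg < goal_tol:
        reward += goal_bonus
    return reward
```

Because the safety term is a sum over whatever obstacles are present, the same function applies when the number of obstacles changes between steps, for example when a human coworker enters or leaves the workspace.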

Abstract

Automating the segregation process is a need for every sector that experiences a high volume of materials handling, repetitive and exhausting operations, and risky exposures. Automated pick-and-place operations can be learned efficiently by introducing collaborative autonomous systems (e.g., manipulators) into the workplace alongside human operators. In this paper, we propose a deep reinforcement learning strategy to learn the place task for multi-categorical items, moving them from a workspace shared between dual manipulators to multi-goal destinations, assuming the pick has already been completed. The learning strategy leverages, first, a stochastic actor-critic framework to train an agent's policy network and, second, a dynamic 3D Gym environment in which both static and dynamic obstacles (e.g., human factors and the robot mate) constitute the state space of a Markov decision process. Learning is conducted in the Gazebo simulator, and experiments show an increase in the cumulative reward for the agent farther away from human factors. Future investigations will aim to enhance task performance for both agents simultaneously.

Paper Structure (12 sections, 2 equations, 7 figures)

This paper contains 12 sections, 2 equations, 7 figures.

Figures (7)

  • Figure 1: Gazebo simulation environment used to train a reinforcement learning agent for the autonomous collaborative segregation task. The environment loads two UR10 manipulators, bolt and nut items on a shared work table, two destination boxes, a mobile agent, and a Kinect sensor.
  • Figure 2: Segmentation results for a point cloud from the trained PointNet.
  • Figure 3: MLDEnv pipeline
  • Figure 4: Sample data for PointNet training: box, human, table, robot.
  • Figure 5: Evaluation accuracy of PointNet on custom dataset
  • ...and 2 more figures