
An Efficient Multi-Robot Arm Coordination Strategy for Pick-and-Place Tasks using Reinforcement Learning

Tizian Jermann, Hendrik Kolvenbach, Fidel Esquivel Estay, Koen Kramer, Marco Hutter

TL;DR

The paper tackles multi-robot waste sorting on a conveyor by training a reinforcement-learning (RL) policy to allocate pick-and-place tasks. The problem is formulated as a custom OpenAI Gym environment and the policy is trained with Proximal Policy Optimization. The RL approach is compared to a combinatorial-game-theory baseline across diverse pattern distributions, achieving up to 16% higher picking rates and generalizing robustly to unseen scenarios. The work also validates the approach on a two-robot hardware setup, illustrating sim-to-real transfer and practical throughput improvements, and discusses belt-speed implications and scalability to more agents. Overall, the study shows that RL can flexibly and effectively optimize multi-robot coordination for high-throughput pick-and-place tasks in industrial-style sorting systems.

Abstract

We introduce a novel strategy for multi-robot sorting of waste objects using Reinforcement Learning. Our focus lies on finding optimal picking strategies that facilitate effective coordination of a multi-robot system while maximizing the waste removal potential. We realize this by formulating the sorting problem as an OpenAI Gym environment and training a neural network with a deep reinforcement learning algorithm. The objective function is set up to optimize the picking rate of the robotic system. In simulation, we compare performance against an intuitive approach based on combinatorial game theory. We show that the trained policies outperform the latter and achieve up to 16% higher picking rates. Finally, the respective algorithms are validated on a hardware setup consisting of a two-robot sorting station able to process incoming waste objects through pick-and-place operations.
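
To make the idea of casting the sorting problem as a Gym-style environment more concrete, the following is a minimal sketch and not the authors' environment. It assumes a one-dimensional belt of unit length, evenly spaced robots with a fixed reach, and a fixed number of busy steps per pick; all class names, parameters, and values (`ConveyorSortingEnv`, `reach`, `cycle_steps`, and so on) are hypothetical. It mimics the `reset`/`step` convention in plain Python rather than importing Gym, and the `greedy_policy` shown is a simple stand-in heuristic, not the combinatorial game theory baseline from the paper. The fraction of objects removed is used here as an illustrative proxy for the picking rate.

```python
import random


class ConveyorSortingEnv:
    """Toy 1-D conveyor with a Gym-style reset/step interface (hypothetical)."""

    def __init__(self, n_robots=2, belt_speed=0.05, reach=0.15,
                 cycle_steps=4, spawn_prob=0.4, horizon=200, seed=0):
        self.n_robots = n_robots
        self.belt_speed = belt_speed    # belt travel per step (assumed units)
        self.reach = reach              # half-width of each robot's workspace
        self.cycle_steps = cycle_steps  # a robot can pick once every cycle_steps steps
        self.spawn_prob = spawn_prob    # chance of a new object per step
        self.horizon = horizon
        self.rng = random.Random(seed)
        # Robots are spaced evenly along a belt of unit length.
        self.robot_x = [(i + 1) / (n_robots + 1) for i in range(n_robots)]
        self.reset()

    def reset(self):
        self.objects = []                # positions of objects on the belt
        self.busy = [0] * self.n_robots  # remaining busy steps per robot
        self.t = 0
        self.picked = 0
        self.missed = 0
        return self._obs()

    def _obs(self):
        return {"objects": list(self.objects), "busy": list(self.busy)}

    def step(self, action):
        """action[r] is an object index for robot r, or None to stay idle."""
        taken = set()
        for r, idx in enumerate(action):
            if idx is None or idx in taken or self.busy[r] > 0:
                continue
            in_range = (0 <= idx < len(self.objects)
                        and abs(self.objects[idx] - self.robot_x[r]) <= self.reach)
            if in_range:
                taken.add(idx)
                self.busy[r] = self.cycle_steps
        reward = len(taken)  # one unit of reward per successful pick
        self.picked += reward
        # Advance the belt; objects that pass the end count as missed.
        moved = [x + self.belt_speed
                 for i, x in enumerate(self.objects) if i not in taken]
        self.missed += sum(x > 1.0 for x in moved)
        self.objects = [x for x in moved if x <= 1.0]
        if self.rng.random() < self.spawn_prob:
            self.objects.append(0.0)
        self.busy = [max(0, b - 1) for b in self.busy]
        self.t += 1
        done = self.t >= self.horizon
        info = {"picked": self.picked, "missed": self.missed}
        return self._obs(), reward, done, info


def greedy_policy(env, obs):
    """Each idle robot claims the reachable object furthest downstream."""
    action = [None] * env.n_robots
    claimed = set()
    for r in range(env.n_robots):
        if obs["busy"][r] > 0:
            continue
        candidates = [i for i, x in enumerate(obs["objects"])
                      if i not in claimed and abs(x - env.robot_x[r]) <= env.reach]
        if candidates:
            best = max(candidates, key=lambda i: obs["objects"][i])
            action[r] = best
            claimed.add(best)
    return action


def run_episode(env, policy):
    """Return the fraction of finished objects that were picked."""
    obs = env.reset()
    done = False
    while not done:
        obs, _, done, info = env.step(policy(env, obs))
    total = info["picked"] + info["missed"]
    return info["picked"] / total if total else 0.0
```

Calling `run_episode(ConveyorSortingEnv(), greedy_policy)` gives one number per episode that a learned policy could be compared against. In a setup like the one the abstract describes, an RL algorithm would take the place of `greedy_policy` and be trained on the per-step reward.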

Paper Structure

This paper contains 21 sections, 2 equations, 5 figures, 4 tables.

Figures (5)

  • Figure 1: Artist impression of collaborative, multiple robotic arm pick-and-place setup for waste sorting, governed by a strategy to maximize picking efficiency.
  • Figure 2: Pipeline on the real system, consisting of waste transfer; a vision pipeline to detect, track and grasp waste objects; the picking strategy assigning pick-and-place (PnP) tasks to robot agents; and a motion planning and robot control module for executing the desired robot motions.
  • Figure 3: Overview of the two different pattern modelling methods. Waste objects are indicated in blue and the robot position and range are shown in orange.
  • Figure 4: Overview of the general ROS architecture of the network setup.
  • Figure 5: Final two-robot sorting station handling incoming waste objects. One cycle of a single robot agent proceeds as follows: the robot waits in its resting position for the next object to pick. Once the picking strategy has made its decision, the arm moves to the meeting point, where the grasping maneuver is executed. After a successful pickup, the object is transported back to the starting position, where it is dropped off.