
Reinforcement Learning of Multi-robot Task Allocation for Multi-object Transportation with Infeasible Tasks

Yuma Shida, Tomohiko Jimbo, Tadashi Odashima, Takamitsu Matsubara

TL;DR

This paper tackles multi-object transportation with unknown object weights and infeasible tasks by introducing a cloud-based framework that broadcasts scalable task experiences $E_l(t)$ to all robots. Each robot learns per-task exclusion levels $\zeta_i^l$ and dynamic priorities $\phi_i^l$, which are integrated through an output gate to produce feasible task choices while temporarily excluding infeasible tasks. The method uses consensus-based updates and a MADDPG-trained policy to coordinate both individual and cooperative transport across varying numbers of robots $N$ and objects $M$, including scenarios with unlearned weights. Validation shows improved success rates and reduced transportation times compared to dynamic-priority baselines, demonstrating scalability, versatility, and effective deadlock avoidance in dynamic MRTA settings.

Abstract

Multi-object transport using multi-robot systems has the potential for diverse practical applications such as delivery services owing to its efficient individual and scalable cooperative transport. However, allocating transportation tasks for objects with unknown weights remains challenging. Moreover, the presence of infeasible tasks (untransportable objects) can lead to robot stoppage (deadlock). This paper proposes a framework for dynamic task allocation that involves storing task experiences for each task in a scalable manner with respect to the number of robots. First, these experiences are broadcast from the cloud server to the entire robot system. Subsequently, each robot learns the exclusion levels for each task based on those task experiences, enabling it to exclude infeasible tasks and reset its task priorities. Finally, individual transportation, cooperative transportation, and the temporary exclusion of tasks considered infeasible are achieved. The scalability and versatility of the proposed method were confirmed through numerical experiments with an increased number of robots and objects, including objects with unlearned weights. The effectiveness of the temporary deadlock avoidance was also confirmed by introducing additional robots within an episode. The proposed method enables the implementation of task allocation strategies that are feasible for different numbers of robots and various transport tasks without prior consideration of feasibility.
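
The selection step summarized above, in which each robot combines its per-task exclusion levels with its task priorities, can be illustrated with a minimal sketch. This is not the authors' implementation: the hard threshold, the function name `select_task`, and the parameter `threshold` are assumptions made here for illustration, and the gate used in the paper may combine the two quantities differently.

```python
from typing import Optional, Sequence


def select_task(
    priorities: Sequence[float],
    exclusion_levels: Sequence[float],
    threshold: float = 0.5,  # hypothetical cut-off, not taken from the paper
) -> Optional[int]:
    """Pick the highest-priority task that is not currently excluded.

    priorities[l] stands in for a robot's priority for task l, and
    exclusion_levels[l] for its exclusion level for the same task.
    Tasks whose exclusion level reaches the threshold are skipped.
    Returns the chosen task index, or None if every task is excluded.
    """
    if len(priorities) != len(exclusion_levels):
        raise ValueError("priorities and exclusion_levels must have equal length")

    best: Optional[int] = None
    for task, (priority, exclusion) in enumerate(zip(priorities, exclusion_levels)):
        if exclusion >= threshold:
            continue  # treated as infeasible for now, so temporarily excluded
        if best is None or priority > priorities[best]:
            best = task
    return best
```

Because the exclusion levels are re-evaluated on every call, a task that is skipped now can be selected again later, for example once additional robots join and the exclusion level drops. This mirrors the temporary nature of the exclusion described in the abstract.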

Paper Structure (18 sections, 16 equations, 9 figures, 2 tables)


Figures (9)

  • Figure 1: Multi-object transport using a multi-robot system. (a) Robots act independently when they can carry the selected objects. (b) Deadlock occurs when robots are unable to cooperatively carry the selected object. (c) Robots employ deadlock avoidance strategies and cooperate with each other to carry heavy objects. (d) After three additional robots are introduced into the system, the robots cooperatively carry the object that previously could not be carried.
  • Figure 2: Abstracted framework of the proposed dynamic task allocation, which includes handling infeasible tasks. Task experiences $E$, which are scalable with the number of robots $N$, are broadcast from the cloud server to each robot so that it can learn task exclusion levels $\zeta_i$ and task priorities $\phi_i$. Subsequently, robots share information with one another when cooperation is necessary.
  • Figure 3: Block diagram of multi-robot task allocation with infeasible tasks. The output gate plays a role in integrating dynamic task exclusion and dynamic task priority.
  • Figure 4: Numerical environments with $N=6$ robots and $M=10$ objects. Each square represents a robot, each colored circle represents an object, and each colored dot represents a transport goal. The number inside each circle is the number of robots needed to transport the object, and the number on each square is the robot's ID.
  • Figure 5: Cumulative rewards of our framework. DP: dynamic priority; DE: dynamic exclusion; $E$: task experience of objects.
  • ...and 4 more figures