Learning Multi-agent Multi-machine Tending by Mobile Robots

Abdalwhab Abdalwhab, Giovanni Beltrame, Samira Ebrahimi Kahou, David St-Onge

TL;DR

This work addresses the challenge of decentralized, multi-robot machine tending in manufacturing by introducing AB-MAPPO, an attention-augmented MAPPO framework tailored for multi-agent, multi-machine coordination. It enhances the MAPPO backbone with a dense, attention-based critic that aggregates spatial-temporal information across agents, improving value estimation and policy learning. Through carefully designed observations and a composite reward structure, AB-MAPPO achieves superior task success, safety, and resource utilization compared with MAPPO, and demonstrates robustness across various layouts via extensive ablations. The results suggest practical potential for deploying mobile, cooperative robots in production environments and point to future work on more dynamic material handling and realistic manipulation tasks.

Abstract

Robotics can help address the growing worker shortage in the manufacturing industry. Machine tending is a task that collaborative robots can take on and that can substantially boost productivity. However, existing robotic systems deployed in that sector rely on a fixed single-arm setup, whereas mobile robots can provide more flexibility and scalability. In this work, we introduce a learning framework for multi-agent, multi-machine tending by mobile robots, based on Multi-agent Reinforcement Learning (MARL) techniques, with a suitably designed observation and reward. Moreover, an attention-based encoding mechanism is developed and integrated into the Multi-agent Proximal Policy Optimization (MAPPO) algorithm to boost its performance in machine tending scenarios. Our model (AB-MAPPO) outperforms MAPPO in this new, challenging scenario in terms of task success, safety, and resource utilization. Furthermore, we provide an extensive ablation study to support our design decisions.
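
To make the idea of an attention-based encoding for a critic more concrete, here is a minimal sketch in plain Python. It is not the AB-MAPPO architecture, whose details are not given in this summary. It only illustrates generic scaled dot-product self-attention over per-agent observations, followed by mean pooling into a single vector that a centralized value network could consume. All names (`attention_encode`, `critic_input`, `random_matrix`), the dimensions, and the random projection matrices are assumptions chosen for illustration.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vector):
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def attention_encode(agent_obs, w_q, w_k, w_v):
    """Scaled dot-product self-attention over per-agent observations.

    Returns one attended vector per agent plus the attention weights.
    """
    queries = [matvec(w_q, obs) for obs in agent_obs]
    keys = [matvec(w_k, obs) for obs in agent_obs]
    values = [matvec(w_v, obs) for obs in agent_obs]
    scale = math.sqrt(len(keys[0]))

    encoded, weights = [], []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        attn = softmax(scores)
        weights.append(attn)
        # Weighted sum of every agent's value vector.
        encoded.append([
            sum(a * v[d] for a, v in zip(attn, values))
            for d in range(len(values[0]))
        ])
    return encoded, weights


def critic_input(encoded):
    """Mean-pool the attended vectors into one fixed-size critic input."""
    n = len(encoded)
    return [sum(vec[d] for vec in encoded) / n for d in range(len(encoded[0]))]


def random_matrix(rows, cols, rng):
    """Hypothetical stand-in for learned projection weights."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
obs_dim, embed_dim, n_agents = 4, 3, 3  # illustrative sizes only
w_q, w_k, w_v = (random_matrix(embed_dim, obs_dim, rng) for _ in range(3))
agent_obs = [[rng.uniform(-1.0, 1.0) for _ in range(obs_dim)]
             for _ in range(n_agents)]

encoded, weights = attention_encode(agent_obs, w_q, w_k, w_v)
pooled = critic_input(encoded)
```

In a real training setup the projection matrices would be learned jointly with the value function, and the pooled vector would feed the critic's remaining layers. Mean pooling is used here only because it keeps the critic input size independent of the number of agents.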

Paper Structure (18 sections, 5 equations, 5 figures, 4 tables)

Figures (5)

  • Figure 1: Our attention-based encoding for the critic.
  • Figure 2: Multi-agent (red), multi-machine (green) tending scenario designed in VMAS, including obstacles (gray) and a storage area (blue). Dotted lines point to the actual robot (our RanGen robot, composed of a Kinova Gen3 arm on top of an AgileX Ranger Mini mobile base), the machines (CNC Universal Milling Machine DMU 50), and the storage shelves that we plan to use for real deployment.
  • Figure 3: The total episode return for AB-MAPPO compared to MAPPO.
  • Figure 4: Examples of environment layouts with good performance.
  • Figure 5: Examples of environment layouts with suboptimal performance: agents learn to tend one machine and ignore the other.