Learning Multi-agent Multi-machine Tending by Mobile Robots

Abdalwhab Abdalwhab; Giovanni Beltrame; Samira Ebrahimi Kahou; David St-Onge

TL;DR

This work addresses the challenge of decentralized, multi-robot machine tending in manufacturing by introducing AB-MAPPO, an attention-augmented MAPPO framework tailored for multi-agent, multi-machine coordination. It enhances the MAPPO backbone with a dense, attention-based critic that aggregates spatial-temporal information across agents, improving value estimation and policy learning. Through carefully designed observations and a composite reward structure, AB-MAPPO achieves superior task success, safety, and resource utilization compared with MAPPO, and demonstrates robustness across various layouts via extensive ablations. The results suggest practical potential for deploying mobile, cooperative robots in production environments and point to future work on more dynamic material handling and realistic manipulation tasks.

Abstract

Robotics can help address the growing worker shortage in the manufacturing industry. Machine tending is one task that collaborative robots can take on, and automating it can substantially boost productivity. Nevertheless, existing robotic systems deployed in that sector rely on a fixed single-arm setup, whereas mobile robots can provide more flexibility and scalability. In this work, we introduce a learning framework for multi-agent multi-machine tending by mobile robots, based on Multi-agent Reinforcement Learning (MARL) techniques with suitably designed observations and rewards. Moreover, an attention-based encoding mechanism is developed and integrated into the Multi-agent Proximal Policy Optimization (MAPPO) algorithm to boost its performance in machine tending scenarios. Our model (AB-MAPPO) outperforms MAPPO in this new, challenging scenario in terms of task success, safety, and resource utilization. Furthermore, we provide an extensive ablation study to support our design decisions.
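
To make the idea of an attention-based encoding for a centralized critic more concrete, here is a minimal sketch in plain Python. It is not the authors' architecture (Figure 1 of the paper describes that). It only illustrates generic single-head scaled dot-product self-attention over per-agent observation vectors, followed by an assumed mean-pooling step and an assumed linear value head. All names, dimensions, and the random weights (`attention_encode`, `critic_value`, `obs_dim`, `embed_dim`) are hypothetical and chosen for illustration.

```python
import math
import random


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [dot(row, x) for row in W]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_encode(agent_obs, Wq, Wk, Wv):
    """Generic single-head self-attention across agents (illustrative only).

    Each agent's observation attends to every agent's observation,
    producing one encoded vector per agent plus the attention weights.
    """
    queries = [matvec(Wq, o) for o in agent_obs]
    keys = [matvec(Wk, o) for o in agent_obs]
    values = [matvec(Wv, o) for o in agent_obs]
    scale = math.sqrt(len(keys[0]))
    encoded, weights = [], []
    for q in queries:
        w = softmax([dot(q, k) / scale for k in keys])
        weights.append(w)
        encoded.append(
            [sum(w_j * v[d] for w_j, v in zip(w, values))
             for d in range(len(values[0]))]
        )
    return encoded, weights


def critic_value(encoded, w_out):
    """Assumed value head: mean-pool the agent encodings, then a linear map."""
    pooled = [sum(col) / len(encoded) for col in zip(*encoded)]
    return dot(pooled, w_out)


def rand_matrix(rows, cols, rng):
    """Small random matrix standing in for learned weights."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


# Hypothetical sizes: 3 agents, 6-dimensional observations, 4-dimensional embedding.
rng = random.Random(0)
n_agents, obs_dim, embed_dim = 3, 6, 4
Wq = rand_matrix(embed_dim, obs_dim, rng)
Wk = rand_matrix(embed_dim, obs_dim, rng)
Wv = rand_matrix(embed_dim, obs_dim, rng)
w_out = [rng.uniform(-0.5, 0.5) for _ in range(embed_dim)]
agent_obs = [[rng.uniform(-1, 1) for _ in range(obs_dim)] for _ in range(n_agents)]

encoded, weights = attention_encode(agent_obs, Wq, Wk, Wv)
value = critic_value(encoded, w_out)
```

In a real MAPPO-style setup the weights would be learned jointly with the policy, and the critic would typically be a neural network in a deep learning framework. The sketch only shows how attention weights let the value estimate depend on a weighted mix of all agents' observations.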

Paper Structure (18 sections, 5 equations, 5 figures, 4 tables)

Introduction
Background and Related Work
Machine Tending
Reinforcement Learning
Problem Definition
Methodology
MAPPO backbone
Novel Attention-based Encoding for MAPPO
Observation Design
Reward Design
Experiments
Simulation Setup
Evaluation Procedure
Results and Adaptability
Ablation Study
...and 3 more sections

Figures (5)

Figure 1: Our Attention-based encoding for the critic
Figure 2: Multi-agent (red) multi-machine (green) tending scenario designed in VMAS, including obstacles (gray) and storage area (blue), with dotted lines pointing to the actual robot (our RanGen robot composed of a Kinova Gen3 arm on top of an AgileX Ranger Mini mobile base), machines (CNC Universal Milling Machine DMU 50) and storage shelves that we are planning to use for real deployment.
Figure 3: The total episode return for AB-MAPPO compared to MAPPO
Figure 4: Examples of environment layouts with good performance
Figure 5: Examples of environment layouts with suboptimal performance: agents learn to tend one machine and ignore the other.
