Learning Multi-agent Multi-machine Tending by Mobile Robots
Abdalwhab Abdalwhab, Giovanni Beltrame, Samira Ebrahimi Kahou, David St-Onge
TL;DR
This work addresses the challenge of decentralized, multi-robot machine tending in manufacturing by introducing AB-MAPPO, an attention-augmented MAPPO framework tailored for multi-agent, multi-machine coordination. It enhances the MAPPO backbone with a dense, attention-based critic that aggregates spatial-temporal information across agents, improving value estimation and policy learning. Through carefully designed observations and a composite reward structure, AB-MAPPO achieves superior task success, safety, and resource utilization compared with MAPPO, and demonstrates robustness across various layouts via extensive ablations. The results suggest practical potential for deploying mobile, cooperative robots in production environments and point to future work on more dynamic material handling and realistic manipulation tasks.
Abstract
Robotics can help address the growing worker shortage in the manufacturing industry. Machine tending, in particular, is a task that collaborative robots can take on and that can substantially boost productivity. However, existing robotic systems deployed in that sector rely on a fixed single-arm setup, whereas mobile robots can provide more flexibility and scalability. In this work, we introduce a learning framework for multi-agent, multi-machine tending by mobile robots, based on Multi-agent Reinforcement Learning (MARL) techniques, together with a suitable observation and reward design. Moreover, an attention-based encoding mechanism is developed and integrated into the Multi-agent Proximal Policy Optimization (MAPPO) algorithm to boost its performance in machine tending scenarios. Our model (AB-MAPPO) outperforms MAPPO in this new, challenging scenario in terms of task success, safety, and resource utilization. Furthermore, we provide an extensive ablation study to support our design decisions.
