Multi-Agent Reinforcement Learning for Deadlock Handling among Autonomous Mobile Robots

Marcel Müller

TL;DR

This work addresses deadlocks in intralogistics systems that rely on autonomous mobile robots (AMRs) by proposing a structured framework, based on multi-agent reinforcement learning (MARL), that integrates deadlock handling into logistics planning. It introduces reference models for deadlock-prone multi-agent pathfinding (MAPF) problems and evaluates MARL algorithms (PPO, IMPALA) under centralized training with decentralized execution (CTDE), comparing them with rule-based MAPF methods. The key finding is that MARL with CTDE outperforms rule-based strategies in complex, congested environments, while simple layouts may still favor traditional approaches because of their lower computational demands. The contributions are a methodology for deploying RL in planning, a set of standardized reference MAPF models, and an evaluation of MARL against MA-A*, CBS, and traditional baselines in grid-based environments and external simulation software, with implications for resilience-oriented logistics design and experimentation. The results underscore MARL’s potential to improve throughput and robustness in dynamic intralogistics, and the work also outlines limitations (scalability, computational demands) and directions for future work such as safe RL, hierarchical control, and broader domain generalization.

Abstract

This dissertation explores the application of multi-agent reinforcement learning (MARL) for handling deadlocks in intralogistics systems that rely on autonomous mobile robots (AMRs). AMRs enhance operational flexibility but also increase the risk of deadlocks, which degrade system throughput and reliability. Existing approaches often neglect deadlock handling in the planning phase and rely on rigid control rules that cannot adapt to dynamic operational conditions. To address these shortcomings, this work develops a structured methodology for integrating MARL into logistics planning and operational control. It introduces reference models that explicitly consider deadlock-capable multi-agent pathfinding (MAPF) problems, enabling systematic evaluation of MARL strategies. Using grid-based environments and external simulation software, the study compares traditional deadlock handling strategies with MARL-based solutions, focusing on the PPO and IMPALA algorithms under different training and execution modes. The findings show that MARL-based strategies, particularly when combined with centralized training and decentralized execution (CTDE), outperform rule-based methods in complex, congested environments. In simpler environments or those with ample spatial freedom, rule-based methods remain competitive due to their lower computational demands. These results highlight that MARL provides a flexible and scalable solution for deadlock handling in dynamic intralogistics scenarios, but requires careful tailoring to the operational context.
