Learning to Solve the Min-Max Mixed-Shelves Picker-Routing Problem via Hierarchical and Parallel Decoding
Laurin Luttmann, Lin Xie
TL;DR
This work tackles the min-max Mixed-Shelves Picker Routing Problem (MSPRP) in warehouse settings, where multiple pickers must fulfill demands while balancing their workloads. It introduces MAHAM, a Multi-Agent Hierarchical Attention Model that combines a Problem Encoder, an Agent Context Encoder, and parallel hierarchical decoders to generate coordinated joint actions, together with a Sequential Action Selection mechanism that avoids conflicts between agents. Training uses a self-improvement objective and a sparse reward equal to the negative length of the longest picker tour, $R(\bm{a},x) = -\max_{m \in \mathcal{M}} \operatorname{dist}(\tau^m_{1:T})$. MAHAM achieves state-of-the-art solution quality and inference speed, especially on large-scale and out-of-distribution instances, and is competitive with exact solvers on small instances. The approach offers scalable, fast, and coordinated multi-agent optimization for real-world warehouse operations and opens a path toward integrating dynamic demand and hybrid optimization methods.
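The min-max reward in the TL;DR can be made concrete with a small sketch. It assumes each picker's tour $\tau^m_{1:T}$ is a list of 2-D locations and, purely for illustration, uses Euclidean distance between consecutive locations as a stand-in for the warehouse distance metric; the function names are hypothetical and not taken from the authors' code.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def tour_length(tour: Sequence[Point]) -> float:
    """Sum of distances between consecutive locations in one picker's tour."""
    # Assumption: Euclidean distance stands in for the real warehouse metric.
    return sum(math.dist(a, b) for a, b in zip(tour, tour[1:]))


def minmax_reward(tours: Sequence[Sequence[Point]]) -> float:
    """R = -max_m dist(tau^m): the negative length of the longest picker tour."""
    return -max(tour_length(t) for t in tours)


# Two pickers starting and ending at a depot at (0, 0).
tours = [
    [(0.0, 0.0), (3.0, 4.0), (0.0, 0.0)],  # length 10
    [(0.0, 0.0), (1.0, 0.0), (0.0, 0.0)],  # length 2
]
print(minmax_reward(tours))  # -10.0
```

Only the longest tour enters the reward, so shortening the short tour changes nothing; the signal pushes the policy toward balancing workloads across pickers.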
Abstract
The Mixed-Shelves Picker Routing Problem (MSPRP) is a fundamental challenge in warehouse logistics, where pickers must navigate a mixed-shelves environment to retrieve SKUs efficiently. Traditional heuristics and optimization-based approaches struggle to scale, while recent machine learning methods often rely on sequential decision-making, which leads to high solution latency and poor coordination between agents. In this work, we propose a novel hierarchical and parallel decoding approach for solving the min-max variant of the MSPRP via multi-agent reinforcement learning. Our approach generates a joint distribution over agent actions, which enables fast decoding and effective picker coordination, and it uses a sequential action selection procedure to avoid conflicts in the multi-dimensional action space. Experiments show state-of-the-art performance in both solution quality and inference speed, particularly on large-scale and out-of-distribution instances. Our code is publicly available at http://github.com/LTluttmann/marl4msprp.
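To illustrate how a sequential selection step can turn per-agent action probabilities into a conflict-free joint action, here is a hypothetical greedy sketch. It assumes `probs[m][j]` is agent m's probability of choosing node j, orders agents by the confidence of their best action, and masks nodes already claimed by earlier agents. This is an assumed simplification for illustration, not the exact procedure used in MAHAM.

```python
from typing import List, Optional, Sequence


def sequential_action_selection(probs: Sequence[Sequence[float]]) -> List[Optional[int]]:
    """Greedy conflict-free selection from per-agent action probabilities.

    probs[m][j] is a hypothetical stand-in for the decoder's probability that
    agent m selects node j. Returns one node index per agent, or None when no
    unclaimed node is left for that agent.
    """
    num_agents = len(probs)
    # Agents whose best action is most confident choose first.
    order = sorted(range(num_agents), key=lambda m: max(probs[m]), reverse=True)
    taken = set()
    actions: List[Optional[int]] = [None] * num_agents
    for m in order:
        # Mask nodes already claimed by agents earlier in the order.
        candidates = [j for j in range(len(probs[m])) if j not in taken]
        if not candidates:
            continue  # no feasible action left for this agent
        best = max(candidates, key=lambda j: probs[m][j])
        actions[m] = best
        taken.add(best)
    return actions


# Both agents prefer node 0; agent 1 is more confident, so it claims node 0
# and agent 0 falls back to its best remaining node.
print(sequential_action_selection([[0.7, 0.2, 0.1], [0.8, 0.1, 0.1]]))  # [1, 0]
```

The decoder still scores all agents in a single parallel pass; only this lightweight selection step runs agent by agent, which is what keeps conflict resolution cheap compared with fully sequential decoding.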
