Learning to Solve the Min-Max Mixed-Shelves Picker-Routing Problem via Hierarchical and Parallel Decoding

Laurin Luttmann, Lin Xie

TL;DR

This work tackles the min-max Mixed-Shelves Picker Routing Problem (MSPRP) in warehouse settings, where multiple pickers must fulfill demands while balancing workloads. It introduces MAHAM, a Multi-Agent Hierarchical Attention Model that combines a Problem Encoder, an Agent Context Encoder, and parallel hierarchical decoders to generate coordinated joint actions, with a Sequential Action Selection mechanism to avoid conflicts; training uses a self-improvement objective and a sparse reward $R(\bm{a},x) = - \max_{m \in \mathcal{M}} dist(\tau^m_{1:T})$. MAHAM achieves state-of-the-art solution quality and inference speed, especially on large-scale and out-of-distribution instances, and demonstrates competitive performance against exact solvers on small cases. The approach offers scalable, fast, and coordinated multi-agent optimization for real-world warehouse operations and paves the way for future integration with dynamic demand and hybrid optimization methods.
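The sparse reward above depends only on the longest of the pickers' tours. The following minimal sketch computes it under stated assumptions: the helper names `tour_length` and `min_max_reward`, the toy coordinates, and the use of Euclidean distance are illustrative choices and are not taken from the paper, whose warehouse layout may define $dist$ differently.

```python
import math

def tour_length(tour, coords):
    """Length of one picker tour, given as a sequence of node ids.

    Assumption: Euclidean distance between consecutive nodes.
    """
    return sum(math.dist(coords[a], coords[b]) for a, b in zip(tour, tour[1:]))

def min_max_reward(tours, coords):
    """Sparse reward: the negative length of the longest picker tour."""
    return -max(tour_length(t, coords) for t in tours)

# Hypothetical toy instance: node 0 is the depot, nodes 1 and 2 are shelves.
coords = {0: (0.0, 0.0), 1: (3.0, 4.0), 2: (0.0, 1.0)}
tours = [[0, 1, 0], [0, 2, 0]]  # one tour per picker

reward = min_max_reward(tours, coords)
print(reward)  # -10.0, determined by the first picker's tour
```

Because only the maximum enters the reward, shortening the second picker's tour changes nothing here; the reward improves only when the longest tour shrinks, which is what pushes the policy toward balanced workloads.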

Abstract

The Mixed-Shelves Picker Routing Problem (MSPRP) is a fundamental challenge in warehouse logistics, where pickers must navigate a mixed-shelves environment to retrieve SKUs efficiently. Traditional heuristics and optimization-based approaches struggle with scalability, while recent machine learning methods often rely on sequential decision-making, leading to high solution latency and suboptimal agent coordination. In this work, we propose a novel hierarchical and parallel decoding approach for solving the min-max variant of the MSPRP via multi-agent reinforcement learning. Our approach generates a joint distribution over agent actions, which allows for fast decoding and effective picker coordination, and introduces a sequential action selection step to avoid conflicts in the multi-dimensional action space. Experiments show state-of-the-art performance in both solution quality and inference speed, particularly for large-scale and out-of-distribution instances. Our code is publicly available at http://github.com/LTluttmann/marl4msprp.
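One way to picture conflict-free selection from parallel per-agent distributions is the greedy sketch below. It is an illustration under assumptions, not the paper's procedure: the function name `sequential_action_selection`, the ranking of agents by their highest action probability, the single-dimensional action space, and the treatment of node 0 as a shareable depot are all hypothetical.

```python
def sequential_action_selection(probs, shared=(0,)):
    """Pick one node per agent so that no non-shared node is chosen twice.

    probs[m][n] is agent m's probability for node n. Agents are processed
    in order of confidence (assumed ranking rule); nodes in `shared`
    (assumed: the depot) may be chosen by several agents.
    """
    order = sorted(range(len(probs)), key=lambda m: max(probs[m]), reverse=True)
    taken, actions = set(), {}
    for m in order:
        # Mask out nodes already claimed by a higher-ranked agent.
        candidates = [n for n in range(len(probs[m])) if n not in taken]
        choice = max(candidates, key=lambda n: probs[m][n])
        actions[m] = choice
        if choice not in shared:
            taken.add(choice)
    return [actions[m] for m in range(len(probs))]

# Both agents prefer node 1; the more confident agent (index 1) gets it.
probs = [
    [0.1, 0.7, 0.2],
    [0.1, 0.8, 0.1],
]
actions = sequential_action_selection(probs)
print(actions)  # [2, 1]
```

All distributions are produced in a single parallel pass and only the cheap selection loop is sequential, which is the general idea behind combining parallel decoding with sequential conflict handling.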

Paper Structure

This paper contains 37 sections, 12 equations, 3 figures, 5 tables, 2 algorithms.

Figures (3)

  • Figure 1: Overview of the MAHAM Architecture
  • Figure 2: Agent Context Encoder
  • Figure 3: Solution quality of MAHAM on MSPRP instances for different ranking strategies (left) and MAHAM efficiency in comparison to the 2d-Ptr and PARCO (right)