Deep Reinforcement Learning for Picker Routing Problem in Warehousing
George Dunn, Hadi Charkhgard, Ali Eshragh, Sasan Mahmoudinazlou, Elizabeth Stojanovski
TL;DR
This paper addresses efficient picker routing in rectangular warehouses by formulating a Tour-Graph Markov Decision Process (MDP) and solving it with an attention-based neural network trained via reinforcement learning (RL). The model builds a tour by sequentially adding vertical and horizontal tour edges, using a Transformer encoder and masked outputs to produce feasible tours, and it can optionally simplify routes for human operators. The key contributions are the MDP formulation, an encoder-based architecture for aisle-level inputs, and an RL training regimen that outperforms common heuristics across diverse warehouse sizes. In practice, the approach enables fast, near-optimal routing with adjustable route complexity, which suits human-in-the-loop warehousing.
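The "masked outputs" idea mentioned above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the example scores, and the feasibility flags are all hypothetical, and the policy network is replaced by a fixed list of numbers. It shows only the general technique of giving infeasible actions zero probability before an action is chosen.

```python
import math

def masked_softmax(scores, feasible):
    """Softmax over scores, with infeasible actions forced to probability 0."""
    if not any(feasible):
        raise ValueError("at least one action must be feasible")
    # Subtract the largest feasible score for numerical stability.
    top = max(s for s, ok in zip(scores, feasible) if ok)
    weights = [math.exp(s - top) if ok else 0.0
               for s, ok in zip(scores, feasible)]
    total = sum(weights)
    return [w / total for w in weights]

def greedy_action(scores, feasible):
    """Index of the most probable feasible action."""
    probs = masked_softmax(scores, feasible)
    return max(range(len(probs)), key=probs.__getitem__)

# Hypothetical scores for four candidate tour-edge choices at one step.
# The third choice has the highest raw score but is assumed infeasible.
scores = [2.0, 0.5, 3.0, 1.0]
feasible = [True, True, False, True]

probs = masked_softmax(scores, feasible)
action = greedy_action(scores, feasible)
```

Because the mask is applied before normalization, the remaining probabilities still sum to one, and neither sampling nor greedy selection can pick a choice flagged as infeasible.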
Abstract
Order Picker Routing is a critical issue in Warehouse Operations Management. Due to the complexity of the problem and the need for quick solutions, suboptimal algorithms are frequently employed in practice. However, Reinforcement Learning offers an appealing alternative to traditional heuristics, potentially outperforming existing methods in terms of speed and accuracy. We introduce an attention-based neural network for modeling picker tours, which is trained using Reinforcement Learning. Our method is evaluated against existing heuristics across a range of problem parameters to demonstrate its efficacy. A key advantage of our proposed method is that it offers an option to reduce the perceived complexity of routes.
