Deep Reinforcement Learning for Picker Routing Problem in Warehousing

George Dunn, Hadi Charkhgard, Ali Eshragh, Sasan Mahmoudinazlou, Elizabeth Stojanovski

TL;DR

This paper addresses picker routing in rectangular warehouses by formulating a Tour-Graph Markov Decision Process and solving it with an attention-based neural network trained via reinforcement learning (RL). The model builds a tour sequentially by adding vertical and horizontal tour edges, using a Transformer encoder and masked outputs to guarantee feasible tours, with an option to simplify routes for human operators. Key contributions include the MDP formulation, the encoder-based architecture for aisle-level inputs, and an RL training regimen that outperforms common heuristics across diverse warehouse sizes. In practice, the approach enables fast, near-optimal routing with adjustable complexity, which suits human-in-the-loop warehousing.
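
As a rough illustration of what "masked outputs" can mean here, the sketch below applies a feasibility mask to a decoder's raw scores before sampling one edge choice. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the four-way action set (loosely modelled on the four vertical edge options per aisle), the scores, and the mask are all hypothetical.

```python
import math
import random


def masked_softmax(logits, mask):
    """Softmax over logits; actions with mask == False get probability 0."""
    if not any(mask):
        raise ValueError("at least one action must be feasible")
    # Subtract the largest feasible logit for numerical stability.
    peak = max(l for l, ok in zip(logits, mask) if ok)
    weights = [math.exp(l - peak) if ok else 0.0 for l, ok in zip(logits, mask)]
    total = sum(weights)
    return [w / total for w in weights]


def sample_action(logits, mask, rng=random):
    """Sample one action index; masked actions are never selected."""
    probs = masked_softmax(logits, mask)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Hypothetical scores for four candidate edge configurations in one aisle.
logits = [2.0, 0.5, -1.0, 1.0]
# Hypothetical feasibility mask: options 1 and 3 would break the tour.
mask = [True, False, True, False]

probs = masked_softmax(logits, mask)
action = sample_action(logits, mask)
```

Because infeasible options receive exactly zero probability, every sampled sequence of edges stays inside the feasible set, so no repair step is needed after decoding.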

Abstract

Order Picker Routing is a critical issue in Warehouse Operations Management. Due to the complexity of the problem and the need for quick solutions, suboptimal algorithms are frequently employed in practice. However, Reinforcement Learning offers an appealing alternative to traditional heuristics, potentially outperforming existing methods in terms of speed and accuracy. We introduce an attention-based neural network for modeling picker tours, which is trained using Reinforcement Learning. Our method is evaluated against existing heuristics across a range of problem parameters to demonstrate its efficacy. A key advantage of our proposed method is that it can optionally reduce the perceived complexity of routes.

Paper Structure

This paper contains 23 sections, 22 equations, 6 figures, 3 tables, 1 algorithm.

Figures (6)

  • Figure 1: Standard Rectangular Warehouse
  • Figure 2: Routing Heuristics Tour Graphs
  • Figure 3: Possible edges in optimum tour graph where (i)-(iv) are the vertical edges within aisle $i$ and (v)-(viii) are the horizontal edges between aisle $i$ and aisle $i+1$.
  • Figure 4: Proposed Model for Picker Routing Problem
  • Figure 5: Average optimality gap of proposed model compared to common heuristics for problem classes grouped by number of aisles (left) and size of pick list (right).
  • ...and 1 more figures