Deep Reinforcement Learning for Dynamic Order Picking in Warehouse Operations

Sasan Mahmoudinazlou; Abhay Sobhanan; Hadi Charkhgard; Ali Eshragh; George Dunn

TL;DR

This work formulates dynamic order picking in a single-block warehouse as a real-time decision problem and solves it with a Deep Q-Network (DQN), a deep reinforcement learning (DRL) method. The method learns order batching and picker routing jointly within one framework: a feature-based state representation feeds a two-stream FCN that outputs action values. Compared with static and heuristic baselines, it substantially improves average order completion time and the number of fulfilled orders, and it remains robust to out-of-sample demand and varying arrival rates. In practice, the approach supports real-time automated decision-making in warehouse operations and provides a foundation for extending to more complex layouts or multiple autonomous agents.

Abstract

Order picking is a pivotal operation in warehouses that directly impacts overall efficiency and profitability. This study addresses the dynamic order picking problem, a significant concern in modern warehouse management, where real-time adaptation to fluctuating order arrivals and efficient picker routing are crucial. Traditional methods, which often depend on static optimization algorithms designed around fixed order sets for the picker routing, fall short in addressing the challenges of this dynamic environment. To overcome these challenges, we propose a Deep Reinforcement Learning (DRL) framework tailored for single-block warehouses equipped with an autonomous picking device. By dynamically optimizing picker routes, our approach significantly reduces order throughput times and unfulfilled orders, particularly under high order arrival rates. We benchmark our DRL model against established algorithms, utilizing instances generated based on standard practices in the order picking literature. Experimental results demonstrate the superiority of our DRL model over benchmark algorithms. For example, at a high order arrival rate of 0.09 (i.e., 9 orders per 100 units of time on average), our approach achieves an order fulfillment rate of approximately 98%, compared to the 82% fulfillment rate observed with benchmarking algorithms. We further investigate the integration of a hyperparameter in the reward function that allows for flexible balancing between distance traveled and order completion time. Finally, we demonstrate the robustness of our DRL model on out-of-sample test instances.
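To make the arrival-rate figure and the reward trade-off in the abstract concrete, the following is a minimal sketch rather than the authors' implementation. It assumes orders arrive as a Poisson process (the abstract only gives the average rate), and the function names `simulate_arrivals` and `weighted_cost`, as well as the linear form of the trade-off and the direction of `alpha`, are hypothetical choices made for illustration.

```python
import random


def simulate_arrivals(rate, horizon, seed=0):
    """Return arrival times in [0, horizon) for a Poisson process.

    Assumption: inter-arrival times are exponential with mean 1 / rate.
    """
    rng = random.Random(seed)
    times = []
    t = rng.expovariate(rate)
    while t < horizon:
        times.append(t)
        t += rng.expovariate(rate)
    return times


def weighted_cost(distance, completion_time, alpha):
    """Hypothetical trade-off between the two objectives in the abstract.

    alpha = 1 counts only order completion time, alpha = 0 only distance.
    The paper's actual reward function may differ.
    """
    return alpha * completion_time + (1.0 - alpha) * distance


horizon = 100_000
arrivals = simulate_arrivals(rate=0.09, horizon=horizon)

# A rate of 0.09 corresponds to roughly 9 orders per 100 time units.
orders_per_100 = 100 * len(arrivals) / horizon
print(f"about {orders_per_100:.2f} orders per 100 time units")
```

Under these assumptions, a reinforcement learning agent would receive the negative of `weighted_cost` as its reward, so that a single hyperparameter shifts the emphasis between travel distance and completion time.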

Paper Structure (26 sections, 1 equation, 6 figures, 12 tables, 1 algorithm)

Introduction
Literature Review
Optimization Approaches for Order Picking
Deep Reinforcement Learning for Warehouse Operations
Broader DRL Applications for Dynamic Assignment and Routing
Research Gap
Problem Description
Methodology
MDP Formulation
Proposed Approach
Architecture of Deep Neural Network
Comparison with Alternative Neural Network Architectures
Training of Deep Neural Network
Computational Study
Experimental Settings
...and 11 more sections

Figures (6)

Figure 1: An illustrative example of warehouse layout for the order-picking problem
Figure 2: A visual representation of the proposed solution method
Figure 3: Proposed Deep Neural Network Architecture
Figure 4: Training curves showing the convergence of the 20-period moving average reward for each trained model
Figure 5: Example demonstration of a pick cycle executed by the learned DRL agent with $\lambda=0.06$ and $\alpha = 1$
...and 1 more figure
