Real-Time Integrated Dispatching and Idle Fleet Steering with Deep Reinforcement Learning for A Meal Delivery Platform
Jingyi Cheng, Shadi Sharif Azadeh
TL;DR
This paper tackles real-time dispatching and idle fleet steering for on-demand meal delivery by proposing a reinforcement learning–based strategic dual-control framework. It combines short-term demand forecasting using LD-XGBoost with a Conv-DDQN policy for dispatching and a DDQN-based policy for idle courier steering. The two policies are trained iteratively, in a sandwich manner, within a digital-twin environment. The approach jointly optimizes current deliveries and future network balance. It improves delivery efficiency, reduces under-supply, and distributes workload more fairly across couriers, while remaining fast enough for real-time execution. The work demonstrates the value of forward-looking, predict-then-optimize RL in complex, dynamic service networks and outlines practical directions for scaling the framework and extending it to broader on-demand contexts.
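Both policies are described as DDQN variants. What separates Double DQN from vanilla DQN is how the bootstrapped target is formed, and a small generic sketch makes this concrete. The online network selects the next action, and the target network evaluates it. This is a textbook illustration, not the authors' Conv-DDQN. The array names, and the idea that action columns could index candidate couriers (dispatching) or neighboring zones (steering), are illustrative assumptions.

```python
import numpy as np

def double_dqn_targets(rewards, next_q_online, next_q_target, dones, gamma=0.99):
    """Compute Double DQN targets for a batch of transitions.

    rewards:        shape (B,)   immediate rewards
    next_q_online:  shape (B, A) online-network Q-values at the next state
    next_q_target:  shape (B, A) target-network Q-values at the next state
    dones:          shape (B,)   1.0 if the transition ended the episode, else 0.0
    """
    # Action selection uses the online network ...
    best_actions = np.argmax(next_q_online, axis=1)
    # ... while action evaluation uses the target network.
    evaluated = next_q_target[np.arange(len(best_actions)), best_actions]
    # Terminal transitions do not bootstrap from the next state.
    return rewards + gamma * (1.0 - dones) * evaluated

# Toy batch of two transitions with two actions each (made-up numbers).
rewards = np.array([1.0, 0.0])
next_q_online = np.array([[0.2, 0.8], [0.5, 0.1]])
next_q_target = np.array([[3.0, 1.0], [2.0, 4.0]])
dones = np.array([0.0, 1.0])
targets = double_dqn_targets(rewards, next_q_online, next_q_target, dones, gamma=0.9)
```

In the first transition, a vanilla DQN target would take the target network's maximum (3.0) and yield 3.7. Double DQN instead evaluates the online network's choice (action 1, valued 1.0) and yields 1.9. Decoupling selection from evaluation in this way is what curbs the overestimation bias of vanilla DQN.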
Abstract
To achieve high service quality and profitability, meal delivery platforms like Uber Eats and Grubhub must strategically operate their fleets to ensure timely deliveries for current orders. They must also mitigate the consequences of suboptimal decisions that lead to courier understaffing in the future. This study sets out to solve the real-time order dispatching and idle courier steering problems for a meal delivery platform by proposing a reinforcement learning (RL)-based strategic dual-control framework. To address the inherent sequential nature of these problems, we model both order dispatching and courier steering as Markov Decision Processes. Using a deep reinforcement learning (DRL) framework, we train strategic policies that take explicitly predicted demands as part of their inputs. In our dual-control framework, the dispatching and steering policies are trained iteratively in an integrated manner. These forward-looking policies can be executed in real time, and their decisions jointly account for impacts at the local and network levels. To enhance dispatching fairness, we propose convolutional deep Q-networks that construct fair courier embeddings. To simultaneously rebalance supply and demand within the service network, we use mean-field approximated supply-demand knowledge to reallocate idle couriers at the local level. With the policies generated by the RL-based strategic dual-control framework, delivery efficiency and the fairness of workload distribution among couriers improve, and under-supplied conditions within the service network are alleviated. Our study sheds light on designing RL-based frameworks that enable forward-looking real-time operations for meal delivery platforms and other on-demand services.
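The abstract does not give a formula for the mean-field approximated supply-demand knowledge. What follows is therefore only an illustrative sketch under stated assumptions:

- The service area is a hypothetical grid of zones.
- `demand` holds predicted orders per zone, standing in for the demand forecasts.
- `idle` holds idle couriers per zone.
- Other couriers are summarized by the 3x3-neighborhood average of the demand-minus-supply gap. This follows the general spirit of a mean-field approximation, which replaces individual agents with aggregate quantities.

The greedy `greedy_steer` rule is a hypothetical placeholder for the learned DDQN steering policy and is not the authors' method.

```python
import numpy as np

def neighborhood_gap(demand, idle):
    """Average (predicted demand - idle supply) over each zone's 3x3 neighborhood.

    Out-of-grid cells are padded with NaN and ignored by nanmean, so border
    zones average only over neighbors that actually exist.
    """
    gap = demand.astype(float) - idle.astype(float)
    rows, cols = gap.shape
    padded = np.pad(gap, 1, mode="constant", constant_values=np.nan)
    smoothed = np.empty_like(gap)
    for r in range(rows):
        for c in range(cols):
            smoothed[r, c] = np.nanmean(padded[r:r + 3, c:c + 3])
    return smoothed

# Hypothetical action set for an idle courier: stay or move to an adjacent zone.
MOVES = {"stay": (0, 0), "up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

def greedy_steer(smoothed, pos):
    """Pick the move whose destination has the largest smoothed gap (ties keep the earlier move)."""
    r, c = pos
    rows, cols = smoothed.shape
    best_move, best_val = "stay", smoothed[r, c]
    for name, (dr, dc) in MOVES.items():
        nr, nc = r + dr, c + dc
        if 0 <= nr < rows and 0 <= nc < cols and smoothed[nr, nc] > best_val:
            best_move, best_val = name, smoothed[nr, nc]
    return best_move

# Toy 3x3 city (made-up numbers): surplus couriers top-left, unmet demand bottom-left.
demand = np.array([[0, 0, 0], [0, 1, 0], [6, 0, 0]])
idle = np.array([[2, 0, 0], [0, 1, 0], [0, 0, 0]])
smoothed = neighborhood_gap(demand, idle)
move = greedy_steer(smoothed, (1, 1))
```

In a learned setup, the smoothed gap map would be one part of the steering agent's state. The Q-network would replace the argmax over raw gap values, so that it can also weigh travel time and future dispatching opportunities.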
