Real-Time Integrated Dispatching and Idle Fleet Steering with Deep Reinforcement Learning for A Meal Delivery Platform

Jingyi Cheng; Shadi Sharif Azadeh

TL;DR

This paper tackles real-time order dispatching and idle fleet steering for on-demand meal delivery by proposing a reinforcement learning (RL)-based strategic dual-control framework. It combines short-term demand forecasting using LD-XGBoost with a Conv-DDQN policy for dispatching and a DDQN-based policy for idle courier steering; both policies are trained iteratively, in a sandwich manner, within a digital-twin environment. The approach jointly optimizes current deliveries and future network balance. It improves delivery efficiency, reduces under-supply, and distributes workload more fairly across couriers, while remaining fast enough for real-time execution. The work demonstrates the value of forward-looking, predict-then-optimize RL in complex, dynamic service networks and outlines practical directions for scaling the framework and extending it to other on-demand services.
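The forecasting step feeds predicted demand into the RL policies. The exact features of LD-XGBoost are not described here, so the following is only a minimal sketch under an assumed setup: per-grid order counts in consecutive time slots are turned into supervised rows made of a grid id plus the previous `n_lags` counts, with the next slot's count as the target. The function name `lagged_demand_rows` and the feature layout are hypothetical.

```python
from typing import Dict, List, Tuple


def lagged_demand_rows(
    counts: Dict[int, List[int]], n_lags: int
) -> Tuple[List[List[int]], List[int]]:
    """Build (features, target) rows for one-step-ahead demand forecasting.

    Hypothetical helper (an assumed feature layout, not the paper's):
    `counts` maps a grid id to its order counts in consecutive time slots.
    Each feature row is [grid_id, count(t - n_lags), ..., count(t - 1)]
    and the matching target is count(t).
    """
    if n_lags < 1:
        raise ValueError("n_lags must be at least 1")
    features: List[List[int]] = []
    targets: List[int] = []
    for grid_id in sorted(counts):
        series = counts[grid_id]
        # Slide a window of n_lags past slots over the series of this grid.
        for t in range(n_lags, len(series)):
            features.append([grid_id] + series[t - n_lags:t])
            targets.append(series[t])
    return features, targets


X, y = lagged_demand_rows({0: [1, 2, 3, 4], 1: [0, 0, 5]}, n_lags=2)
print(X)  # [[0, 1, 2], [0, 2, 3], [1, 0, 0]]
print(y)  # [3, 4, 5]
```

Rows in this shape can be passed to any tabular regressor, for example `xgboost.XGBRegressor().fit(X, y)`; pooling all grids into one model with the grid id as a feature is a design choice assumed here, not one stated in the source.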

Abstract

To achieve high service quality and profitability, meal delivery platforms such as Uber Eats and Grubhub must operate their fleets strategically: they must deliver current orders on time while mitigating the downstream effects of suboptimal decisions that lead to courier understaffing in the future. This study sets out to solve the real-time order dispatching and idle courier steering problems for a meal delivery platform by proposing a reinforcement learning (RL)-based strategic dual-control framework. To capture the inherently sequential nature of these problems, we model both order dispatching and courier steering as Markov Decision Processes. We obtain strategic policies by training them with a deep reinforcement learning (DRL) framework that uses explicitly predicted demands as part of its inputs. In our dual-control framework, the dispatching and steering policies are trained iteratively in an integrated manner. These forward-looking policies can be executed in real time and make decisions that jointly account for their impacts at the local and network levels. To enhance dispatching fairness, we propose convolutional deep Q networks that construct fair courier embeddings. To rebalance supply and demand within the service network at the same time, we use mean-field approximated supply-demand knowledge to reallocate idle couriers at the local level. With the policies generated by the RL-based strategic dual-control framework, we find that delivery efficiency and the fairness of workload distribution among couriers improve, and that under-supplied conditions within the service network are alleviated. Our study sheds light on designing RL-based frameworks that enable forward-looking real-time operations for meal delivery platforms and other on-demand services.
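Both policies are learned with deep Q networks, and the TL;DR names DDQN variants. As a minimal sketch, assuming the standard Double DQN bootstrap target rather than the authors' exact loss, the online network picks the next action and the target network scores it: y = r + γ · Q_target(s', argmax_a Q_online(s', a)), with y = r at terminal transitions. The function name `double_dqn_targets` and the plain-list inputs are hypothetical; in practice these Q-values would come from the dispatching or steering networks.

```python
from typing import List, Sequence


def double_dqn_targets(
    rewards: Sequence[float],
    next_q_online: Sequence[Sequence[float]],
    next_q_target: Sequence[Sequence[float]],
    dones: Sequence[bool],
    gamma: float = 0.99,
) -> List[float]:
    """Standard Double DQN targets for a batch (hypothetical helper, not the paper's code).

    next_q_online[i] / next_q_target[i]: Q-values of the next state of
    transition i under the online and target networks, one per action.
    """
    targets: List[float] = []
    for r, q_on, q_tgt, done in zip(rewards, next_q_online, next_q_target, dones):
        if done:
            # No bootstrapping past the end of an episode.
            targets.append(r)
            continue
        # The online network selects the greedy next action ...
        a_star = max(range(len(q_on)), key=lambda a: q_on[a])
        # ... and the target network evaluates it, which curbs overestimation.
        targets.append(r + gamma * q_tgt[a_star])
    return targets


print(double_dqn_targets([1.0, 0.0], [[0.2, 0.8], [0.5, 0.1]],
                         [[1.0, 2.0], [3.0, 4.0]], [False, True], gamma=0.9))
```

The decoupling matters when the two networks disagree: with online values `[0.9, 0.1]` and target values `[1.0, 5.0]`, the target uses 1.0 (the target network's value of the online choice) rather than the maximum 5.0 that vanilla DQN would take.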

Paper Structure (38 sections, 15 equations, 12 figures, 9 tables, 2 algorithms)

Introduction
Literature Review
Predict-then-Optimize
Order Dispatching
Supply Steering
Research Gap and Our Contributions
Problem Description
On-Demand Meal Delivery System
Grids, Distances and Travelling Speed
Couriers
Orders and Order Sampler
RL-based Strategic Dual-Control Framework
Short-Term Demand Prediction with XGBoost
Strategic Order Dispatching
State
...and 23 more sections

Figures (12)

Figure 1: Example processes of order dispatching and idle courier steering.
Figure 2: Visualized examples for grid-wise distance calculation.
Figure 3: Convolutional Deep Q Networks
Figure 4: Snapshots of the meal delivery service network at 20:37 from an arbitrary simulation run. The left panel shows the distribution of idle couriers, and the right panel shows the distribution of the current supply-demand gap within the network. Demand arose in the center area, whereas the idle couriers were located outside the center.
Figure 5: Efficient idle courier steering should be directed towards under-supplied areas. In this figure, the red numbers denote the anticipated or current supply-demand gap on the corresponding grids, and the black numbers denote the IDs of the grids within the immediate neighborhood of grid 0. Based on the supply-demand information in the neighborhood of grid 3, the direction towards grid 3 points to the under-supplied area.
...and 7 more figures
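The intuition of Figure 5, choosing a direction by looking at the supply-demand gap around each candidate grid rather than at the candidate alone, can be illustrated with a greedy heuristic. This is not the learned DDQN steering policy; it is a minimal sketch under stated assumptions: a square grid, the 3x3 (Moore) neighborhood as the "immediate neighborhood", gap defined as demand minus supply (positive means under-supplied), and a plain neighborhood average standing in for the mean-field summary. All function names are hypothetical.

```python
from typing import List, Tuple

Cell = Tuple[int, int]  # (row, col) of a grid cell


def neighbors(cell: Cell, rows: int, cols: int) -> List[Cell]:
    """In-bounds cells of the 3x3 neighborhood around `cell`, excluding itself (assumed topology)."""
    r, c = cell
    out: List[Cell] = []
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if (dr, dc) == (0, 0):
                continue
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                out.append((nr, nc))
    return out


def neighborhood_mean_gap(gap: List[List[float]], cell: Cell) -> float:
    """Average gap over `cell` and its neighbors: a simple mean-field-style local summary."""
    rows, cols = len(gap), len(gap[0])
    cells = [cell] + neighbors(cell, rows, cols)
    return sum(gap[r][c] for r, c in cells) / len(cells)


def steer(gap: List[List[float]], current: Cell) -> Cell:
    """Greedy heuristic: stay or move to the adjacent cell whose neighborhood is most under-supplied."""
    rows, cols = len(gap), len(gap[0])
    candidates = [current] + neighbors(current, rows, cols)
    return max(candidates, key=lambda cell: neighborhood_mean_gap(gap, cell))


# Under-supplied area on the east edge; an idle courier sits in the middle.
gap = [[0.0] * 5 for _ in range(5)]
gap[1][4], gap[2][4], gap[3][4] = 2.0, 6.0, 2.0
print(steer(gap, (2, 2)))  # (2, 3): move east, towards the under-supplied area
```

Here the east neighbor (2, 3) wins because its own neighborhood covers all three under-supplied cells (average 10/9), while the north-east and south-east cells each see only two of them (average 8/9). The learned policy in the paper would instead take such supply-demand information as input and be trained to maximize long-run reward.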
