LaDe: The First Comprehensive Last-mile Delivery Dataset from Industry
Lixia Wu, Haomin Wen, Haoyuan Hu, Xiaowei Mao, Yutong Xia, Ergang Shan, Jianbin Zheng, Junhong Lou, Yuxuan Liang, Liuqing Yang, Roger Zimmermann, Youfang Lin, Huaiyu Wan
TL;DR
LaDe addresses the critical need for publicly available, real-world last-mile delivery data by introducing the first industry-scale dataset with pick-up and delivery records across multiple Chinese cities. Comprising LaDe-P (pick-up) and LaDe-D (delivery), it offers large-scale, comprehensive, and diverse features suitable for route prediction, ETA estimation, and spatio-temporal graph forecasting, with baseline benchmarks and public code. The dataset enables multi-task research and cross-city analysis, while also highlighting challenges such as data missingness and privacy perturbations. By releasing LaDe on HuggingFace, the work aims to accelerate advances in logistics optimization, spatio-temporal data mining, and related AI applications in the last-mile domain.
Abstract
Real-world last-mile delivery datasets are crucial for research in logistics, supply chain management, and spatio-temporal data mining. Despite a plethora of algorithms developed to date, no widely accepted, publicly available last-mile delivery dataset exists to support research in this field. In this paper, we introduce LaDe, the first publicly available last-mile delivery dataset with millions of packages from industry. LaDe has three unique characteristics: (1) Large-scale. It involves 10,677k packages of 21k couriers over 6 months of real-world operation. (2) Comprehensive information. It offers original package information, such as location and time requirements, as well as task-event information, which records when and where the courier is when events such as task-accept and task-finish occur. (3) Diversity. The dataset includes data from various scenarios, including package pick-up and delivery, and from multiple cities, each with its own spatio-temporal patterns arising from distinct characteristics such as population. We verify LaDe on three tasks by running several classical baseline models per task. We believe that the large-scale, comprehensive, and diverse features of LaDe can offer unparalleled opportunities to researchers in the supply chain community, the data mining community, and beyond. The dataset homepage is publicly available at https://huggingface.co/datasets/Cainiao-AI/LaDe.
