
LaDe: The First Comprehensive Last-mile Delivery Dataset from Industry

Lixia Wu, Haomin Wen, Haoyuan Hu, Xiaowei Mao, Yutong Xia, Ergang Shan, Jianbin Zheng, Junhong Lou, Yuxuan Liang, Liuqing Yang, Roger Zimmermann, Youfang Lin, Huaiyu Wan

TL;DR

LaDe addresses the critical need for publicly available, real-world last-mile delivery data by introducing the first industry-scale dataset with pick-up and delivery records across multiple Chinese cities. Comprising LaDe-P (pick-up) and LaDe-D (delivery), it offers large-scale, comprehensive, and diverse features suitable for route prediction, ETA estimation, and spatio-temporal graph forecasting, with baseline benchmarks and public code. The dataset enables multi-task research and cross-city analysis, while also highlighting challenges such as data missingness and privacy perturbations. By releasing LaDe on HuggingFace, the work aims to accelerate advances in logistics optimization, spatio-temporal data mining, and related AI applications in the last-mile domain.

Abstract

Real-world last-mile delivery datasets are crucial for research in logistics, supply chain management, and spatio-temporal data mining. Despite a plethora of algorithms developed to date, no widely accepted, publicly available last-mile delivery dataset exists to support research in this field. In this paper, we introduce LaDe, the first publicly available last-mile delivery dataset with millions of packages from the industry. LaDe has three unique characteristics: (1) Large-scale. It involves 10,677k packages handled by 21k couriers over 6 months of real-world operation. (2) Comprehensive information. It offers original package information, such as location and time requirements, as well as task-event information, which records when and where the courier is when events such as task-accept and task-finish occur. (3) Diversity. The dataset includes data from various scenarios, including package pick-up and delivery, and from multiple cities, each with its own spatio-temporal patterns arising from distinct characteristics such as population. We verify LaDe on three tasks by running several classical baseline models per task. We believe that the large-scale, comprehensive, and diverse features of LaDe can offer unparalleled opportunities to researchers in the supply chain community, the data mining community, and beyond. The dataset homepage is publicly available at https://huggingface.co/datasets/Cainiao-AI/LaDe.

Paper Structure (28 sections, 5 figures, 13 tables)


Figures (5)

  • Figure 1: Overview of LaDe from last-mile delivery (better viewed in color), which includes two sub-datasets: LaDe-P from the package pick-up process (i.e., couriers pick up packages from sender customers and return to the depot) and LaDe-D from the delivery process (i.e., couriers deliver packages from the depot to receiver customers).
  • Figure 2: Region-level and AOI-level data.
  • Figure 3: Spatial and temporal distribution of data in Shanghai of LaDe-P.
  • Figure 4: Diversity of cities. We select two cities, Hangzhou and Jilin, as examples to reveal their different spatio-temporal distributions. (a) The time distribution of packages in a day; (b) The ETA distribution of packages; (c) The distribution of the average distance between two consecutive packages in a courier's route. The two cities differ significantly across these distributions.
  • Figure 5: Illustration of three real-world applications. (a): Route prediction predicts the future pick-up route of a courier. (b): ETA prediction estimates the courier's arrival time for picking up or delivering packages. (c): STG forecasting predicts the future package number in given regions/AOIs.
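
The statistic in Figure 4(c), the average distance between two consecutive packages in a courier's route, can be illustrated with a minimal sketch. It assumes a route is an ordered list of (latitude, longitude) pairs and that distance is measured as great-circle (haversine) distance; both are assumptions made here for illustration, not details confirmed by the paper. The function names and the sample coordinates are hypothetical and are not taken from LaDe.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    earth_radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def mean_consecutive_distance_km(route):
    """Average distance between consecutive stops in an ordered route.

    `route` is a list of (lat, lon) tuples in visiting order (hypothetical
    layout). Routes with fewer than two stops have no legs, so return 0.0.
    """
    if len(route) < 2:
        return 0.0
    legs = [haversine_km(*a, *b) for a, b in zip(route, route[1:])]
    return sum(legs) / len(legs)


# Made-up coordinates for illustration only (not real LaDe records).
example_route = [(30.25, 120.16), (30.26, 120.17), (30.28, 120.15)]
print(round(mean_consecutive_distance_km(example_route), 3))
```

Computing this value per courier and plotting its histogram per city would give a distribution of the kind the caption describes, under the assumptions stated above.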