Dynamic Demand Management for Parcel Lockers

Daniela Sailer, Robert Klein, Claudius Steinhardt

TL;DR

The paper studies dynamic demand management for last-mile parcel lockers, whose capacity is limited and whose delivery requests and pickup times are stochastic. It accounts for multiple compartment sizes and upgrading, and formulates the Dynamic Parcel Locker Demand Management Problem (DPLDMP) as an infinite-horizon sequential decision process. It proposes an anticipatory framework that combines cost function approximation (CFA) for allocation with an offline-trained parametric value function approximation (VFA) for demand control, augmented by a truncated online rollout and a structure-enforcing experience replay mechanism. Computational results show that the approach outperforms myopic and industry-inspired benchmarks, with gains of up to 26% in objective value. They also reveal the importance of aligning demand-control and allocation decisions and show that heterogeneity in customer types amplifies the benefits. Managerially, the work highlights reserving capacity for high-priority customers and preserving flexibility through the strategic use of smaller parcels and shorter lead times. The methodology provides a foundation for integrating domain knowledge into value-function learning and points to extensions such as overbooking and integrated routing.
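To make the demand-control idea concrete, the following is a minimal sketch of an opportunity-cost rule built on a linear value function approximation. It is not the authors' implementation: the function names, the feature vectors (here, counts of free compartments per size), and the weights are all hypothetical, and the sketch ignores the rollout and the allocation step. It only illustrates the general principle that the locker is offered when the request's priority weight covers the estimated value lost by occupying capacity.

```python
from typing import Sequence


def linear_value(features: Sequence[float], weights: Sequence[float]) -> float:
    """Linear (parametric) estimate of the future value of a state.

    Assumption: the state is summarized by a hypothetical feature vector,
    e.g. the number of free compartments per size class.
    """
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    return sum(w * f for w, f in zip(weights, features))


def offer_locker(
    priority: float,
    features_if_accepted: Sequence[float],
    features_if_rejected: Sequence[float],
    weights: Sequence[float],
) -> bool:
    """Offer the locker if the priority weight covers the opportunity cost.

    Opportunity cost = estimated value without the request
                       minus estimated value with the request booked.
    """
    opportunity_cost = (
        linear_value(features_if_rejected, weights)
        - linear_value(features_if_accepted, weights)
    )
    return priority >= opportunity_cost


# Hypothetical weights: a free small compartment is "worth" 2.0, a large one 1.0.
weights = [2.0, 1.0]
# Booking the request would consume the last free small compartment.
high = offer_locker(3.0, [0, 1], [1, 1], weights)  # priority 3.0 vs. cost 2.0
low = offer_locker(1.0, [0, 1], [1, 1], weights)   # priority 1.0 vs. cost 2.0
```

In this toy setting the high-priority request is offered the locker and the low-priority one is not, which mirrors the capacity-reservation effect described above.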

Abstract

In pursuit of a more sustainable and cost-efficient last mile, parcel lockers have gained a firm foothold in the parcel delivery landscape. To fully exploit their potential and simultaneously ensure customer satisfaction, successful management of the locker's limited capacity is crucial. This is challenging as future delivery requests and pickup times are stochastic from the provider's perspective. In response, we propose to dynamically control whether the locker is presented as an available delivery option to each incoming customer, with the goal of maximizing the number of served requests weighted by their priority. Additionally, we take different compartment sizes into account, which entails a second type of decision, as parcels scheduled for delivery must be allocated to compartments. We formalize the problem as an infinite-horizon sequential decision problem and find that exact methods are intractable due to the curses of dimensionality. In light of this, we develop a solution framework that orchestrates multiple algorithmic techniques rooted in Sequential Decision Analytics and Reinforcement Learning, namely cost function approximation and an offline-trained parametric value function approximation together with a truncated online rollout. Our innovative approach to combining these techniques enables us to address the strong interrelations between the two decision types. As a general methodological contribution, we enhance the training of our value function approximation with a modified version of experience replay that enforces structure in the value function. Our computational study shows that our method outperforms a myopic benchmark by 13.7% and an industry-inspired policy by 12.6%.
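The allocation decision with several compartment sizes can be illustrated by a simple feasibility check. The sketch below assumes nested sizes, i.e. a parcel of a given size class fits into any compartment of the same or a larger class (upgrading). The function name and the count-based representation are hypothetical, and the check only tests whether a set of parcels can be stored at all. It is not the cost function approximation used in the paper.

```python
from typing import Sequence


def fits_with_upgrading(parcels: Sequence[int], compartments: Sequence[int]) -> bool:
    """Check whether all parcels can be stored when upgrading is allowed.

    parcels[i] and compartments[i] are counts for size class i, ordered
    from smallest to largest. Assumption: a parcel of class i fits into
    any compartment of class j >= i.
    """
    if len(parcels) != len(compartments):
        raise ValueError("parcels and compartments must have the same length")
    surplus = 0  # unused compartments carried over from larger size classes
    # Walk from the largest class down: larger parcels have the fewest options.
    for demand, capacity in zip(reversed(parcels), reversed(compartments)):
        surplus += capacity - demand
        if surplus < 0:  # not enough compartments of this size or larger
            return False
    return True


# Three small and one medium parcel; two small, one medium, one large compartment.
# One small parcel is upgraded to the large compartment.
feasible = fits_with_upgrading([3, 1, 0], [2, 1, 1])
# Two large parcels but only one large compartment: small ones cannot help.
infeasible = fits_with_upgrading([0, 0, 2], [5, 5, 1])
```

The first case is feasible only because of upgrading, while the second shows that free small compartments cannot compensate for missing large ones, which is one reason the allocation and demand-control decisions interact.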

Paper Structure

This paper contains 63 sections, 22 equations, 7 figures, 4 tables, 2 algorithms.

Figures (7)

  • Figure 1: Timeline for decisions and events.
  • Figure 2: Illustrative example.
  • Figure 3: Solution procedure.
  • Figure 4: Objective improvement over myopic benchmark.
  • Figure 5: Objective improvement per setting.
  • ...and 2 more figures