To Start Up a Start-Up - Embedding Strategic Demand Development in Operational On-Demand Fulfillment via Reinforcement Learning with Information Shaping

Xinwei Chen, Marlin W. Ulmer, Barrett W. Thomas

TL;DR

This work addresses the challenge of establishing on-demand delivery start-ups with limited initial fleets by integrating long-term demand-growth effects into daily operations. It first derives analytical insights from a stylized two-region inter-day problem, showing that the optimal resource allocation is either equal across regions or concentrated in one region, depending on growth potential. It then uses these insights to train a reinforcement-learning (RL) policy for intra-day fulfillment. A novel concept, information shaping, guides the RL policy by controlling the distribution of demand scenarios used in training, so that short-term actions align with long-term demand objectives. Empirical results demonstrate that combining intra-day operational efficiency with inter-day anticipation significantly improves both current service levels and future demand, offering practical guidance on when to invest resources regionally versus city-wide. The approach generalizes beyond logistics to other domains where decisions under uncertainty must balance immediate throughput with long-run demand evolution.

Abstract

The last few years have witnessed rapid growth in the on-demand delivery market, with many start-ups entering the field. However, not all of these start-ups have succeeded, for reasons that include failing to establish a large enough customer base. In this paper, we address a problem that many on-demand transportation start-ups face: how to establish themselves in a new market. When starting, such companies often have limited fleet resources to serve demand across a city. Depending on how the fleet is used, service quality varies across areas of the city, and service quality in turn affects the growth of demand in each area. Thus, operational fulfillment decisions drive longer-term demand development. To integrate strategic demand development into real-time fulfillment operations, we propose a two-step approach. First, we derive analytical insights into optimal allocation decisions for a stylized problem. Second, we use these insights to shape the training data of a reinforcement learning strategy for operational real-time fulfillment. Our experiments demonstrate that combining operational efficiency with long-term strategic planning is highly advantageous. Further, we show that the careful shaping of training data is essential for the successful development of demand.

Paper Structure

This paper contains 54 sections, 4 theorems, 40 equations, 9 figures, 2 tables.

Key Result

Proposition 1

In the non-trivial case, there exists an optimal solution $(r_1^*, r_2^*)$ for which the resource constraint is binding, i.e., $\beta \sqrt{r_1^* \lambda_1 A} + \beta \sqrt{r_2^* \lambda_2 A} = T$.
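As a small numerical illustration of what a binding constraint means here, the following sketch solves the equality above for $r_2$ once $r_1$ is fixed. This page does not define the symbols, so the code treats $\beta$, $\lambda_1$, $\lambda_2$, $A$, and $T$ as plain positive numbers. The function names and all input values are hypothetical and are not taken from the paper.

```python
import math


def resource_used(r, lam, area, beta):
    """One term of the constraint: beta * sqrt(r * lam * area)."""
    return beta * math.sqrt(r * lam * area)


def binding_r2(r1, lam1, lam2, area, beta, budget):
    """Return the r2 that makes the constraint hold with equality for a given r1.

    Solves beta*sqrt(r1*lam1*area) + beta*sqrt(r2*lam2*area) = budget for r2.
    No upper bound on r2 is enforced; any such bound is left to the caller.
    """
    remaining = budget - resource_used(r1, lam1, area, beta)
    if remaining < 0:
        raise ValueError("r1 alone already exceeds the budget")
    return (remaining / beta) ** 2 / (lam2 * area)


# Hypothetical inputs, chosen only to keep the arithmetic simple.
beta, area, budget = 1.0, 1.0, 10.0
lam1, lam2 = 100.0, 100.0

r1 = 0.25
r2 = binding_r2(r1, lam1, lam2, area, beta, budget)
total = resource_used(r1, lam1, area, beta) + resource_used(r2, lam2, area, beta)
print(f"r1={r1}, r2={r2}, resources used={total}")
```

Sweeping `r1` over a grid and evaluating an objective at each `(r1, binding_r2(r1, ...))` pair would trace the one-dimensional trade-off that a binding constraint leaves; the objective itself is not stated on this page, so it is omitted here.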

Figures (9)

  • Figure 1: Example for a state of the intra-day problem.
  • Figure 2: Example for demand developments in the inter-day problem.
  • Figure 3: Objective value vs. resources invested in Region 1.
  • Figure 4: Demand distribution vs. different strategies.
  • Figure 6:
  • ...and 4 more figures

Theorems & Definitions (8)

  • Proposition 1
  • proof: Proof of Proposition 1.
  • Theorem 1
  • Lemma 1
  • proof
  • Corollary 1
  • proof
  • proof: Proof of Corollary 1.