Ready, Bid, Go! On-Demand Delivery Using Fleets of Drones with Unknown, Heterogeneous Energy Storage Constraints

Mohamed S. Talamali, Genki Miyauchi, Thomas Watteyne, Micael S. Couceiro, Roderich Gross

TL;DR

This work tackles on-demand UAV delivery with fleets whose energy storage capacities and energy consumption models are unknown. It introduces a decentralised deployment strategy that combines auction-based task allocation with online learning: each UAV decides whether to bid based on its current state of charge (SoC), the parcel mass, and the delivery distance, and refines its bidding policy over time. Key findings show that assigning orders to the least confident bidders reduces delivery times and increases the number of completed deliveries, and that a forecasting-based extension helps prioritise early orders. In simulation, the approach outperforms threshold-based methods that require UAVs to exceed a specific charge level at deployment.

Abstract

Unmanned Aerial Vehicles (UAVs) are expected to transform logistics, reducing delivery time, costs, and emissions. This study addresses an on-demand delivery scenario, in which fleets of UAVs are deployed to fulfil orders that arrive stochastically. Unlike previous work, it considers UAVs with heterogeneous, unknown energy storage capacities and assumes no knowledge of the energy consumption models. We propose a decentralised deployment strategy that combines auction-based task allocation with online learning. Each UAV independently decides whether to bid for orders based on its energy storage charge level, the parcel mass, and delivery distance. Over time, it refines its policy to bid only for orders within its capability. Simulations using realistic UAV energy models reveal that, counter-intuitively, assigning orders to the least confident bidders reduces delivery times and increases the number of successfully fulfilled orders. This strategy is shown to outperform threshold-based methods, which require UAVs to exceed specific charge levels at deployment. We propose a variant of the strategy which uses learned policies for forecasting. This enables UAVs with insufficient charge levels to commit to fulfilling orders at specific future times, helping to prioritise early orders. Our work provides new insights into long-term deployment of UAV swarms, highlighting the advantages of decentralised energy-aware decision-making coupled with online learning in real-world dynamic environments.
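
The bidding and winner-selection logic described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The logistic confidence model, its initial weights, the learning rate, the 0.5 bid threshold, the normalised inputs, and the names `UAV`, `collect_bids`, and `select_winner` are all assumptions made for illustration. The abstract states only that each UAV bids based on charge level, parcel mass, and delivery distance, refines its policy over time, and that orders may be assigned to the least confident bidder.

```python
import math
from dataclasses import dataclass, field

BID_THRESHOLD = 0.5  # assumed: a UAV bids only if its confidence exceeds this


@dataclass
class UAV:
    name: str
    # Hypothetical policy weights for (bias, charge level, parcel mass, distance).
    # All inputs are assumed to be normalised to roughly [0, 1].
    weights: list = field(default_factory=lambda: [0.0, 4.0, -1.0, -1.0])
    learning_rate: float = 0.1

    def confidence(self, soc: float, mass: float, dist: float) -> float:
        """Estimated probability that this UAV can complete the delivery."""
        features = (1.0, soc, mass, dist)
        z = sum(w * x for w, x in zip(self.weights, features))
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, soc: float, mass: float, dist: float, success: bool) -> None:
        """Online logistic-regression step after a delivery attempt returns."""
        error = (1.0 if success else 0.0) - self.confidence(soc, mass, dist)
        for i, x in enumerate((1.0, soc, mass, dist)):
            self.weights[i] += self.learning_rate * error * x


def collect_bids(fleet, mass: float, dist: float):
    """fleet: list of (uav, current charge level). Returns (name, confidence) bids."""
    bids = []
    for uav, soc in fleet:
        c = uav.confidence(soc, mass, dist)
        if c > BID_THRESHOLD:
            bids.append((uav.name, c))
    return bids


def select_winner(bids, rule: str = "least"):
    """Pick the least ('least') or most ('most') confident bidder, or None."""
    if not bids:
        return None
    chooser = min if rule == "least" else max
    return chooser(bids, key=lambda bid: bid[1])[0]


# Example round: three UAVs at different charge levels and one incoming order.
fleet = [(UAV("a"), 0.9), (UAV("b"), 0.6), (UAV("c"), 0.3)]
bids = collect_bids(fleet, mass=1.0, dist=1.0)
winner = select_winner(bids, rule="least")
```

In this sketch the winner is computed in one place for brevity. In a decentralised setting, each UAV could apply the same `select_winner` rule to the broadcast bids and reach the same conclusion independently. A failed attempt passed to `update` lowers the UAV's confidence for similar orders, which is one simple way to realise "bid only for orders within its capability".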

Paper Structure

This paper contains 22 sections, 13 equations, 5 figures, 1 table.

Figures (5)

  • Figure 1: On-demand delivery scenario with a fleet of UAVs committing to deliver orders arriving at a fulfilment centre. The fleet is heterogeneous, as the UAVs differ in battery health, and hence in their true energy storage capacities. Each UAV learns a policy for placing bids on incoming orders based on its current charge level, parcel mass and delivery distance. Optionally, it can use this policy for forecasting, enabling it to plan the fulfilment of orders in the future.
  • Figure 2: Overview of decentralised learning-based deployment strategy and evaluation environment: (a) a UAV's logic is governed by a finite-state machine; (b) the UAV uses a bidding policy to determine whether to bid (and an associated level of confidence) and a bids evaluation policy to determine whether its bid won; (c) upon returning from a delivery attempt, the UAV updates its bidding policy; (d) screenshot of the purpose-built simulator showing the fulfilment centre and six drones, five of which are attempting a delivery, whereas the sixth is returning following delivery of a parcel.
  • Figure 3: Performance of the learning-based deployment strategy for different winner selection rules: Least confident (green), Most confident (pink), and Random (orange). Metrics used are (a) number of delivered parcels and delivery time, (b) percentage of aborted delivery attempts, and (c) cumulative backlog age (segmented by order arrival weeks).
  • Figure 4: (a) Decision accuracy over time for the three winner selection rules: Least confident (green), Most confident (pink), and Random (orange). (b) State of health (SoH) of UAVs deployed by the Least confident winner selection rule for various tasks and state of charge (SoC) values.
  • Figure 5: Comparing the learning-based deployment strategy against the threshold-based deployment strategy (a–c): (a) the learning-based strategy outperforms the threshold-based strategy in terms of the number of delivered parcels and delivery time, (b) results in a higher number of failed delivery attempts, (c) but produces less backlog. (d) Cumulative backlog age after eight weeks for the learning-based strategy with and without reservations.
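
The reservation idea compared in Figure 5(d), in which a UAV with insufficient charge commits to an order at a specific future time, can be sketched as follows. This is an illustrative assumption rather than the paper's method: the linear charging model, the toy confidence function, the 0.5 threshold, the planning horizon, and the names `toy_confidence` and `earliest_commit_time` are all hypothetical. The idea shown is only that a learned policy can be queried with forecast charge levels to find the earliest time at which the UAV would be willing to bid.

```python
import math
from typing import Callable, Optional


def toy_confidence(soc: float, mass: float, dist: float) -> float:
    """Stand-in for a learned bidding policy (hypothetical weights)."""
    return 1.0 / (1.0 + math.exp(-(4.0 * soc - mass - dist)))


def earliest_commit_time(
    confidence: Callable[[float, float, float], float],
    soc_now: float,
    mass: float,
    dist: float,
    charge_rate_per_min: float,
    bid_threshold: float = 0.5,
    horizon_min: int = 240,
) -> Optional[int]:
    """Return the first minute at which the forecast charge level makes the
    policy confident enough to bid, or None if that never happens within
    the horizon. Charging is assumed linear and capped at full charge."""
    for t in range(horizon_min + 1):
        soc_t = min(1.0, soc_now + charge_rate_per_min * t)
        if confidence(soc_t, mass, dist) >= bid_threshold:
            return t
    return None


# Example query: a UAV at 30% charge, gaining 1% per minute while charging.
commit_in = earliest_commit_time(toy_confidence, 0.3, 1.0, 1.1, 0.01)
```

A UAV could announce `commit_in` alongside its bid so that an order is reserved for it. A return value of `None` means the UAV should not commit to that order within the planning horizon.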