Global Rewards in Multi-Agent Deep Reinforcement Learning for Autonomous Mobility on Demand Systems

Heiko Hoppe, Tobias Enders, Quentin Cappart, Maximilian Schiffer

TL;DR

This work tackles profit-maximizing dispatch in autonomous mobility on demand (AMoD) by introducing a global-rewards MADRL framework that leverages a counterfactual baseline to align agent incentives with the operator's system-wide profit $Profit^*$. The authors develop and compare several COMA-SAC adaptations, culminating in $\text{COMA}^\text{scd}$, a reward-scheduling approach that blends local and global signals for scalable learning. Empirical results on real-world taxi data show gains of up to 2% on average and up to 6% on certain dates over state-of-the-art local-reward methods, along with a structural analysis indicating improved implicit vehicle balancing and demand forecasting. The work provides practical, scalable methods and open-source code for global-reward MADRL in AMoD and suggests directions for extending global-credit approaches to larger-scale or decentralized settings.
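
The summary above says that the reward-scheduling variant blends local and global signals, but it does not state the schedule itself. As a rough illustration only, the sketch below assumes a linear schedule that shifts weight from the local to the global reward over training. The function name, the linear form, and all parameters are hypothetical and are not taken from the paper or its code.

```python
def blended_reward(local_reward, global_reward, step, total_steps):
    """Blend a local and a global reward signal (hypothetical linear schedule).

    The weight on the global signal grows linearly from 0 at step 0
    to 1 at step total_steps. This is an assumed schedule, not the paper's.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    # Clamp progress to [0, 1] so steps beyond total_steps use the pure global signal.
    weight = min(max(step / total_steps, 0.0), 1.0)
    return (1.0 - weight) * local_reward + weight * global_reward


if __name__ == "__main__":
    # Early in training the local signal dominates; later the global one does.
    for step in (0, 5, 10):
        print(step, blended_reward(1.0, 3.0, step, total_steps=10))
```

Starting from the local signal and ending on the global one is just one plausible design; other schedules (stepwise, exponential) would fit the same description.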

Abstract

We study vehicle dispatching in autonomous mobility on demand (AMoD) systems, where a central operator assigns vehicles to customer requests or rejects them with the aim of maximizing its total profit. Recent approaches use multi-agent deep reinforcement learning (MADRL) to realize scalable yet performant algorithms, but train agents based on local rewards, which distorts the reward signal with respect to the system-wide profit and leads to lower performance. We therefore propose a novel global-rewards-based MADRL algorithm for vehicle dispatching in AMoD systems, which resolves existing goal conflicts between the trained agents and the operator by assigning rewards to agents using a counterfactual baseline. Our algorithm shows statistically significant improvements across various settings on real-world data compared to state-of-the-art MADRL algorithms with local rewards. We further provide a structural analysis which shows that the use of global rewards can improve implicit vehicle balancing and demand forecasting abilities. Our code is available at https://github.com/tumBAIS/GR-MADRL-AMoD.
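
To make the idea of a counterfactual baseline concrete, the following minimal sketch computes the standard COMA-style advantage for a single agent: the critic's value for the action actually taken, minus the policy-weighted average over that agent's alternative actions while all other agents' actions stay fixed. This is the generic COMA formulation, not the authors' implementation. The function name and the example numbers are hypothetical.

```python
def counterfactual_advantage(q_values, policy, taken_action):
    """COMA-style advantage for one agent.

    q_values[a]: critic estimate of the global return if this agent had
                 taken action a, with all other agents' actions held fixed.
    policy[a]:   probability the agent's policy assigns to action a.
    taken_action: index of the action the agent actually took.
    """
    if len(q_values) != len(policy):
        raise ValueError("q_values and policy must have the same length")
    # Counterfactual baseline: expected global value under the agent's own policy.
    baseline = sum(p * q for p, q in zip(policy, q_values))
    return q_values[taken_action] - baseline


if __name__ == "__main__":
    # Hypothetical agent with three actions (values chosen for illustration).
    q = [10.0, 12.0, 8.0]
    pi = [0.5, 0.25, 0.25]
    print(counterfactual_advantage(q, pi, taken_action=1))
```

A positive advantage means the chosen action raised the estimated system-wide return relative to what the agent would have contributed on average, which is how a global reward can be attributed to individual agents.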

Paper Structure

This paper contains 32 sections, 1 theorem, 14 equations, 6 figures, and 5 tables.

Key Result

Proposition 1

The loss function $J_\pi(\phi|s,i)$, as defined in the loss equation (eqn:loss), is equivalent to the entropy $J_\pi(\phi|s,i)=\sum_{a_i}\pi(a_i)\: \alpha\log\pi(a_i)$ of a plain SAC architecture.
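
As a small numerical illustration of the expression in the proposition, the sketch below evaluates $\sum_{a_i}\pi(a_i)\,\alpha\log\pi(a_i)$ for a discrete policy. Note that this quantity equals $-\alpha$ times the Shannon entropy of $\pi$. The function name and the example inputs are hypothetical, and the code only evaluates the stated formula; it does not reproduce the paper's training loss.

```python
import math


def entropy_term(policy, alpha):
    """Evaluate sum_a pi(a) * alpha * log(pi(a)) for a discrete policy.

    Actions with zero probability contribute nothing (limit of p * log p as p -> 0).
    The result equals -alpha times the Shannon entropy of the policy.
    """
    return sum(p * alpha * math.log(p) for p in policy if p > 0.0)


if __name__ == "__main__":
    # A uniform policy has maximal entropy, so the term is most negative there.
    print(entropy_term([0.5, 0.5], alpha=1.0))
    # A deterministic policy has zero entropy.
    print(entropy_term([1.0, 0.0], alpha=1.0))
```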

Figures (6)

  • Figure 1: Exemplary vehicle dispatching process.
  • Figure 2: Outline of base algorithm. Black parts are used during training and testing, gray parts only during training.
  • Figure 3: Relative test performance $\Delta\ [\%]$ of $\text{COMA}^\text{scd}$ vs. greedy and LRA for multiple test dates.
  • Figure 4: Relative test performance $\Delta\ [\%]$ of all algorithms vs. $\text{COMA}^\text{scd}$.
  • Figure 5: Hexagonal grid of Manhattan with considered areas for instances of this paper marked in green.
  • ...and 1 more figure

Theorems & Definitions (1)

  • Proposition 1