Timing the Match: A Deep Reinforcement Learning Approach for Ride-Hailing and Ride-Pooling Services

Yiman Bao, Jie Gao, Jinke He, Frans A. Oliehoek, Oded Cats

TL;DR

The paper tackles adaptive timing for batched ride-matching in dynamic ride-hailing and ride-pooling environments. It frames the problem as a finite-horizon reinforcement learning task solved with Proximal Policy Optimization (PPO), uses potential-based reward shaping (PBRS) to address sparse rewards, and validates the approach in a realistic NYC-based simulator built on real-world data. Key contributions include formulating the timing problem as a Markov decision process with binary actions (match now or keep waiting); designing state and reward structures that reflect waiting and detour costs; and demonstrating better performance than fixed-interval batching and first-dispatch baselines, including faster learning and lower total waiting times. The findings point to the practical potential of adaptive matching timing for improving efficiency and user satisfaction on urban mobility platforms, with implications for the scalable deployment and sustainability of ride-hailing and ride-pooling services.
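The timing decision can be pictured as a finite-horizon MDP whose only action is binary: keep accumulating requests, or run a matching round now. The sketch below is a minimal toy under stated assumptions, not the paper's simulator or state design. The class ToyBatchMatchingEnv, the helper run_fixed_interval, the first-come-first-served matching rule, the state features, and the terminal penalty for passengers left unmatched are all hypothetical, and detour costs are not modeled. The reward is deliberately sparse: it is nonzero only when a matching round assigns passengers (or at the end of the episode), which is the kind of signal reward shaping is meant to densify.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, List, Tuple

WAIT, MATCH = 0, 1  # binary action: keep accumulating requests, or run a matching round now


@dataclass
class ToyBatchMatchingEnv:
    """Hypothetical toy environment (not the paper's NYC simulator).

    Arrivals are fixed lists, so episodes are deterministic; the horizon is
    len(passenger_arrivals). Detour costs and match quality are not modeled.
    """
    passenger_arrivals: List[int]  # new ride requests at each step
    driver_arrivals: List[int]     # drivers becoming idle at each step
    t: int = 0
    queue: Deque[int] = field(default_factory=deque)  # arrival step of each unmatched passenger
    idle_drivers: int = 0

    def observe(self) -> Tuple[int, int, int]:
        # Hypothetical state features: current step, unmatched passengers, idle drivers
        return (self.t, len(self.queue), self.idle_drivers)

    def step(self, action: int) -> Tuple[Tuple[int, int, int], float, bool]:
        now = self.t
        self.queue.extend([now] * self.passenger_arrivals[now])
        self.idle_drivers += self.driver_arrivals[now]
        reward = 0.0  # sparse: stays zero unless a matching round assigns someone
        if action == MATCH:
            n = min(len(self.queue), self.idle_drivers)
            for _ in range(n):
                # First come, first served; cost is the matched passenger's wait
                reward -= now - self.queue.popleft()
            self.idle_drivers -= n
        self.t += 1
        done = self.t >= len(self.passenger_arrivals)
        if done:
            # Hypothetical terminal penalty so that never matching is not free
            reward -= sum(now - arrived for arrived in self.queue)
        return self.observe(), reward, done


def run_fixed_interval(env: ToyBatchMatchingEnv, interval: int) -> float:
    """Fixed-interval batching baseline: match on every `interval`-th step."""
    total, done = 0.0, False
    while not done:
        action = MATCH if (env.t + 1) % interval == 0 else WAIT
        _, reward, done = env.step(action)
        total += reward
    return total


arrivals = ([2, 0, 1], [0, 1, 2])
print(run_fixed_interval(ToyBatchMatchingEnv(*arrivals), 1))  # match every step: -3.0
print(run_fixed_interval(ToyBatchMatchingEnv(*arrivals), 3))  # one batch at the end: -4.0
```

In this toy every driver-passenger pair is interchangeable, so holding requests back cannot lower the total wait. The trade-off the paper studies arises when larger batches can produce better matches (for example, shorter detours in pooling), which a realistic simulator has to capture and a learned policy then exploits.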

Abstract

Efficient timing in ride-matching is crucial for improving the performance of ride-hailing and ride-pooling services, as it determines the number of drivers and passengers considered in each matching process. Traditional batched matching methods often use fixed time intervals to accumulate ride requests before assigning matches. While this approach increases the number of available drivers and passengers for matching, it fails to adapt to real-time supply-demand fluctuations, often leading to longer passenger wait times and driver idle periods. To address this limitation, we propose an adaptive ride-matching strategy using deep reinforcement learning (RL) to dynamically determine when to perform matches based on real-time system conditions. Unlike fixed-interval approaches, our method continuously evaluates system states and executes matching at moments that minimize total passenger wait time. Additionally, we incorporate a potential-based reward shaping (PBRS) mechanism to mitigate sparse rewards, accelerating RL training and improving decision quality. Extensive empirical evaluations using a realistic simulator trained on real-world data demonstrate that our approach outperforms fixed-interval matching strategies, significantly reducing passenger waiting times and detour delays, thereby enhancing the overall efficiency of ride-hailing and ride-pooling systems.
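Potential-based reward shaping, in the standard form of Ng, Harada and Russell (1999), adds F(s, s') = γΦ(s') − Φ(s) to each reward. The sketch below applies that form to a sparse waiting-time reward. The potential used here (minus the waiting time accumulated so far by unmatched passengers) and the helper names shape_rewards and discounted_return are hypothetical illustrations, not the paper's definitions. Because the shaping terms telescope, the shaped discounted return differs from the original only by γ^T Φ(s_T) − Φ(s_0); with the terminal potential fixed at 0, that difference does not depend on the actions taken, so the optimal policy is unchanged while the learner receives feedback at every step.

```python
import math
from typing import List, Sequence


def shape_rewards(rewards: Sequence[float], potentials: Sequence[float], gamma: float) -> List[float]:
    """Potential-based shaping: r'_t = r_t + gamma * phi(s_{t+1}) - phi(s_t).

    potentials[t] holds phi(s_t) for t = 0..T, one more entry than rewards.
    """
    if len(potentials) != len(rewards) + 1:
        raise ValueError("need phi(s_t) for every visited state, including the last")
    return [r + gamma * potentials[t + 1] - potentials[t] for t, r in enumerate(rewards)]


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    return sum(gamma ** t * r for t, r in enumerate(rewards))


# Sparse signal: two passengers wait three steps, and the whole cost (-6)
# arrives only at the step where they are finally matched.
rewards = [0.0, 0.0, 0.0, -6.0]
# Hypothetical potential: minus the waiting time accumulated so far by unmatched
# passengers; the terminal state (everyone matched) gets potential 0.
potentials = [0.0, -2.0, -4.0, -6.0, 0.0]

print(shape_rewards(rewards, potentials, gamma=1.0))  # [-2.0, -2.0, -2.0, 0.0]

# Telescoping check: shaped return = original return + gamma^T * phi(s_T) - phi(s_0)
gamma = 0.9
shaped = shape_rewards(rewards, potentials, gamma)
T = len(rewards)
assert math.isclose(
    discounted_return(shaped, gamma),
    discounted_return(rewards, gamma) + gamma ** T * potentials[-1] - potentials[0],
)
```

The single delayed penalty of -6 becomes a steady -2 per step of waiting, which is the kind of denser signal PBRS is used for here to speed up training.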

Paper Structure

This paper contains 28 sections, 21 equations, 13 figures, 3 tables, and 1 algorithm.

Figures (13)

  • Figure 1: Comparison of Rewards Before and After PBRS
  • Figure 2: Comparison of Rewards without and with PBRS
  • Figure 3: Comparison of Passenger Request Data and Idle Driver Data Generated by the Simulator and Real Data
  • Figure 4: Possible Pickup and Drop-off Sequences for Two Passenger Orders
  • Figure 5: Daily Supply-Demand Fluctuations and Geographical Distribution in Ride-Hailing and Ride-Pooling Systems
  • ...and 8 more figures
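Figure 4 concerns the possible pickup and drop-off sequences when two passenger orders share a vehicle. Assuming only the basic precedence constraint that each passenger must be picked up before being dropped off, the candidate sequences can be enumerated by brute force as sketched below; the figure itself may show a smaller subset (for example, if one passenger is already on board). The function name feasible_sequences and the P/D stop labels are hypothetical.

```python
from itertools import permutations
from typing import List, Sequence, Tuple


def feasible_sequences(orders: Sequence[str]) -> List[Tuple[str, ...]]:
    """All stop orders where each passenger's pickup precedes their drop-off."""
    stops = [f"P{o}" for o in orders] + [f"D{o}" for o in orders]
    feasible = []
    for seq in permutations(stops):
        # Keep the sequence only if every pickup comes before the matching drop-off
        if all(seq.index(f"P{o}") < seq.index(f"D{o}") for o in orders):
            feasible.append(seq)
    return feasible


for seq in feasible_sequences(["A", "B"]):
    print(" -> ".join(seq))
```

For two orders, 6 of the 24 permutations survive, in line with the count (2n)!/2^n; the number grows quickly with more orders (90 for three), so brute-force enumeration is only practical for small vehicle capacities.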