Mutual Information as Intrinsic Reward of Reinforcement Learning Agents for On-demand Ride Pooling

Xianjie Zhang; Jiahao Sun; Chen Gong; Kai Wang; Yifei Cao; Hao Chen; Hao Chen; Yu Liu

TL;DR

The paper addresses vehicle dispatching and matching for city-scale on-demand ride pooling. It partitions the urban area into hexagonal regions and uses a mean-field Q-learning (MFQL) framework with a mutual-information (MI) intrinsic reward to align the vehicle and request distributions. It formalizes the Ride-pool Dispatching and Matching Problem (RDMP), introducing a trip-based matching integer linear program (ILP) and a region-constrained matching stage; MFQL handles the dispatch decisions, and the MI term guides exploration toward demand hotspots. The mutual information term $I(V_D;E_D)$ is estimated via a variational bound using an encoder, and the total reward becomes $r_v + \ddot{\alpha} I(V_D;E_D)$ to promote better distributional coupling. Experiments on a real taxi dataset (Manhattan) show consistent revenue improvements, with average gains of around 3% over the best existing baseline, supporting the approach's potential for practical deployment.
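
The reward shaping described above can be illustrated on a toy discrete example. The sketch below is not the authors' implementation: it assumes the variational bound is of the Barber-Agakov form $I(V;E) \ge H(V) + \mathbb{E}[\log q(v \mid e)]$, replaces the learned encoder with a hand-written conditional table `q`, and uses a hypothetical weight `alpha` in place of the paper's coefficient. The names `mutual_information`, `variational_lower_bound`, and `shaped_reward` are illustrative only.

```python
import math

def mutual_information(joint):
    """Exact MI (in nats) of a discrete joint p(v, e) given as a 2-D list."""
    p_v = [sum(row) for row in joint]
    p_e = [sum(col) for col in zip(*joint)]
    mi = 0.0
    for i, row in enumerate(joint):
        for j, p in enumerate(row):
            if p > 0:
                mi += p * math.log(p / (p_v[i] * p_e[j]))
    return mi

def variational_lower_bound(joint, q):
    """Assumed Barber-Agakov bound: H(V) + E_p[log q(v | e)].

    q[j][i] stands in for an encoder's estimate of p(v=i | e=j).
    """
    p_v = [sum(row) for row in joint]
    entropy_v = -sum(p * math.log(p) for p in p_v if p > 0)
    expected_log_q = sum(
        p * math.log(q[j][i])
        for i, row in enumerate(joint)
        for j, p in enumerate(row)
        if p > 0
    )
    return entropy_v + expected_log_q

def shaped_reward(extrinsic, mi_estimate, alpha=0.1):
    """Total reward = extrinsic reward + alpha * MI estimate (alpha is hypothetical)."""
    return extrinsic + alpha * mi_estimate

# Toy joint over two regions: vehicles and requests tend to co-locate.
joint = [[0.4, 0.1],
         [0.1, 0.4]]
p_e = [sum(col) for col in zip(*joint)]

# A perfect "encoder" recovers the true posterior p(v | e); a uniform one knows nothing.
exact_posterior = [[joint[i][j] / p_e[j] for i in range(2)] for j in range(2)]
uniform_q = [[0.5, 0.5], [0.5, 0.5]]

mi = mutual_information(joint)
tight = variational_lower_bound(joint, exact_posterior)
loose = variational_lower_bound(joint, uniform_q)
```

With the exact posterior the bound coincides with the true mutual information, while an uninformative `q` gives a bound of zero; a better encoder therefore yields a tighter estimate of the intrinsic term that is added to the extrinsic reward.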

Abstract

The emergence of on-demand ride pooling services allows each vehicle to serve multiple passengers at a time, thus increasing drivers' income and enabling passengers to travel at lower prices than taxi/car on-demand services (where only one passenger can be assigned to a car at a time, as with UberX and Lyft). Although on-demand ride pooling brings many benefits, it needs a well-defined matching strategy to maximize the benefits for all parties (passengers, drivers, aggregation companies, and the environment), and the regional dispatching of vehicles has a significant impact on matching and revenue. Existing algorithms often consider only revenue maximization, which makes it difficult for requests with unusual distributions to get a ride. Increasing revenue while ensuring a reasonable assignment of requests is a challenge for ride pooling service companies (aggregation companies). In this paper, we propose a framework for vehicle dispatching in ride pooling tasks, which splits the city into discrete dispatching regions and uses a reinforcement learning (RL) algorithm to dispatch vehicles among these regions. We also use the mutual information (MI) between the vehicle and order distributions as the intrinsic reward of the RL algorithm to improve the correlation between the two distributions, thus making it possible for unusually distributed requests to get a ride. In experiments on a real-world taxi dataset, we demonstrate that our framework increases revenue by an average of up to 3% over the best existing on-demand ride pooling method.

Paper Structure (19 sections, 12 equations, 6 figures, 7 tables, 1 algorithm)

Introduction
Ride-pool Dispatching and Matching Problem (RDMP)
Matching Problem
Dispatching Problem
RL-based dispatching and matching framework
MFQL
Intrinsic Reward
Constraint Matching in Dispatched Region
Experimental Results
Setup
Dataset description
Simulation Engine
Baseline
Results
Conclusion
...and 4 more sections

Figures (6)

Figure 1: The Manhattan city area is divided into small dispatching regions by hexagonal grids. The grid is divided using the tool h3 (https://h3geo.org) with a resolution of 8.
Figure 3: The overall framework. (a) the process of computing the mutual information of vehicle and request distribution. (b) the training process of Q-learning.
Figure 4: Revenue curve over a whole day of operation when the MI module is added to DQN.
Figure 5: The curve shows the mutual information value, with the number of vehicles set to 1000 and the capacity set to 4.
Figure 6: Differences in vehicle distribution with and without the MI module for different algorithms. (a) and (b) show the distribution of requests. (c) shows the difference in vehicle distribution, computed as the distribution under DQN+MI minus the distribution under DQN. (d) shows the same difference for MFQL+MI minus MFQL.
...and 1 more figures
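
The per-region difference described in the Figure 6 caption can be sketched as follows. This is a minimal illustration under assumptions: the region labels and vehicle positions are made up, the helper names are hypothetical, and normalizing counts to fractions is a choice made here, since the caption does not say whether raw counts or normalized distributions are subtracted.

```python
from collections import Counter

def region_distribution(region_ids, regions):
    """Fraction of vehicles in each region (normalization is an assumption)."""
    if not region_ids:
        raise ValueError("need at least one vehicle position")
    counts = Counter(region_ids)
    total = len(region_ids)
    return {r: counts[r] / total for r in regions}

def distribution_difference(with_mi, without_mi, regions):
    """Per-region distribution with the MI module minus the one without it."""
    p = region_distribution(with_mi, regions)
    q = region_distribution(without_mi, regions)
    return {r: p[r] - q[r] for r in regions}

# Made-up region labels; in the paper the regions are hexagonal grid cells.
regions = ["A", "B", "C"]
vehicles_with_mi = ["A", "B", "B", "C"]
vehicles_without_mi = ["A", "A", "A", "B"]

diff = distribution_difference(vehicles_with_mi, vehicles_without_mi, regions)
```

Positive entries mark regions that hold a larger share of vehicles when the MI module is used; because both inputs are normalized, the entries sum to zero.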
