Dual Policy Reinforcement Learning for Real-time Rebalancing in Bike-sharing Systems

Jiaqi Liang; Defeng Liu; Sanjay Dominik Jena; Andrea Lodi; Thibaut Vidal

TL;DR

The paper tackles real-time rebalancing in bike-sharing systems by formulating the problem as a continuous-time multi-agent Markov decision process (MMDP) and solving it with a dual policy reinforcement learning framework. It introduces DPRL, which trains two DQNs to learn the inventory and routing policies separately, so that decisions account for how the system evolves during operations. Experiments on the GT1 and GT2 datasets, generated from historical real-world data, show substantial reductions in lost demand compared with MIP baselines and single-policy RL methods, highlighting improvements in responsiveness and scalability. The work offers practical insights for operators and points toward more intelligent, robust urban mobility solutions.

Abstract

Bike-sharing systems play a crucial role in easing traffic congestion and promoting healthier lifestyles. However, ensuring their reliability and user acceptance requires effective strategies for rebalancing bikes. This study introduces a novel approach to address the real-time rebalancing problem with a fleet of vehicles. It employs a dual policy reinforcement learning algorithm that decouples inventory and routing decisions, enhancing realism and efficiency compared to previous methods where both decisions were made simultaneously. We first formulate the inventory and routing subproblems as a multi-agent Markov Decision Process within a continuous time framework. Subsequently, we propose a DQN-based dual policy framework to jointly estimate the value functions, minimizing the lost demand. To facilitate learning, a comprehensive simulator is applied to operate under a first-arrive-first-serve rule, which enables the computation of immediate rewards across diverse demand scenarios. We conduct extensive experiments on various datasets generated from historical real-world data, affected by both temporal and weather factors. Our proposed algorithm demonstrates significant performance improvements over previous baseline methods. It offers valuable practical insights for operators and further explores the incorporation of reinforcement learning into real-world dynamic programming problems, paving the way for more intelligent and robust urban mobility solutions.

Paper Structure (19 sections, 3 equations, 9 figures, 8 tables)

Introduction
Related Work
Problem Formulation
The Dynamic Bike Repositioning Problem (DBRP)
Multi-agent Markov Decision Process (MMDP) Formulation
Dual Policy Reinforcement Learning
Tailored MMDP for Dual Policy
DPRL Pipeline
Computational Experiments
Dataset
Benchmarks
Results
Ablation Analysis
Conclusions
Appendix
...and 4 more sections

Figures (9)

Figure 1: Dynamic rebalancing in BSS and our dual policy approach: Station location and inventory information, vehicle location and inventory level, and user demand typically serve as inputs in dynamic rebalancing models. Here, we employ a dual policy to obtain a rebalancing solution based on the environmental interactions among stations, vehicles, and users.
Figure 2: Dual policy framework
Figure 3: DPRL Pipeline for Real-time Rebalancing
Figure 4: Total average lost demand on test set for GT1 and GT2
Figure 5: Episodic return during RL training for GT1 and GT2
...and 4 more figures
