Dual Policy Reinforcement Learning for Real-time Rebalancing in Bike-sharing Systems
Jiaqi Liang, Defeng Liu, Sanjay Dominik Jena, Andrea Lodi, Thibaut Vidal
TL;DR
The paper tackles real-time rebalancing in bike-sharing systems by formulating the problem as a continuous-time multi-agent Markov decision process (MMDP) and solving it with a dual-policy reinforcement learning framework. It introduces DPRL, which trains two DQNs to learn the inventory and routing policies separately, so that decisions account for how the system evolves during operations. Experiments on synthetic GT1/GT2 datasets show substantial reductions in lost demand compared with MIP baselines and single-policy RL methods, highlighting improvements in responsiveness and scalability. The work provides practical insights for operators and establishes a path toward more intelligent, robust urban mobility solutions.
Abstract
Bike-sharing systems play a crucial role in easing traffic congestion and promoting healthier lifestyles. However, ensuring their reliability and user acceptance requires effective strategies for rebalancing bikes. This study introduces a novel approach to the real-time rebalancing problem with a fleet of vehicles. It employs a dual policy reinforcement learning algorithm that decouples inventory and routing decisions, which is more realistic and efficient than previous methods that make both decisions simultaneously. We first formulate the inventory and routing subproblems as a multi-agent Markov Decision Process within a continuous time framework. Subsequently, we propose a DQN-based dual policy framework to jointly estimate the value functions, minimizing the lost demand. To facilitate learning, we use a comprehensive simulator that operates under a first-arrive-first-serve rule, which enables the computation of immediate rewards across diverse demand scenarios. We conduct extensive experiments on various datasets generated from historical real-world data influenced by both temporal and weather factors. Our proposed algorithm demonstrates significant performance improvements over baseline methods. It offers valuable practical insights for operators and further explores the incorporation of reinforcement learning into real-world dynamic programming problems, paving the way for more intelligent and robust urban mobility solutions.
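To make the role of the simulator concrete, the following is a minimal sketch of how a first-arrive-first-serve rule can turn a stream of trip events into a lost-demand count and an immediate reward. It is not the authors' simulator. The names `Event`, `simulate_lost_demand`, and `reward` are hypothetical, and two modelling choices are assumptions: a failed return at a full station is counted as lost demand alongside a failed rental, and the reward is the negative of the lost-demand count.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """A single user request (hypothetical structure for illustration)."""
    time: float   # arrival time, e.g. minutes since the start of the horizon
    station: int  # index of the station where the request occurs
    kind: str     # "rent" or "return"


def simulate_lost_demand(inventory, capacity, events):
    """Serve events in order of arrival and count the requests that fail.

    Assumption: a rental fails at an empty station and a return fails at a
    full station; both are counted as lost demand.
    """
    bikes = list(inventory)  # copy so the caller's list is not modified
    lost = 0
    # sorted() is stable, so requests with equal times keep their input order.
    for ev in sorted(events, key=lambda e: e.time):
        if ev.kind == "rent":
            if bikes[ev.station] > 0:
                bikes[ev.station] -= 1
            else:
                lost += 1
        elif ev.kind == "return":
            if bikes[ev.station] < capacity[ev.station]:
                bikes[ev.station] += 1
            else:
                lost += 1
        else:
            raise ValueError(f"unknown event kind: {ev.kind!r}")
    return bikes, lost


def reward(lost):
    """Immediate reward, assumed here to be the negative lost demand."""
    return -lost


# Two stations: station 0 holds 1 of 2 bikes, station 1 holds 0 of 1.
events = [
    Event(1.0, 0, "rent"),
    Event(2.0, 0, "rent"),
    Event(3.0, 1, "return"),
    Event(4.0, 1, "return"),
    Event(5.0, 1, "rent"),
]
final_bikes, lost = simulate_lost_demand([1, 0], [2, 1], events)
print(final_bikes, lost, reward(lost))
```

In this toy run the second rental at station 0 and the second return at station 1 cannot be served, so two requests are lost. A rebalancing vehicle that had delivered a bike to station 0 before the second rental would have reduced that count, which is the kind of signal a learned policy can use.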
