RL-MSA: a Reinforcement Learning-based Multi-line bus Scheduling Approach

Yingzhuo Liu

TL;DR

This work models the Multiple Line Bus Scheduling Problem (MLBSP) as a Markov Decision Process (MDP) to address uncertainty in bus operations. It introduces RL-MSA, a reinforcement learning-based approach that operates in an offline and an online phase: a time-window mechanism carries the policy learned offline over to online deadhead decisions, and new state features, built with a bus priority screening mechanism, reduce dimensionality. The method trains its policy with Proximal Policy Optimization (PPO) under a reward that combines a final reward with a step-wise reward. Offline, it uses fewer buses ($N_u$) and incurs lower total deadhead ($T_d$); online, it maintains full timetable coverage under disturbances without increasing fleet size. Experiments on real and synthetic MLBSP instances show that RL-MSA outperforms the state-of-the-art ALNS in the offline phase and adapts robustly in the online phase, which matters for cost reduction and service reliability in multi-line bus networks.

Abstract

The Multiple Line Bus Scheduling Problem (MLBSP) is vital for reducing the operational cost of a bus company and guaranteeing service quality for passengers. Existing approaches typically generate a bus scheduling scheme offline and then dispatch buses according to that scheme. In practice, uncertain events such as traffic congestion occur frequently and may make the predetermined scheme infeasible. In this paper, MLBSP is modeled as a Markov Decision Process (MDP), and a Reinforcement Learning-based Multi-line bus Scheduling Approach (RL-MSA) is proposed for bus scheduling at both the offline and online phases. At the offline phase, the deadhead decision is integrated into the bus selection decision for the first time, which simplifies the learning problem. At the online phase, the deadhead decision is made through a time window mechanism based on the policy learned at the offline phase. We develop several new state features, covering control points, bus lines and buses, and introduce a bus priority screening mechanism to construct the bus-related features. To account for the interests of both the bus company and passengers, we devise a reward function that combines a final reward with a step-wise reward. Experiments at the offline phase demonstrate that RL-MSA uses fewer buses than offline optimization approaches. At the online phase, RL-MSA covers all departure times in a timetable (i.e., service quality) without increasing the number of buses used (i.e., operational cost).
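The idea of combining a final reward with a step-wise reward can be illustrated with a minimal Python sketch. The abstract does not give the actual reward terms, so everything below is an assumption made for illustration: the `StepOutcome` type, the weights, and the choice to tie the step-wise reward to departure coverage and deadhead time and the final reward to the number of buses used are all hypothetical and should not be read as the formulation used in RL-MSA.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StepOutcome:
    """Hypothetical result of one scheduling decision (not from the paper)."""
    covered: bool            # was the departure time covered by a bus?
    deadhead_minutes: float  # deadhead time incurred by this decision


def step_reward(outcome: StepOutcome, w_cover: float = 1.0,
                w_deadhead: float = 0.01) -> float:
    # Assumed step-wise term: reward coverage, penalize deadhead time.
    cover_term = w_cover if outcome.covered else -w_cover
    return cover_term - w_deadhead * outcome.deadhead_minutes


def final_reward(buses_used: int, w_bus: float = 1.0) -> float:
    # Assumed final term: penalize the number of buses used in the episode.
    return -w_bus * buses_used


def episode_rewards(outcomes: List[StepOutcome], buses_used: int) -> List[float]:
    # Per-step rewards, with the final reward added to the last step.
    rewards = [step_reward(o) for o in outcomes]
    if rewards:
        rewards[-1] += final_reward(buses_used)
    return rewards


def discounted_return(rewards: List[float], gamma: float = 1.0) -> float:
    # Standard discounted sum of rewards over the episode.
    return sum((gamma ** t) * r for t, r in enumerate(rewards))


if __name__ == "__main__":
    outcomes = [StepOutcome(True, 0.0), StepOutcome(True, 10.0)]
    print(discounted_return(episode_rewards(outcomes, buses_used=3)))
```

In a sketch like this, the step-wise term gives the agent feedback at every decision, while the final term reflects a quantity that is only known once the whole timetable has been scheduled.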

Paper Structure (22 sections, 8 equations, 9 figures, 8 tables)

Introduction
Related works
Bus scheduling approaches for offline phase
Bus scheduling approaches for online phase
Conclusion
Bus scheduling problem
Reinforcement learning-based bus scheduling approach
MDP model of RL-MSA
State space
Action space
Reward function
Deep RL agent of RL-MSA
Network structure
RL algorithm
Experimental results
...and 7 more sections

Figures (9)

Figure 1: An example of a bus travelling
Figure 2: Framework of RL-MSA
Figure 3: Actions of MLBSP
Figure 4: State features of MLBSP
Figure 5: A bus scheduling scheme generated by RL-MSA for Real-1
...and 4 more figures
