RL-MSA: a Reinforcement Learning-based Multi-line bus Scheduling Approach

Yingzhuo Liu

TL;DR

This work models the Multiple Line Bus Scheduling Problem (MLBSP) as a Markov Decision Process (MDP) to handle uncertainty in bus operations. It introduces RL-MSA, a reinforcement learning-based approach with an offline and an online phase: a time-window mechanism carries the policy learned offline over to online deadhead decisions, and new state features built with a bus priority screening mechanism reduce dimensionality. Policies are trained with Proximal Policy Optimization (PPO) using a reward that combines a final reward with a step-wise reward. Offline, RL-MSA uses fewer buses $N_u$ and incurs lower total deadhead $T_d$; online, it maintains full timetable coverage under disturbances without increasing fleet size. Experiments on real and synthetic MLBSP instances show that RL-MSA outperforms the state-of-the-art ALNS offline and adapts robustly online, which supports its practical value for cost reduction and service reliability in multi-line bus networks.

Abstract

The Multiple Line Bus Scheduling Problem (MLBSP) is vital for reducing the operational cost of bus companies and guaranteeing service quality for passengers. Existing approaches typically generate a bus scheduling scheme offline and then dispatch buses according to that scheme. In practice, uncertain events such as traffic congestion occur frequently and may make the predetermined scheme infeasible. In this paper, MLBSP is modeled as a Markov Decision Process (MDP), and a Reinforcement Learning-based Multi-line bus Scheduling Approach (RL-MSA) is proposed for bus scheduling in both the offline and online phases. In the offline phase, the deadhead decision is integrated into the bus selection decision for the first time to simplify the learning problem. In the online phase, the deadhead decision is made through a time window mechanism based on the policy learned in the offline phase. We develop several new state features, covering control points, bus lines, and buses, and devise a bus priority screening mechanism to construct the bus-related features. To account for the interests of both the bus company and passengers, we devise a reward function that combines a final reward with a step-wise reward. Offline experiments demonstrate that RL-MSA uses fewer buses than offline optimization approaches. In the online phase, RL-MSA covers all departure times in the timetable (i.e., maintains service quality) without increasing the number of buses used (i.e., operational cost).
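
As a rough illustration of how a final reward and a step-wise reward can be combined, the sketch below sums a per-decision term and an end-of-episode term into one episode return. The abstract does not give the actual reward formula, so everything here is an assumption made for illustration: the choice to penalize deadhead minutes at each step and buses used at the end, the weight values, and the names `RewardWeights`, `step_reward`, `final_reward` and `episode_return` are all hypothetical and not taken from the paper.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class RewardWeights:
    """Hypothetical weights; the paper's actual coefficients are not given here."""
    deadhead: float = 0.01  # assumed penalty per minute of deadhead (step-wise)
    bus: float = 1.0        # assumed penalty per bus used (final)


def step_reward(deadhead_minutes: float, w: RewardWeights) -> float:
    # Assumed step-wise term: issued after each bus-selection decision.
    return -w.deadhead * deadhead_minutes


def final_reward(buses_used: int, w: RewardWeights) -> float:
    # Assumed final term: issued once, when the whole timetable is covered.
    return -w.bus * buses_used


def episode_return(deadheads: Iterable[float], buses_used: int,
                   w: RewardWeights = RewardWeights()) -> float:
    # Undiscounted sum of all step-wise terms plus the single final term.
    return sum(step_reward(d, w) for d in deadheads) + final_reward(buses_used, w)
```

With these assumed weights, a schedule that saves one bus is preferred over one that saves a few minutes of deadhead; a different weighting would shift that trade-off.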

Paper Structure (22 sections, 8 equations, 9 figures, 8 tables)

This paper contains 22 sections, 8 equations, 9 figures, 8 tables.

Figures (9)

  • Figure 1: An example of a bus travelling
  • Figure 2: Framework of RL-MSA
  • Figure 3: Actions of MLBSP
  • Figure 4: State features of MLBSP
  • Figure 5: A bus scheduling scheme generated by RL-MSA for Real-1
  • ...and 4 more figures