Large Language Model-Enhanced Reinforcement Learning for Generic Bus Holding Control Strategies

Jiajie Yu, Yuhong Wang, Wei Ma

TL;DR

The paper tackles reward sparsity and transferability in bus holding control by introducing an LLM-enhanced RL framework that automatically designs dense reward functions. It decomposes reward design into four LLM-based modules (reward initializer, reward modifier, performance analyzer, and reward refiner), enabling iterative improvement with stability safeguards. Across synthetic and real-world multi-line scenarios, GPT-family LLMs, particularly GPT-4 and GPT-4o, deliver better average travel time and headway stability than vanilla RL, LLM-only controllers, and model-based or optimization baselines, while remaining robust to demand fluctuations and shared-passenger dynamics. The results demonstrate strong generalization to unseen networks and point to potential extensions to other smart-mobility tasks, underscoring the value of integrating LLMs for reward design in complex control systems.

Abstract

Bus holding control is a widely adopted strategy for maintaining stability and improving the operational efficiency of bus systems. Traditional model-based methods often face challenges with the low accuracy of bus state prediction and passenger demand estimation. In contrast, Reinforcement Learning (RL), as a data-driven approach, has demonstrated great potential in formulating bus holding strategies. RL determines the optimal control strategies in order to maximize the cumulative reward, which reflects the overall control goals. However, translating sparse and delayed control goals in real-world tasks into dense and real-time rewards for RL is challenging, normally requiring extensive manual trial-and-error. In view of this, this study introduces an automatic reward generation paradigm by leveraging the in-context learning and reasoning capabilities of Large Language Models (LLMs). This new paradigm, termed LLM-enhanced RL, comprises several LLM-based modules: reward initializer, reward modifier, performance analyzer, and reward refiner. These modules cooperate to initialize and iteratively improve the reward function according to the feedback from training and test results for the specified RL-based task. Ineffective reward functions generated by the LLM are filtered out to ensure the stable evolution of the RL agents' performance over iterations. To evaluate the feasibility of the proposed LLM-enhanced RL paradigm, it is applied to extensive bus holding control scenarios that vary in the number of bus lines, stops, and passenger demand. The results demonstrate the superiority, generalization capability, and robustness of the proposed paradigm compared to vanilla RL strategies, the LLM-based controller, physics-based feedback controllers, and optimization-based controllers. This study sheds light on the great potential of utilizing LLMs in various smart mobility applications.
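The iterative loop described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the division of labor between the four modules is an assumption inferred from their names, the LLM calls are replaced by toy deterministic stubs, and a single hypothetical number (`weight`) stands in for a full reward function. The one element taken directly from the abstract is the filter: a newly proposed reward is kept only if it improves on the best one so far, so performance never degrades across iterations.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Candidate:
    weight: float  # hypothetical stand-in for a full reward function
    score: float   # test metric of the agent trained with it (higher is better)


def evolve_reward(
    initialize: Callable[[], float],
    modify: Callable[[float], float],
    analyze: Callable[[float, float], str],
    refine: Callable[[float, str], float],
    train_and_test: Callable[[float], float],
    n_iters: int,
) -> Tuple[Candidate, List[float]]:
    """Initialize a reward, then iteratively refine it, filtering out regressions."""
    weight = modify(initialize())
    best = Candidate(weight, train_and_test(weight))
    history = [best.score]
    for _ in range(n_iters):
        feedback = analyze(best.weight, best.score)       # summarize results
        proposal = modify(refine(best.weight, feedback))  # propose a new reward
        score = train_and_test(proposal)                  # train RL agent, evaluate
        if score > best.score:                            # filter ineffective rewards
            best = Candidate(proposal, score)
        history.append(best.score)
    return best, history


# Toy stubs (all hypothetical); in practice each would wrap an LLM prompt
# or an RL training run on a bus simulator.
def toy_initialize() -> float:
    return 0.0


def toy_modify(weight: float) -> float:
    # Assumed role: repair an invalid proposal; here, clamp to a valid range.
    return min(max(weight, 0.0), 5.0)


def toy_analyze(weight: float, score: float) -> str:
    return "increase"


def toy_refine(weight: float, feedback: str) -> float:
    return weight + 2.5 if feedback == "increase" else weight - 2.5


def toy_train_and_test(weight: float) -> float:
    # Pretend the best achievable performance occurs at weight == 3.0.
    return -abs(weight - 3.0)
```

Calling `evolve_reward(toy_initialize, toy_modify, toy_analyze, toy_refine, toy_train_and_test, 3)` accepts the first proposal and rejects the later ones that overshoot, so the recorded best score is non-decreasing over iterations.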

Paper Structure

This paper contains 41 sections, 14 equations, 19 figures, 5 tables, and 2 algorithms.

Figures (19)

  • Figure 1: LLMs' role in enhancing RL-based control methods
  • Figure 2: LLM-enhanced RL paradigm
  • Figure 3: Example of reward function modification
  • Figure 4: Reward curves of trained RL agents
  • Figure 5: Bus lines in case 1
  • ...and 14 more figures