Safe and Sustainable Electric Bus Charging Scheduling with Constrained Hierarchical DRL

Jiaju Qi, Lei Lei, Thorsteinn Jonsson, Dusit Niyato

TL;DR

The paper tackles safe and economical charging of electric bus fleets under PV and price uncertainty by formulating the problem as a constrained Markov decision process with temporal abstraction (options). It introduces a novel DAC-MAPPO-Lagrangian algorithm that combines a centralized high-level PPO-Lagrangian policy for charger allocation with decentralized low-level MAPPO-Lagrangian policies for per-EB charging, trained under a CTDE framework. Empirical results with real-world PV and price data show that the proposed method approaches the performance of an oracle MILP solution while significantly reducing safety violations and improving convergence stability, especially as fleet size grows. The work demonstrates the practical viability of safe HDRL for uncertainty-aware transportation systems and provides a principled mechanism to balance cost and safety without manual penalty tuning.

Abstract

The integration of Electric Buses (EBs) with renewable energy sources such as photovoltaic (PV) panels is a promising approach to promote sustainable and low-carbon public transportation. However, optimizing EB charging schedules to minimize operational costs while ensuring safe operation without battery depletion remains challenging, especially under real-world conditions, where uncertainties in PV generation, dynamic electricity prices, variable travel times, and limited charging infrastructure must be accounted for. In this paper, we propose a safe Hierarchical Deep Reinforcement Learning (HDRL) framework for solving the EB Charging Scheduling Problem (EBCSP) under multi-source uncertainties. We formulate the problem as a Constrained Markov Decision Process (CMDP) with options to enable temporally abstract decision-making. We develop a novel HDRL algorithm, namely Double Actor-Critic Multi-Agent Proximal Policy Optimization Lagrangian (DAC-MAPPO-Lagrangian), which integrates Lagrangian relaxation into the Double Actor-Critic (DAC) framework. At the high level, we adopt a centralized PPO-Lagrangian algorithm to learn safe charger allocation policies. At the low level, we incorporate MAPPO-Lagrangian to learn decentralized charging power decisions under the Centralized Training and Decentralized Execution (CTDE) paradigm. Extensive experiments with real-world data demonstrate that the proposed approach outperforms existing baselines in both cost minimization and safety compliance, while maintaining fast convergence.
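To make the role of Lagrangian relaxation more concrete, the following is a minimal sketch of the multiplier update commonly used in PPO-Lagrangian-style methods. It is not taken from the paper: the function names, the learning rate, the cost limit, and the normalization by (1 + lam) are all illustrative assumptions. The idea is that the multiplier grows while the observed safety cost exceeds its limit and shrinks otherwise, so the cost/safety trade-off is learned rather than set by a hand-tuned penalty.

```python
# Illustrative sketch only; names and hyperparameters are assumptions,
# not the DAC-MAPPO-Lagrangian implementation described in the paper.

def update_multiplier(lam, episode_cost, cost_limit, lr=0.05):
    """Projected dual ascent on the Lagrange multiplier.

    lam rises when the observed cost exceeds cost_limit and falls
    otherwise; the max() keeps it non-negative.
    """
    return max(0.0, lam + lr * (episode_cost - cost_limit))


def lagrangian_advantage(reward_adv, cost_adv, lam):
    """Combine reward and cost advantages into one actor signal.

    Dividing by (1 + lam) is a common rescaling (assumed here) that
    keeps the magnitude of the signal bounded as lam grows.
    """
    return [(r - lam * c) / (1.0 + lam) for r, c in zip(reward_adv, cost_adv)]


# Hypothetical sequence of per-episode safety costs with a limit of 1.0.
lam = 0.0
for episode_cost in [3.0, 2.0, 0.5, 0.0]:
    lam = update_multiplier(lam, episode_cost, cost_limit=1.0)
```

In a hierarchical setting such as the one described above, one would presumably maintain such a multiplier for each constrained policy level, but how the paper couples the high-level and low-level multipliers is not stated in this summary.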

Paper Structure

This paper contains 36 sections, 2 theorems, 54 equations, 8 figures, 5 tables, 1 algorithm.

Key Result

Proposition 1

The high-level reward value function $V_{\pi_{\mathrm{H}}}^{\mathrm{R}}$ and cost value function $V_{\pi_{\mathrm{H}}}^{\mathrm{C}}$ can be expressed in terms of the corresponding low-level reward value function $V_{\pi_{\mathrm{L}}}^{\mathrm{R}}$ and cost value function $V_{\pi_{\mathrm{L}}}^{\mathrm{C}}$ through th

Figures (8)

  • Figure 1: The schematic of our system model.
  • Figure 2: A figure to illustrate the layover and operating periods for EB $m$ and the definitions of $B_{m,t}$, $\tau_{m,t}$, $k_{m,t}$, and $\varXi_{m,t}$.
  • Figure 3: Framework of our proposed algorithm.
  • Figure 4: The variation of the electricity prices of a typical day in the experimental data.
  • Figure 5: The variation of the PV power of a typical day in the experimental data.
  • ...and 3 more figures

Theorems & Definitions (9)

  • Remark 1: Relation between CMDP and Options over CMDP
  • Definition 1: High-Level CMDP
  • Definition 2: Low-Level CMDP
  • Proposition 1
  • Remark 2: Relation between Options over CMDP and two augmented CMDPs
  • Definition 3
  • Theorem 1
  • proof
  • proof