Online Prediction-Assisted Safe Reinforcement Learning for Electric Vehicle Charging Station Recommendation in Dynamically Coupled Transportation-Power Systems
Qionghua Liao, Guilong Li, Jiajie Yu, Ziyuan Gu, Wei Ma
TL;DR
This work formulates en-route EV charging station (CS) recommendation as a constrained Markov decision process that maximizes traffic efficiency subject to power grid safety in dynamically coupled transportation-power systems. It introduces Online Prediction-Assisted Safe Reinforcement Learning (OP-SRL), which combines a Lagrangian-based PPO framework with an online Seq2Seq predictor to handle long-term constraints and the delay between CS recommendation and the start of charging. In case studies on the Nguyen-Dupuis network coupled with the IEEE 33-bus system and on a large real-world Kowloon network coupled with the IEEE 69-bus system, OP-SRL consistently outperforms baselines in Total Travel Time (TTT), Cumulative Voltage Violation (CVV), and Waiting+Charging Time (WCT), and it remains robust to changes in EV penetration, controller interval, and predictor design. The results underscore the value of system-level coupling, adaptive constraint handling, and forward-looking state augmentation for practical, scalable CS guidance in coupled urban power and transportation infrastructure.
Abstract
With the proliferation of electric vehicles (EVs), the transportation network and the power grid are becoming increasingly interdependent, coupled via charging stations. The concomitant growth in charging demand poses challenges for both networks and highlights the importance of charging coordination. Existing literature largely overlooks the interactions between power grid security and traffic efficiency. In view of this, we study the en-route charging station (CS) recommendation problem for EVs in dynamically coupled transportation-power systems. The system-level objective is to maximize overall traffic efficiency while ensuring the safety of the power grid. We formulate this problem, for the first time, as a constrained Markov decision process (CMDP), and we propose an online prediction-assisted safe reinforcement learning (OP-SRL) method that extends proximal policy optimization (PPO) to learn an optimal and safe policy. Specifically, we address two challenges. First, the constrained optimization problem is converted into an equivalent unconstrained optimization problem by applying the Lagrangian method. Second, to account for the long and uncertain delay between issuing a CS recommendation and the start of charging, we put forward an online sequence-to-sequence (Seq2Seq) predictor for state augmentation, which guides the agent toward forward-looking decisions. Finally, we conduct comprehensive experimental studies on the Nguyen-Dupuis network and a large-scale real-world road network, coupled with the IEEE 33-bus and IEEE 69-bus distribution systems, respectively. Results demonstrate that the proposed method outperforms baselines in terms of road network efficiency, power grid safety, and EV user satisfaction. The case study on the real-world network also illustrates the method's applicability in practice.
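As a rough illustration of the Lagrangian step mentioned in the abstract, the sketch below shows the generic pattern used by Lagrangian-based safe RL: the reward and a constraint cost are combined into one scalar signal, and the multiplier is adjusted by projected dual ascent. This is a minimal sketch of the general technique, not the authors' implementation. The function names, the cost limit, and the learning rate are all assumptions introduced for illustration, and the abstract does not specify the actual reward, cost, or update rule.

```python
def lagrangian_reward(reward: float, cost: float, lam: float) -> float:
    """Scalarized signal r - lam * c that replaces the raw reward.

    `reward` might stand for a traffic-efficiency term and `cost` for a
    voltage-violation term (both hypothetical here).
    """
    return reward - lam * cost


def update_multiplier(lam: float, episode_cost: float,
                      cost_limit: float, lr: float = 0.05) -> float:
    """Projected dual ascent on the Lagrange multiplier.

    The multiplier grows when the observed cost exceeds the limit and
    shrinks otherwise; it is clipped at zero so it stays non-negative.
    """
    return max(0.0, lam + lr * (episode_cost - cost_limit))


if __name__ == "__main__":
    lam = 0.0
    # Hypothetical per-episode constraint costs, compared to a limit of 1.0.
    for episode_cost in [3.0, 2.0, 1.5, 0.5, 0.2]:
        lam = update_multiplier(lam, episode_cost, cost_limit=1.0, lr=0.5)
        shaped = lagrangian_reward(reward=1.0, cost=episode_cost, lam=lam)
        print(f"cost={episode_cost:.1f}  lambda={lam:.2f}  shaped={shaped:.2f}")
```

In this pattern the policy update (PPO in the paper) would be run on the shaped signal, while the multiplier is updated on a slower timescale, so that persistent constraint violations are penalized more heavily over time.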
