LLM-Guided Safe Reinforcement Learning for Energy System Topology Reconfiguration

Zongyan Zhang, Chao Shen, Xu Wan, Jie Song, Mingyang Sun

Abstract

The increasing penetration of renewable generation and the growing variability of electrified demand introduce substantial operational uncertainty to modern power systems. Topology reconfiguration is widely recognized as an effective and economical means to enhance grid resilience. Because AC power-flow constraints coexist with discrete switching decisions, topology reconfiguration in large-scale systems is a highly nonlinear and nonconvex optimization problem, which makes traditional methods computationally prohibitive. Consequently, several studies have explored reinforcement learning-based approaches to improve scalability and operational efficiency. However, their practical implementation is challenged by the high-dimensional combinatorial action space and the need to ensure safety during learning-based decision-making. To address these challenges, this paper presents a safe and intelligent topology control framework that integrates Large Language Models (LLMs) with a Safety Soft Actor-Critic (Safety-SAC) architecture. Operational voltage and thermal limits are reformulated into smooth safety-cost signals, enabling risk-aware policy optimization within a constrained Markov decision process. A knowledge-based Safety-LLM module is further introduced to refine unsafe or suboptimal transitions through domain knowledge and state-informed reasoning, thus guiding the learning agent toward safer and more effective switching actions. Experiments on the IEEE 36-bus and 118-bus Grid2Op benchmarks show that the proposed method achieves higher reward, longer survival time, and lower safety cost than SAC, ACE, and their safety-enhanced variants. These results demonstrate the potential of combining LLM-based reasoning with safe reinforcement learning to achieve scalable and reliable grid topology control.
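
The abstract states that voltage and thermal limits are turned into smooth safety-cost signals but does not give the formula here. As a rough illustration only, the sketch below penalises limit violations with a softplus function, which is one common smooth surrogate for max(0, x). The function names, the per-unit voltage band (0.95 to 1.05), the loading limit of 1.0, and the sharpness parameter `beta` are all assumptions made for this example and are not taken from the paper.

```python
import math


def softplus(x: float, beta: float = 20.0) -> float:
    """Smooth approximation of max(0, x); larger beta gives a sharper corner."""
    z = beta * x
    # Numerically stable identity: log(1 + e^z) = max(z, 0) + log1p(e^{-|z|})
    return (max(z, 0.0) + math.log1p(math.exp(-abs(z)))) / beta


def safety_cost(voltages_pu, line_loadings,
                v_min=0.95, v_max=1.05, rho_max=1.0, beta=20.0) -> float:
    """Hypothetical smooth cost: sum of softplus-penalised limit violations.

    voltages_pu   -- bus voltage magnitudes in per unit (assumed input)
    line_loadings -- line flow divided by thermal limit (assumed input)
    """
    cost = 0.0
    for v in voltages_pu:
        # Penalise both under-voltage and over-voltage.
        cost += softplus(v_min - v, beta) + softplus(v - v_max, beta)
    for rho in line_loadings:
        # Penalise thermal overloads.
        cost += softplus(rho - rho_max, beta)
    return cost
```

Because softplus is strictly positive, this cost is small but nonzero even when every limit is respected, and it grows smoothly as a quantity approaches and crosses its limit. That differentiable behaviour is the property a constrained policy-gradient method can exploit; the paper's actual cost may differ.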

Paper Structure (45 sections, 31 equations, 21 figures, 5 tables, 1 algorithm)

Figures (21)

  • Figure 1: Method Overview. The framework first filters and refines transitions using a reward threshold, then leverages the LLM to correct unsafe samples, and finally trains the Safety-SAC agent on the updated buffer to improve both performance and safety.
  • Figure 2: Prompt template used for power grid control. Blue-colored tokens denote variables extracted from the replay buffer at timestep $D_t$, including observations, overload indicators, voltage measurements, and RL-derived action feedback.
  • Figure 3: LLM-based transition refinement mechanism. The Safety-SAC agent generates candidate actions, which are selectively refined by the LLM under reward-based triggering conditions. Parsing and feasibility checks ensure that only valid actions are executed.
  • Figure 4: Topology diagram of the IEEE 36-bus system.
  • Figure 5: Topology diagram of the IEEE 118-bus system.
  • ...and 16 more figures
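
The captions of Figures 1 and 3 describe a refinement step that is triggered by reward, followed by parsing and feasibility checks. A minimal sketch of that control flow is given below, under several assumptions: that refinement fires when the reward falls below a threshold, that actions are integer indices, and that the LLM reply is free text containing one such index. The names `parse_action`, `refine_action`, and `ask_llm` are hypothetical, and `ask_llm` is a stand-in callable rather than a real LLM client.

```python
import re
from typing import Callable, Collection, Optional


def parse_action(reply: str) -> Optional[int]:
    """Extract the first integer from the LLM reply, or None if there is none."""
    match = re.search(r"-?\d+", reply)
    return int(match.group()) if match else None


def refine_action(candidate: int, reward: float, threshold: float,
                  ask_llm: Callable[[], str],
                  feasible: Collection[int]) -> int:
    """Return the action to use for this transition.

    Assumed rule: keep the agent's candidate when the reward meets the
    threshold; otherwise ask the LLM and accept its proposal only if it
    parses and belongs to the feasible action set.
    """
    if reward >= threshold:
        return candidate  # reward is acceptable, so no refinement is triggered
    proposal = parse_action(ask_llm())
    if proposal is not None and proposal in feasible:
        return proposal  # parsed and feasible: accept the refined action
    return candidate  # unparseable or infeasible reply: fall back


# Usage with a stubbed LLM reply (illustrative values only).
chosen = refine_action(3, reward=0.1, threshold=0.5,
                       ask_llm=lambda: "Switch to action 7",
                       feasible={3, 7})
```

Falling back to the agent's own candidate whenever the reply cannot be parsed or is infeasible keeps the loop from ever executing an invalid action, which matches the intent stated in the Figure 3 caption.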