Robust Defense Against Extreme Grid Events Using Dual-Policy Reinforcement Learning Agents

Benjamin M. Peter, Mert Korkali

TL;DR

An opponent that acts iteratively against a given agent is integrated into contingency screening to model multi-actor scenarios that threaten modern power networks, particularly those resulting from cyberattacks, providing a novel alternative to traditional security assessment.

Abstract

Reinforcement learning (RL) agents are powerful tools for managing power grids. They use large amounts of data to inform their actions and receive rewards or penalties as feedback to learn favorable responses for the system. Once trained, these agents can efficiently make decisions that would be too computationally complex for a human operator. This ability is especially valuable in decarbonizing power networks, where the demand for RL agents is increasing. These agents are well suited to control grid actions because the action space is constantly growing due to uncertainties in renewable generation, microgrid integration, and cybersecurity threats. To assess the efficacy of RL agents in response to an adverse grid event, we use the Grid2Op platform for agent training. We employ a proximal policy optimization (PPO) algorithm in conjunction with graph neural networks (GNNs). By simulating agents' responses to grid events, we assess their performance in avoiding grid failure for as long as possible. The performance of an agent is expressed concisely through its reward function, which helps the agent learn the best ways to reconfigure a grid's topology during such events. To model multi-actor scenarios that threaten modern power networks, particularly those resulting from cyberattacks, we integrate an opponent that acts iteratively against a given agent. This interplay between the RL agent and the opponent is used in $N-k$ contingency screening, providing a novel alternative to traditional security assessment.
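
To make the scale of $N-k$ contingency screening concrete, the following is a minimal sketch of how the candidate outage sets could be enumerated. It is not the authors' implementation and does not use Grid2Op; the function name `contingency_sets` and the 20-line grid size are assumptions chosen only for illustration.

```python
from itertools import combinations


def contingency_sets(line_ids, k):
    """Return every set of k simultaneous line outages (one N-k contingency each)."""
    return list(combinations(line_ids, k))


# Assumption: a hypothetical grid with 20 power lines, indexed 0..19.
line_ids = list(range(20))

n2_sets = contingency_sets(line_ids, 2)  # all pairs of lines
n3_sets = contingency_sets(line_ids, 3)  # all triples of lines

print(len(n2_sets), len(n3_sets))
```

The number of sets grows combinatorially with k, which is why each set is typically screened by simulation (for example, by counting how many time steps the grid survives) rather than inspected by hand.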

Paper Structure

This paper contains 14 sections, 18 equations, 5 figures, 2 tables.

Figures (5)

  • Figure 1: Episode visualization for a certain time step on the modified IEEE 14-bus test case.
  • Figure 2: Time steps survived for the NoAgent (left) and Agent (right) cases per $N-2$ contingency set.
  • Figure 3: Time steps survived for the NoAgent (left) and Agent (right) cases per $N-3$ contingency set.
  • Figure 4: The $\rho$ (rho) values in selected lines across survived time steps for the NoAgent (top) and Agent (bottom) cases.
  • Figure 5: Steps survived and cascading failures for an $N-2$ scenario with a modified opponent for the NoAgent case. (In the Agent case, no failures occur, so there are no cascades to visualize.)