Discovery of False Data Injection Schemes on Frequency Controllers with Reinforcement Learning

Romesh Prasad, Malik Hassanaly, Xiangyu Zhang, Abhijeet Sahu

TL;DR

This paper tackles cybersecurity in frequency control for power systems with high inverter-based resource penetration by using reinforcement learning to discover false data injection (FDI) strategies targeting droop-based primary frequency control. By formulating the problem as an adversarial Markov decision process and applying PPO, the authors show that an RL agent can identify viable FDI actions that significantly amplify frequency disturbances on a 10-bus Kron-reduced IEEE New England system. The case study demonstrates that RL can surpass simple, time-invariant attack policies and uncover non-intuitive, potentially harmful strategies, highlighting the need for proactive cyber-defense measures. The findings have practical implications for prioritizing protective controls and designing defenses against sophisticated cyber-attacks in CPS-enabled power grids.
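The idea of a time-invariant attack on droop control can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the single-machine swing model, the parameter values, the list of candidate falsified gains, and the use of cumulative absolute frequency deviation as the attacker's reward. The sketch evaluates each constant falsified droop gain and picks the one that amplifies the disturbance most.

```python
# Toy single-machine model (hypothetical, NOT the paper's 10-bus system):
#   M * dw/dt = -(D + k) * w + dP
# w: frequency deviation, k: droop gain, dP: power imbalance (all illustrative units).

def simulate(droop_gain, steps=2000, dt=0.01, inertia=5.0, damping=1.0,
             disturbance=-0.5):
    """Forward-Euler rollout; returns (cumulative |w| over time, final w)."""
    omega = 0.0
    cumulative_deviation = 0.0
    for _ in range(steps):
        d_omega = (-(damping + droop_gain) * omega + disturbance) / inertia
        omega += dt * d_omega
        cumulative_deviation += abs(omega) * dt  # assumed attacker reward
    return cumulative_deviation, omega


NOMINAL_GAIN = 20.0                       # assumed healthy droop setting
CANDIDATE_GAINS = [20.0, 10.0, 5.0, 0.0]  # assumed falsified settings


def best_time_invariant_attack(candidates):
    """Exhaustively score each constant action and return the most damaging one."""
    return max(candidates, key=lambda gain: simulate(gain)[0])


worst_gain = best_time_invariant_attack(CANDIDATE_GAINS)
```

In this toy setting the sweep selects the candidate that disables the droop response entirely. An RL agent such as the PPO agent described above would replace this exhaustive sweep with a learned policy that can change its action over time.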

Abstract

While inverter-based distributed energy resources (DERs) play a crucial role in integrating renewable energy into the power system, they concurrently diminish the grid's system inertia, elevating the risk of frequency instabilities. Furthermore, smart inverters, interfaced via communication networks, pose a potential vulnerability to cyber threats if not diligently managed. To proactively fortify the power grid against sophisticated cyber attacks, we propose to employ reinforcement learning (RL) to identify potential threats and system vulnerabilities. This study concentrates on analyzing adversarial strategies for false data injection, specifically targeting smart inverters involved in primary frequency control. Our findings demonstrate that an RL agent can adeptly discern optimal false data injection methods to manipulate inverter settings, potentially causing catastrophic consequences.

Paper Structure

This paper contains 12 sections, 3 equations, 5 figures.

Figures (5)

  • Figure 1: IBRs following properly designed droop control to stabilize frequency.
  • Figure 2: An adversarial design process for vulnerability discovery and defense strategies development for adversarial events in CPS. The red box indicates the focus of this paper.
  • Figure 3: Cumulative reward obtained with a time-invariant policy for all 30 possible actions. $G_i$ denotes generator $i$.
  • Figure 4: Top: cumulative reward history as a function of the number of environment steps simulated for 3 training runs (P1, black, matches the time-invariant reward; P2, dark spring green, slightly exceeds it; P3, blue, reaches the highest final reward), with the time instance at which actions are recorded superimposed (red). Bottom: entropy loss history as a function of the number of environment steps simulated for the 3 training runs (P1 black, P2 dark spring green, P3 blue).
  • Figure 5: Learned policy (left) and system response (right) after $7.2 \times 10^6$ steps. Top: results for P1 (reward of 584). Middle: results for P2 (reward of 590). Bottom: results for P3 (reward of 1085). Only the generators activated by the policy are shown. The system response frequencies $\omega_i$ are shown for each generator.