A Reinforcement Learning Engine with Reduced Action and State Space for Scalable Cyber-Physical Optimal Response

Shining Sun; Khandaker Akramul Haque; Xiang Huo; Leen Al Homoud; Shamina Hossain-McKenzie; Ana Goulart; Katherine Davis

TL;DR

The paper tackles the challenge of scalable, optimal responses for cyber-physical power systems under disturbances, notably DoS attacks. It introduces RL-RID-GridResponder, a reinforcement learning engine augmented with Role and Interaction Discovery (RID) to shrink action and state spaces by identifying essential, critical, and redundant controllers and fusing cyber-physical data. The approach uses multimodal data fusion (PCA then $t$-SNE), a state-evaluation module, RID, and policy-based RL (PPO and A2C) to perform Volt-Var control under DoS conditions, validated on augmented WSCC 9-bus and IEEE 24-bus test systems within the RESLab/PowerGym/OpenDSS environment. Results show PPO with RID achieves faster convergence and maintains voltages within $\pm 5\%$ while reducing the action space by about $15$–$17\%$, illustrating practical improvements in resilience and real-time operation of large-scale CPS power systems.
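The role-based pruning idea can be illustrated with a minimal sketch. It assumes that each controller has already been labeled essential, critical, or redundant, and that only redundant controllers are dropped from the action space. The summary does not say how RID assigns these roles, so the `Controller` type, the role strings, the helper functions, and the device names below are all hypothetical and the numbers are purely illustrative.

```python
from dataclasses import dataclass

# Hypothetical role labels. The summary names three controller roles but
# does not describe how RID assigns them.
ESSENTIAL, CRITICAL, REDUNDANT = "essential", "critical", "redundant"


@dataclass(frozen=True)
class Controller:
    name: str
    role: str


def reduce_action_space(controllers):
    """Keep only the controllers whose role is not 'redundant'."""
    return [c for c in controllers if c.role != REDUNDANT]


def reduction_ratio(before, after):
    """Fraction of controllers removed from the action space."""
    return (len(before) - len(after)) / len(before)


# Illustrative device list: names and roles are made up for this example.
controllers = [
    Controller("cap1", ESSENTIAL),
    Controller("cap2", CRITICAL),
    Controller("bat1", ESSENTIAL),
    Controller("bat2", REDUNDANT),
    Controller("reg1", CRITICAL),
    Controller("reg2", ESSENTIAL),
]

kept = reduce_action_space(controllers)
ratio = reduction_ratio(controllers, kept)
print([c.name for c in kept], f"{ratio:.1%}")
```

In this toy list one of six controllers is pruned, so the RL agent would choose among five devices instead of six. Measuring the reduction as a count of controllers is an assumption of the sketch, not something the summary specifies.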

Abstract

Numerous research studies have been conducted to enhance the resilience of cyber-physical systems (CPSs) by detecting potential cyber or physical disturbances. However, the development of scalable and optimal response measures under power system contingency based on fusing cyber-physical data is still in an early stage. To address this research gap, this paper introduces a power system response engine based on reinforcement learning (RL) and role and interaction discovery (RID) techniques. RL-RID-GridResponder is designed to automatically detect the contingency and assist with the decision-making process to ensure optimal power system operation. The RL-RID-GridResponder learns via an RL-based structure and achieves enhanced scalability by integrating an RID module with reduced action and state spaces. The applicability of RL-RID-GridResponder in providing scalable and optimal responses for CPSs is demonstrated on power systems in the context of Denial of Service (DoS) attacks. Moreover, simulations are conducted on a Volt-Var regulation problem using the augmented WSCC 9-bus and augmented IEEE 24-bus systems based on fused cyber and physical data sets. The results show that the proposed RL-RID-GridResponder can provide fast and accurate responses to ensure optimal power system operation under DoS and can extend to other system contingencies such as line outages and loss of loads.
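For the Volt-Var regulation task, a voltage-band check of the kind reported in the results (voltages within ±5%) could be sketched as follows. This is an assumption-laden illustration, not the paper's reward function: it assumes per-unit voltages with a nominal value of 1.0, and the function names, the penalty shape, and the bus readings are hypothetical.

```python
def voltage_violations(voltages_pu, tolerance=0.05, nominal=1.0):
    """Return {bus: deviation} for buses outside nominal * (1 +/- tolerance)."""
    violations = {}
    for bus, voltage in voltages_pu.items():
        deviation = voltage - nominal
        if abs(deviation) > tolerance * nominal:
            violations[bus] = deviation
    return violations


def voltage_penalty(voltages_pu, tolerance=0.05, nominal=1.0):
    """Hypothetical penalty: negative sum of the excess beyond the band."""
    violations = voltage_violations(voltages_pu, tolerance, nominal)
    excess = sum(abs(d) - tolerance * nominal for d in violations.values())
    return -excess


# Illustrative per-unit readings (made up for this example).
readings = {"bus1": 0.97, "bus2": 1.02, "bus3": 1.08, "bus4": 0.93}
violations = voltage_violations(readings)
penalty = voltage_penalty(readings)
print(sorted(violations), round(penalty, 4))
```

A term like this would typically be one component of an RL reward, so that control actions pushing any bus outside the band are discouraged. How the paper actually weights voltage deviations is not stated in this summary.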

Paper Structure (22 sections, 13 equations, 9 figures, 3 tables, 1 algorithm)

Introduction
Preliminaries
Data Security in CPS Control
Threat Model
Testbed Emulation
Design of the RL-based Scalable Optimal Response Engine
Data Fusion Module
State Evaluation Module
Role and Interaction Discovery Module
Reinforcement Learning Module
Actions, States, and Rewards
PPO and A2C
Human Machine Interface Module
Experimental Results
Simulation Environment
...and 7 more sections

Figures (9)

Figure 1: Framework of the proposed RL-based scalable optimal response engine for cyber-physical power systems.
Figure 2: Augmented WSCC 9-bus system.
Figure 3: Comparison of RL results for PPO and A2C with and without the DoS attack. Both algorithms converge, but PPO outperforms A2C with fewer oscillations and higher rewards.
Figure 4: Heatmap of bus voltages in the augmented WSCC 9-bus system (trained with PPO; 'cap', 'bat', and 'reg' denote capacitor, battery, and transformer tap, respectively). All bus voltages stay within the ±5% band.
Figure 5: Augmented IEEE 24-bus system (with additional capacitors and batteries).
...and 4 more figures
