Policy Gradient-Based EMT-in-the-Loop Learning to Mitigate Sub-Synchronous Control Interactions
Sayak Mukherjee, Ramij R. Hossain, Kaustav Chatterjee, Sameer Nekkalapu, Marcelo Elizondo
TL;DR
To address sub-synchronous control interactions (SSCIs) caused by mis-tuned inverter controls under specific grid configurations, the paper develops an electromagnetic transient (EMT)-in-the-loop framework that learns adaptive outer/inner control gains using a simple deep policy-gradient RL agent interfaced with PSCAD. The approach formulates SSCI mitigation as an MDP with continuous state and action spaces, applies SSCI-specific data processing (down-sampling and band-pass filtering) to form observation windows, and uses an energy-based reward $R=-E_{osc}$, where $E_{osc}=\int_0^{T_w} (P_f(\tau)-P_{nom})^2 d\tau$ is the oscillation energy of the filtered power $P_f$ about its nominal value $P_{nom}$ over a window of length $T_w$. The utility is demonstrated on a real-world Texas SSCI scenario, where the learned policy gradually reduces oscillation energy and suppresses the detrimental dynamics when deployed mid-event. The work highlights practical gains in adaptive damping for inverter-rich grids and restricts action exploration to reduce the EMT simulation burden.
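The processing chain and reward described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the sampling rates, the 5 to 45 Hz pass band, the 20 Hz test oscillation, and the helper names (`downsample`, `band_pass_fft`, `oscillation_energy`, `reward`) are all assumptions made for the example. It also assumes that $P_f$ is formed as $P_{nom}$ plus the band-passed deviation of the measured power, which the summary does not specify.

```python
import numpy as np

def downsample(x, factor):
    """Keep every `factor`-th sample (no anti-alias filter; illustration only)."""
    return np.asarray(x, dtype=float)[::factor]

def band_pass_fft(x, fs, f_lo, f_hi):
    """Zero all FFT bins outside [f_lo, f_hi] Hz and transform back."""
    x = np.asarray(x, dtype=float)
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    spectrum[(freqs < f_lo) | (freqs > f_hi)] = 0.0
    return np.fft.irfft(spectrum, n=x.size)

def oscillation_energy(p_f, p_nom, dt):
    """Trapezoidal approximation of the integral of (P_f - P_nom)^2 over the window."""
    dev2 = (np.asarray(p_f, dtype=float) - p_nom) ** 2
    return float(np.sum(0.5 * (dev2[1:] + dev2[:-1])) * dt)

def reward(p_f, p_nom, dt):
    """Energy-based reward R = -E_osc."""
    return -oscillation_energy(p_f, p_nom, dt)

# Synthetic one-second window (all numbers are made up for illustration).
fs_raw = 2000.0                       # assumed raw sampling rate, Hz
t = np.arange(2000) / fs_raw
p_nom = 100.0                         # assumed nominal power
p_raw = p_nom + 5.0 * np.sin(2 * np.pi * 20.0 * t)  # 20 Hz oscillation

factor = 10
p_ds = downsample(p_raw, factor)
fs = fs_raw / factor                  # 200 Hz after down-sampling

# Assumption: P_f = P_nom + band-passed deviation from nominal.
p_f = p_nom + band_pass_fft(p_ds - p_nom, fs, 5.0, 45.0)

r_osc = reward(p_f, p_nom, 1.0 / fs)                          # oscillating window
r_quiet = reward(np.full(p_ds.size, p_nom), p_nom, 1.0 / fs)  # flat window
```

A window with a sustained oscillation yields a strongly negative reward, while a window that sits at the nominal value yields a reward of zero, so maximizing $R$ is equivalent to minimizing the oscillation energy.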
Abstract
This paper explores the development of learning-based tunable control gains using an electromagnetic transient (EMT)-in-the-loop simulation framework (e.g., PSCAD interfaced with Python-based learning modules) to address critical sub-synchronous oscillations. Because sub-synchronous control interactions (SSCIs) arise from mis-tuned control gains under specific grid configurations, effective mitigation requires adaptive re-tuning of these gains. Such adaptiveness can be achieved with a closed-loop, learning-based framework that accounts for the grid conditions responsible for the oscillations. This paper addresses this need by adopting methodologies inspired by Markov decision process (MDP) based reinforcement learning (RL), with a particular emphasis on simpler deep policy gradient methods augmented with SSCI-specific signal processing modules: down-sampling, band-pass filtering, and oscillation-energy-dependent reward computation. Our experiments in a real-world event setting demonstrate that a policy trained with the deep policy gradient method can adaptively compute gain settings in response to varying grid conditions and optimally suppress control interaction-induced oscillations.
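To make the policy-gradient idea concrete, the following is a heavily simplified sketch of a REINFORCE-style update for a single tunable gain. It is not the paper's method: the paper describes a deep policy with an EMT simulator in the loop, whereas this example uses a one-parameter Gaussian policy and a hypothetical quadratic stand-in, `toy_emt_episode`, in place of a PSCAD run. The optimal gain of 0.6, the gain limits, the exploration spread, the step size, and the batch size are all made-up values.

```python
import numpy as np

def toy_emt_episode(gain):
    """Hypothetical stand-in for one EMT run: oscillation energy for a given gain."""
    return (gain - 0.6) ** 2  # made-up energy surface, smallest at gain = 0.6

def train_gain_policy(mu=0.2, sigma=0.1, lr=0.05, batch=16, iters=300, seed=0):
    """REINFORCE with a Gaussian policy N(mu, sigma^2) over one gain."""
    rng = np.random.default_rng(seed)
    gain_lo, gain_hi = 0.0, 1.0  # assumed restricted action range
    for _ in range(iters):
        # Sample a batch of candidate gains from the current policy.
        actions = mu + sigma * rng.standard_normal(batch)
        applied = np.clip(actions, gain_lo, gain_hi)
        # Reward is the negative oscillation energy, R = -E_osc.
        rewards = -toy_emt_episode(applied)
        # Batch-mean baseline reduces the variance of the gradient estimate.
        advantages = rewards - rewards.mean()
        # Score function of a Gaussian: d/dmu log N(a; mu, sigma) = (a - mu) / sigma^2.
        grad_log_prob = (actions - mu) / sigma**2
        mu += lr * np.mean(advantages * grad_log_prob)
    return mu

mu_final = train_gain_policy()
print(f"learned gain mean: {mu_final:.3f}")
```

In an EMT-in-the-loop setting, each call to the stand-in would be replaced by a simulation run that applies the sampled gains and returns the oscillation energy of the processed observation window. Clipping the sampled gains to a bounded range is one simple way to restrict exploration and limit the number of costly or unstable simulation runs.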
