Imitation Game: A Model-based and Imitation Learning Deep Reinforcement Learning Hybrid
Eric MSP Veith, Torben Logemann, Aleksandr Berezin, Arlena Wellßow, Stephan Balduin
TL;DR
The paper addresses the problem of learning efficient and reliable voltage-control policies for power grids. It proposes a hybrid reinforcement learning approach that combines model-based reasoning, imitation learning, and a safety-focused fallback controller. A discriminator, guided by a world predictor, gates between the learned policy and the fallback, which speeds up training while maintaining grid-code compliance. In validation on a CIGRÉ medium-voltage benchmark, the hybrid ARL agent learns faster than a pure SAC agent and avoids grid-code violations, thanks to additional imitation-derived samples and robust fallback behavior. The work suggests practical benefits for deploying RL in cyber-physical energy systems. Planned extensions include more complex networks, adversarial scenarios, and advanced world models to further improve safety and resilience.
Abstract
Autonomous and learning systems based on Deep Reinforcement Learning have firmly established themselves as a foundation for approaches to creating resilient and efficient Cyber-Physical Energy Systems. However, most current approaches suffer from two distinct problems: modern model-free algorithms such as Soft Actor-Critic need a large number of samples to learn a meaningful policy, and they need a fallback to guard against concept drift (e.g., catastrophic forgetting). In this paper, we present work in progress towards a hybrid agent architecture that combines model-based Deep Reinforcement Learning with imitation learning to overcome both problems.
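
The gating idea behind such a hybrid agent (a discriminator that chooses between the learned policy and a fallback controller, based on what a world model predicts) can be illustrated with a minimal sketch. This is not the authors' implementation. The class GatedAgent, the toy one-bus world model, the proportional fallback rule, and the 0.9 to 1.1 p.u. voltage band are all hypothetical choices made for illustration, and logging fallback actions as demonstrations is only one assumed way that imitation-derived samples could be collected.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

State = List[float]   # assumption: bus voltage magnitudes in p.u.
Action = List[float]  # assumption: reactive-power setpoints


@dataclass
class GatedAgent:
    """Hypothetical discriminator that gates between policy and fallback."""
    policy: Callable[[State], Action]
    fallback: Callable[[State], Action]
    world_model: Callable[[State, Action], State]
    is_compliant: Callable[[State], bool]
    # Fallback decisions are kept as (state, action) pairs so that they
    # could later serve as imitation-learning samples (an assumption).
    demonstrations: List[Tuple[State, Action]] = field(default_factory=list)

    def act(self, state: State) -> Tuple[Action, str]:
        proposed = self.policy(state)
        # Ask the world model what the grid would look like after the action.
        predicted = self.world_model(state, proposed)
        if self.is_compliant(predicted):
            return proposed, "policy"
        # Predicted violation: use the fallback and record its action.
        safe_action = self.fallback(state)
        self.demonstrations.append((state, safe_action))
        return safe_action, "fallback"


# --- Toy stand-ins, all hypothetical -------------------------------------
def toy_policy(state: State) -> Action:
    # An untrained policy that always injects the same reactive power.
    return [2.0]


def toy_fallback(state: State) -> Action:
    # Simple proportional rule that pushes the voltage back towards 1.0 p.u.
    return [-4.0 * (state[0] - 1.0)]


def toy_world_model(state: State, action: Action) -> State:
    # Linear one-bus sensitivity: voltage shifts by 0.05 p.u. per unit of q.
    return [state[0] + 0.05 * action[0]]


def toy_is_compliant(state: State) -> bool:
    # Assumed voltage band; a real grid code defines its own limits.
    return all(0.9 <= v <= 1.1 for v in state)


gate = GatedAgent(toy_policy, toy_fallback, toy_world_model, toy_is_compliant)
```

With these toy functions, gate.act([0.95]) keeps the policy's action because the predicted voltage stays inside the assumed band, whereas gate.act([1.05]) is routed to the fallback and the pair is appended to gate.demonstrations.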
