Shielded Controller Units for RL with Operational Constraints Applied to Remote Microgrids
Hadi Nekoei, Alexandre Blondin Massé, Rachid Hassani, Sarath Chandar, Vincent Mai
TL;DR
This work introduces Shielded Controller Units (SCUs), a white-box shielding framework that decomposes complex constrained environments into hierarchical shields to guarantee constraint satisfaction for reinforcement learning agents. SCUs couple controllers to real systems via digital twins and shielded dispatchers, enabling interpretable, provable safety in industrial settings. Applied to a remote microgrid with wind, battery, and gensets, the approach achieves a 24% reduction in fuel consumption without increasing battery degradation, while satisfying all operational constraints. The results demonstrate the practicality of deploying RL in energy-transition contexts and suggest applicability to other critical sectors that require interpretable safety guarantees.
Abstract
Reinforcement learning (RL) is a powerful framework for optimizing decision-making in complex systems under uncertainty, an essential challenge in real-world settings, particularly in the context of the energy transition. A representative example is remote microgrids that supply power to communities disconnected from the main grid. Enabling the energy transition in such systems requires coordinated control of renewable sources like wind turbines, alongside fuel generators and batteries, to meet demand while minimizing fuel consumption and battery degradation under exogenous and intermittent load and wind conditions. These systems must often conform to extensive regulations and complex operational constraints. To ensure that RL agents respect these constraints, it is crucial to provide interpretable guarantees. In this paper, we introduce Shielded Controller Units (SCUs), a systematic and interpretable approach that leverages prior knowledge of system dynamics to ensure constraint satisfaction. Our shield synthesis methodology, designed for real-world deployment, decomposes the environment into a hierarchical structure where each SCU explicitly manages a subset of constraints. We demonstrate the effectiveness of SCUs on a remote microgrid optimization task with strict operational requirements. The RL agent, equipped with SCUs, achieves a 24% reduction in fuel consumption without increasing battery degradation, outperforming other baselines while satisfying all constraints. We hope SCUs contribute to the safe application of RL to the many decision-making challenges linked to the energy transition.
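As an illustration of the hierarchical idea described in the abstract, the following is a minimal sketch in which each unit corrects a proposed action so that it respects its own subset of constraints, and a top-level dispatcher composes the units. It is not the authors' implementation: the class names (`BatteryShield`, `GensetShield`), the function `shielded_dispatch`, the specific constraints (state-of-charge bounds, power limits, minimum genset loading), and all numeric values are assumptions chosen for illustration. The sketch also omits wind curtailment and makes no claim of guaranteeing power balance.

```python
from dataclasses import dataclass


@dataclass
class BatteryShield:
    """Hypothetical unit: keeps battery power within rate and state-of-charge limits."""
    capacity_kwh: float
    max_power_kw: float
    soc_min: float = 0.2  # assumed lower state-of-charge bound
    soc_max: float = 0.9  # assumed upper state-of-charge bound

    def correct(self, power_kw: float, soc: float, dt_h: float) -> float:
        # Convention: positive power discharges the battery, negative charges it.
        # Largest discharge that keeps the state of charge above soc_min over dt_h.
        max_discharge = max(0.0, (soc - self.soc_min) * self.capacity_kwh / dt_h)
        # Largest charge that keeps the state of charge below soc_max over dt_h.
        max_charge = max(0.0, (self.soc_max - soc) * self.capacity_kwh / dt_h)
        upper = min(self.max_power_kw, max_discharge)
        lower = -min(self.max_power_kw, max_charge)
        # Clip the requested power into the admissible interval.
        return min(max(power_kw, lower), upper)


@dataclass
class GensetShield:
    """Hypothetical unit: the genset is either off or within [min_kw, max_kw]."""
    min_kw: float
    max_kw: float

    def correct(self, power_kw: float) -> float:
        if power_kw <= 0.0:
            return 0.0  # nothing to supply: keep the genset off
        return min(max(power_kw, self.min_kw), self.max_kw)


def shielded_dispatch(battery_request_kw, soc, load_kw, wind_kw,
                      battery, genset, dt_h=1.0):
    """Top-level unit: shield the battery action, then let the genset cover the residual."""
    battery_kw = battery.correct(battery_request_kw, soc, dt_h)
    residual_kw = load_kw - wind_kw - battery_kw
    genset_kw = genset.correct(residual_kw)
    return battery_kw, genset_kw
```

For example, with a 100 kWh battery at 25% state of charge and an assumed 20% lower bound, a request to discharge 80 kW over one hour would be clipped to roughly 5 kW, and the genset would be asked to cover the remaining load net of wind. Keeping each constraint inside a small, named unit is what makes the correction easy to inspect.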
