A Multiobjective Reinforcement Learning Framework for Microgrid Energy Management
M. Vivienne Liu, Patrick M. Reed, David Gold, Garret Quist, C. Lindsay Anderson
TL;DR
This paper addresses conflicting objectives in microgrid (MG) energy management with a model-free multi-objective reinforcement learning (MORL) framework that explicitly searches the high-dimensional objective space using the Borg MOEA. A parametric policy, implemented as a multi-input, multi-output neural network, coordinates generation and steam systems under uncertainty using exogenous information, without relying on long-horizon forecasts. The approach is demonstrated on the Cornell University microgrid (CU-MG), a combined heat and power (CHP) system, where it yields a diverse Pareto frontier; representative policies can reduce emissions by up to 20–25% in winter and summer at no extra cost. Time-varying sensitivity analysis adds interpretability by showing how exogenous information influences adaptive, coordinated control. The result is a practical, data-driven method for navigating complex MG tradeoffs under uncertainty.
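To make the notion of a Pareto frontier concrete, the following is a minimal sketch of non-dominated filtering, assuming every objective is minimized. The function names and the (cost, emissions) values are hypothetical illustrations, not results from the paper, and the filter is a plain pairwise comparison rather than the Borg MOEA itself.

```python
from typing import List, Sequence, Tuple


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if `a` is no worse than `b` in every objective and
    strictly better in at least one (all objectives minimized)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(points: List[Tuple[float, ...]]) -> List[Tuple[float, ...]]:
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (cost, emissions) pairs for five candidate policies.
candidates = [(10.0, 5.0), (8.0, 7.0), (9.0, 6.0), (11.0, 6.0), (8.0, 8.0)]
front = pareto_front(candidates)
print(front)
```

In this toy set, (11.0, 6.0) is dominated by (10.0, 5.0) and (8.0, 8.0) is dominated by (8.0, 7.0), so three candidates remain. Each survivor represents a different compromise between the two objectives, which is the kind of choice a Pareto frontier of policies offers an operator.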
Abstract
The emergence of microgrids (MGs) has provided a promising solution for decarbonizing and decentralizing the power grid, mitigating the challenges posed by climate change. However, MG operations often involve multiple objectives that represent the interests of different stakeholders, leading to potentially complex conflicts. To tackle this issue, we propose a novel multi-objective reinforcement learning framework that explores the high-dimensional objective space and uncovers the tradeoffs between conflicting objectives. This framework leverages exogenous information and capitalizes on the data-driven nature of reinforcement learning, enabling the training of a parametric policy without the need for long-term forecasts or knowledge of the underlying uncertainty distribution. The trained policies exhibit diverse, adaptive, and coordinated behaviors, with the added benefit of providing interpretable insights into the dynamics of their information use. We apply this framework to the Cornell University MG (CU-MG), a combined heat and power MG, to evaluate its effectiveness. The results demonstrate performance improvements in all objectives considered compared to the status quo operations and offer more flexibility in navigating complex operational tradeoffs.
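As a rough illustration of what a parametric policy driven by exogenous information can look like, the sketch below maps the current observation (time of day, demand, price) to two dispatch fractions and scores one simulated day on two objectives. Everything here is an assumption made for illustration: the network size, the toy demand and price model, the cost and emission coefficients, and the names `policy` and `evaluate`. Random sampling of the parameter vector stands in for the multi-objective search, which the paper performs with the Borg MOEA.

```python
import math
import random

# Hypothetical sizes: 3 exogenous inputs, 4 hidden units, 2 control outputs.
N_IN, N_HID, N_OUT = 3, 4, 2
N_PARAMS = N_HID * (N_IN + 1) + N_OUT * (N_HID + 1)  # weights plus biases


def policy(theta, obs):
    """Multi-input, multi-output network: tanh hidden layer,
    sigmoid outputs so each action is a fraction in (0, 1)."""
    i = 0
    hidden = []
    for _ in range(N_HID):
        z = sum(theta[i + k] * obs[k] for k in range(N_IN)) + theta[i + N_IN]
        hidden.append(math.tanh(z))
        i += N_IN + 1
    actions = []
    for _ in range(N_OUT):
        z = sum(theta[i + k] * hidden[k] for k in range(N_HID)) + theta[i + N_HID]
        actions.append(1.0 / (1.0 + math.exp(-z)))
        i += N_HID + 1
    return actions


def evaluate(theta, seed=0, hours=24):
    """Simulate one toy day and return (cost, emissions), both minimized.
    The policy sees only the current observation, never a forecast."""
    rng = random.Random(seed)
    cost = 0.0
    emissions = 0.0
    heat_demand = 0.5  # toy constant heat load
    for hour in range(hours):
        demand = 1.0 + 0.3 * rng.random()  # toy electricity demand
        price = 0.5 + 0.5 * rng.random()   # toy grid price
        obs = [math.sin(2 * math.pi * hour / 24), demand, price]
        turbine_share, boiler_share = policy(theta, obs)
        onsite = turbine_share * demand    # electricity generated on site
        imported = demand - onsite         # remainder bought from the grid
        boiler = boiler_share * heat_demand
        other_heat = heat_demand - boiler  # remainder from a second heat source
        # Toy coefficients, chosen only so the two objectives conflict.
        cost += 0.6 * onsite + price * imported + 0.4 * boiler + 0.7 * other_heat
        emissions += 0.5 * onsite + 0.3 * imported + 0.6 * boiler + 0.2 * other_heat
    return (cost, emissions)


# Random parameter vectors stand in for candidates proposed by an optimizer.
sampler = random.Random(42)
thetas = [[sampler.uniform(-1.0, 1.0) for _ in range(N_PARAMS)] for _ in range(5)]
objectives = [evaluate(theta) for theta in thetas]
for cost, emissions in objectives:
    print(f"cost={cost:.2f}  emissions={emissions:.2f}")
```

Because the policy is just a parameter vector evaluated by simulation, no model of the uncertainty distribution is needed: an optimizer only has to compare the objective vectors that different parameter vectors produce.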
