Reinforcement Learning for Efficient Design and Control Co-optimisation of Energy Systems

Marine Cauz, Adrien Bolland, Nicolas Wyrsch, Christophe Ballif

TL;DR

The paper tackles the challenge of integrating decentralised, weather-dependent renewables by jointly optimising energy system design and control. It proposes a model-free reinforcement learning framework that learns both a control policy and a probabilistic design distribution over continuous, feasible designs, using a log-normal mixture, entropy regularisation, and off-policy training with Deep Deterministic Policy Gradient (DDPG). In experiments on a building-scale PV-battery system, the method converges to high-performing designs and outperforms MILP-based and rule-based baselines over long horizons. Key contributions include enabling co-optimisation without explicit system models, providing design-parameter intervals rather than a single optimum, and improving sample efficiency in energy-system design.
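
To make the idea of a probabilistic design distribution concrete, the following is a minimal sketch of sampling strictly positive design values from a log-normal mixture and estimating its entropy by Monte Carlo. It is not the authors' implementation: the mixture weights, the `MUS` and `SIGMAS` values, and the function names are all hypothetical, and the way the paper parameterises and trains the distribution may differ.

```python
import math
import random

# Hypothetical mixture over a single design parameter (e.g. a capacity).
# All numbers below are illustrative assumptions, not values from the paper.
WEIGHTS = [0.6, 0.4]
MUS = [math.log(5.0), math.log(20.0)]   # means of log(x) per component
SIGMAS = [0.3, 0.5]                     # std devs of log(x) per component


def sample_design(rng):
    """Draw one strictly positive design value from the mixture."""
    k = rng.choices(range(len(WEIGHTS)), weights=WEIGHTS)[0]
    return rng.lognormvariate(MUS[k], SIGMAS[k])


def log_density(x):
    """Log of the mixture density at x > 0."""
    total = 0.0
    for w, mu, sigma in zip(WEIGHTS, MUS, SIGMAS):
        z = (math.log(x) - mu) / sigma
        total += w * math.exp(-0.5 * z * z) / (x * sigma * math.sqrt(2 * math.pi))
    return math.log(total)


def entropy_estimate(rng, n=2000):
    """Monte Carlo estimate of the differential entropy, -E[log p(x)]."""
    return -sum(log_density(sample_design(rng)) for _ in range(n)) / n


rng = random.Random(0)
designs = [sample_design(rng) for _ in range(1000)]
```

A log-normal component only produces positive values, which is one plausible reason to use it for physical sizes. An entropy term such as `entropy_estimate` is typically added to the objective as a bonus so that the distribution does not collapse onto a single design too early.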

Abstract

The ongoing energy transition drives the development of decentralised renewable energy sources, which are heterogeneous and weather-dependent, complicating their integration into energy systems. This study tackles this issue by introducing a novel reinforcement learning (RL) framework tailored for the co-optimisation of design and control in energy systems. Traditionally, the integration of renewable sources in the energy sector has relied on complex mathematical modelling and sequential processes. By leveraging RL's model-free capabilities, the framework eliminates the need for explicit system modelling. By optimising both control and design policies jointly, the framework enhances the integration of renewable sources and improves system efficiency. This contribution paves the way for advanced RL applications in energy management, leading to more efficient and effective use of renewable energy sources.

Paper Structure

This paper contains 17 sections, 11 equations, 4 figures, 4 tables, and 1 algorithm.

Figures (4)

  • Figure 1: Training performances, over 500 iterations, for the co-optimisation (blue), best two-step (orange), and design-only (green) scenarios. Experiments were conducted using seed values ranging from 0 to 30, with the figure showing the median and quartiles. The top subplot illustrates the evolution of average expected returns on $T = 168$, i.e., the effective training. The bottom subplot assesses the average expected return throughout the full training dataset on $T = 8088$, i.e., the long-term performance.
  • Figure 2: Validation performances, over 500 iterations, for the co-optimisation (blue), best two-step (orange), design-only (green), and fixed (black) scenarios. Experiments were conducted using seed values ranging from 0 to 30. The figure shows the median and quartiles of the average expected return computed over the entire validation dataset, i.e., $T = 672$.
  • Figure 3: Design parameter distribution after training for the co-optimisation (top, blue) and design-only (bottom, green) scenarios. The boxplots are computed based on a sample of 1000 designs drawn from the final design distribution of one of the 30 seed experiments.
  • Figure 4: Visualisation of the historical dataset covering a year of the building electricity consumption and its normalised PV production. The white background indicates the training set, while the grey background represents the validation dataset.
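
The median-and-quartile bands described for Figures 1 and 2 can be computed per iteration across seeds. The sketch below shows one way to do this with the Python standard library; the function name `summarise_across_seeds` and the synthetic learning curves are assumptions for illustration only, and the 30 seeds and 500 iterations simply mirror the counts quoted in the captions rather than reproducing any result.

```python
import random
import statistics


def summarise_across_seeds(curves):
    """curves: list of per-seed lists, one return value per iteration.

    Returns a list of (lower quartile, median, upper quartile) tuples,
    one per iteration.
    """
    summary = []
    for values in zip(*curves):  # all seeds at the same iteration
        q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
        summary.append((q1, q2, q3))
    return summary


# Synthetic stand-in data (not from the paper): noisy, rising returns.
curves = []
for seed in range(30):
    rng = random.Random(seed)
    curves.append(
        [1.0 - 1.0 / (1 + it) + rng.gauss(0, 0.05) for it in range(500)]
    )

summary = summarise_across_seeds(curves)
```

Plotting the middle value of each tuple as a line and shading between the outer two gives a band of the kind the captions describe.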