Constrained Reinforcement Learning for Safe Heat Pump Control

Baohe Zhang, Lilli Frison, Thomas Brox, Joschka Bödecker

TL;DR

The paper addresses safe, energy-efficient heat pump control in buildings by formulating the problem as a CMDP and proposing CSAC-LB, a constrained reinforcement learning method that uses a linear smoothed log barrier and double-Q critics to improve constraint satisfaction and data efficiency. A new open-source simulator, I4B, provides standardized interfaces and scenarios to benchmark RL and MPC for heating control. Empirical results across two realistic buildings show CSAC-LB attaining favorable trade-offs between energy consumption and indoor comfort, with robust performance under sensor noise and model uncertainty. The work offers a practical framework bridging RL and building control, with potential for broader deployment and future enhancements like weather forecasting and multi-zone configurations.

Abstract

Constrained Reinforcement Learning (RL) has emerged as a significant research area within RL, where integrating constraints with rewards is crucial for enhancing safety and performance across diverse control tasks. For heating systems in buildings, optimizing energy efficiency while maintaining the residents' thermal comfort can be formulated naturally as a constrained optimization problem. However, solving it with RL may require a large amount of data, so an accurate and versatile simulator is desirable. In this paper, we propose a novel building simulator, I4B, which provides interfaces for different use cases, and we apply a model-free constrained RL algorithm named constrained Soft Actor-Critic with Linear Smoothed Log Barrier function (CSAC-LB) to the heating optimization problem. Benchmarking against baseline algorithms demonstrates CSAC-LB's efficiency in data exploration, constraint satisfaction, and performance.
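
For orientation, the constrained optimization problem mentioned above is commonly written in the generic CMDP form below. This is the standard textbook formulation, not necessarily the paper's exact notation: the symbols $r$ (reward, e.g. negative energy use), $c$ (cost, e.g. comfort violation), $d$ (cost budget), and $\gamma$ (discount factor) are assumptions introduced here for illustration.

$$
\max_{\pi} \; \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, a_t)\right]
\quad \text{subject to} \quad
\mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, c(s_t, a_t)\right] \le d
$$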

Paper Structure (24 sections, 12 equations, 6 figures, 3 tables)

Figures (6)

  • Figure 1: Left: Log barrier function (blue curve) with value clipping. Once the value is clipped at $y=5$, the gradient vanishes. Right: Linear smoothed log barrier function for different values of $\mu$. The dashed line is the indicator function.
  • Figure 2: Overview of software architecture
  • Figure 3: Integration of controller
  • Figure 4: Example thermal building model comprising three states
  • Figure 5: Evaluation results of the constrained RL algorithms during training in Building 1 without noise at seed 1. Each data point represents one evaluation episode, and its color indicates the number of training steps completed when the evaluation was performed. The $y$-axis is the maximum temperature deviation and the $x$-axis is the energy usage. Crosses mark evaluations on the Pareto front. The black and purple stars represent the MPC and rule-based controller results, respectively. CSAC-LB explores more of the boundary during training than SAC-LAG and CPO.
  • ...and 1 more figures
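
The Figure 1 caption contrasts a clipped log barrier, whose gradient vanishes, with a linear smoothed log barrier. The sketch below illustrates one common way to build such a function: a log barrier on the feasible side, extended by a straight line that matches its value and slope at the switch point. The exact functional form, the switch point at $-1/\mu^2$, the sign convention (the constraint is satisfied when `x <= 0`), and the function names are assumptions made for illustration; they are not taken from this page and may differ from the paper's definition.

```python
import math


def linear_smoothed_log_barrier(x: float, mu: float) -> float:
    """Assumed form: log barrier for x <= -1/mu^2, linear extension beyond it.

    x is the constraint value (feasible when x <= 0, by assumption) and
    mu > 0 controls how closely the function approximates a hard indicator.
    """
    threshold = -1.0 / mu**2
    if x <= threshold:
        # Standard log barrier branch, defined only for x < 0.
        return -math.log(-x) / mu
    # Linear branch: chosen to match the value and slope of the log branch
    # at the threshold, so it stays finite even when the constraint is violated.
    return mu * x - math.log(1.0 / mu**2) / mu + 1.0 / mu


def barrier_slope(x: float, mu: float) -> float:
    """Derivative of the function above with respect to x."""
    threshold = -1.0 / mu**2
    if x <= threshold:
        return -1.0 / (mu * x)
    # Constant, non-zero slope in the violated region (no vanishing gradient).
    return mu


if __name__ == "__main__":
    for mu in (1.0, 2.0, 4.0):
        values = [round(linear_smoothed_log_barrier(x, mu), 3) for x in (-2.0, -0.5, 0.0, 0.5)]
        print(f"mu={mu}: {values}")
```

In this sketch, a violated constraint (`x > 0`) still yields a finite penalty with slope `mu`, which is the property the caption highlights in contrast to value clipping.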