
Comparative Field Deployment of Reinforcement Learning and Model Predictive Control for Residential HVAC

Ozan Baris Mulayim, Elias N. Pergantis, Levi D. Reyes Premer, Bingqing Chen, Guannan Qu, Kevin J. Kircher, Mario Bergés

TL;DR

The paper reports a head-to-head field comparison of model-based RL (Ibex-RL) and MPC for residential HVAC in an occupied home. It finds RL can achieve energy savings comparable to MPC (approximately 22% vs 20% relative to a baseline), but with higher variability and modestly worse comfort-normalized efficiency due to modeling and online adaptation challenges. MPC, while more labor-intensive to set up, delivers superior comfort adherence and more robust, constraint-compliant performance once calibrated. The work highlights the practical trade-offs between deployment labor and control performance, emphasizing the need for robust online-learning validation, better state–actuator alignment, and hybrid strategies that leverage RL automation within the constraint-aware MPC framework for scalable residential HVAC control.

Abstract

Advanced control strategies like Model Predictive Control (MPC) offer significant energy savings for HVAC systems but often require substantial engineering effort, limiting scalability. Reinforcement Learning (RL) promises greater automation and adaptability, yet its practical application in real-world residential settings remains largely undemonstrated, facing challenges related to safety, interpretability, and sample efficiency. To investigate these practical issues, we performed a direct comparison of an MPC and a model-based RL controller, with each controller deployed for a one-month period in an occupied house with a heat pump system in West Lafayette, Indiana. This investigation aimed to explore the scalability of the chosen RL and MPC implementations while ensuring safety and comparability. The advanced controllers were evaluated against each other and against the existing controller. RL achieved substantial energy savings (22% relative to the existing controller), slightly exceeding MPC's savings (20%), albeit with modestly higher occupant discomfort. However, when energy savings were normalized for the level of comfort provided, MPC demonstrated superior performance. This study's empirical results show that while RL reduces engineering overhead, it introduces practical trade-offs in model accuracy and operational robustness. The key lessons learned concern the difficulties of safe controller initialization, navigating the mismatch between control actions and their practical implementation, and maintaining the integrity of online learning in a live environment. These insights pinpoint the essential research directions needed to advance RL from a promising concept to a truly scalable HVAC control solution.
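
The headline figures are relative savings against the existing controller's energy use. A minimal sketch of that arithmetic is shown below, together with one common way to quantify discomfort (degree-hours outside a comfort band). The function names, the 20 to 24 °C band, and all example numbers are hypothetical illustrations; they are not the paper's data, and the discomfort metric is not necessarily the comfort normalization the authors used.

```python
from typing import Sequence


def relative_savings(baseline_kwh: float, controller_kwh: float) -> float:
    """Fractional energy savings of a controller relative to a baseline controller."""
    if baseline_kwh <= 0:
        raise ValueError("baseline energy must be positive")
    return (baseline_kwh - controller_kwh) / baseline_kwh


def discomfort_degree_hours(
    temps_c: Sequence[float], lower_c: float, upper_c: float, dt_h: float = 1.0
) -> float:
    """Hypothetical comfort metric: degrees outside [lower_c, upper_c],
    summed over samples and weighted by the sample interval in hours."""
    total = 0.0
    for t in temps_c:
        if t < lower_c:
            total += (lower_c - t) * dt_h
        elif t > upper_c:
            total += (t - upper_c) * dt_h
    return total


if __name__ == "__main__":
    # Hypothetical energy totals chosen only so the savings match the reported 22% and 20%.
    baseline = 1000.0
    print(f"RL savings:  {relative_savings(baseline, 780.0):.0%}")
    print(f"MPC savings: {relative_savings(baseline, 800.0):.0%}")
    # Hypothetical hourly indoor temperatures checked against a 20-24 °C band.
    print(discomfort_degree_hours([19.5, 21.0, 24.5, 22.0], 20.0, 24.0))
```

Reporting discomfort alongside savings matters because a controller can trivially save energy by letting the house drift outside the comfort band.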

Paper Structure

This paper contains 35 sections, 15 equations, 11 figures, 4 tables, and 2 algorithms.

Figures (11)

  • Figure 1: Overview of the RL and MPC controllers
  • Figure 2: Thermal circuit model of the testbed.
  • Figure 3: The testbed house is a 208 m², 1920s-era house with all-electric appliances in West Lafayette, Indiana, USA.
  • Figure 4: State, Action and Imitation Losses coming from the training of the imitation learning agent for the combination $\alpha_{\text{imit}}=0.05$ and $\lambda=1000$, which resulted in the lowest action loss.
  • Figure 5: Cold-weather comparison: RL demonstrates an energy-saving strategy (79.4 kWh) through setpoint modulation (1°C below user preference) and uses less backup heat, in contrast to the higher consumption of MPC and PID.
  • ...and 6 more figures
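
Figure 2 refers to a thermal circuit (resistance-capacitance) model of the house. As a rough illustration of how such a model is stepped forward in time, for example over an MPC prediction horizon, the sketch below simulates a single-node (1R1C) circuit, $C\,\frac{dT}{dt} = \frac{T_{\text{out}} - T}{R} + \dot{Q}$, with forward-Euler integration. The single-node structure, the class and parameter names, and the numeric values are assumptions for illustration; the paper's circuit may contain more nodes and inputs.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class OneNodeRC:
    """Hypothetical 1R1C thermal model: C * dT/dt = (T_out - T) / R + Q."""
    r_k_per_kw: float   # envelope thermal resistance [K/kW]
    c_kwh_per_k: float  # lumped thermal capacitance [kWh/K]

    def step(self, t_in_c: float, t_out_c: float, q_kw: float, dt_h: float) -> float:
        """Advance indoor temperature by one forward-Euler step of dt_h hours.
        q_kw is net heat delivered to the node (positive = heating)."""
        dT_dt = ((t_out_c - t_in_c) / self.r_k_per_kw + q_kw) / self.c_kwh_per_k  # [K/h]
        return t_in_c + dT_dt * dt_h

    def simulate(
        self, t0_c: float, t_out_c: Sequence[float], q_kw: Sequence[float], dt_h: float
    ) -> List[float]:
        """Roll the model forward over paired outdoor-temperature and heat inputs."""
        temps = [t0_c]
        for t_out, q in zip(t_out_c, q_kw):
            temps.append(self.step(temps[-1], t_out, q, dt_h))
        return temps


if __name__ == "__main__":
    house = OneNodeRC(r_k_per_kw=5.0, c_kwh_per_k=10.0)  # illustrative values only
    # Four 15-minute steps at -5 °C outdoors with the heat pump off.
    traj = house.simulate(21.0, [-5.0] * 4, [0.0] * 4, dt_h=0.25)
    print([round(t, 2) for t in traj])
```

Forward Euler is adequate here because the 15-minute step is small relative to the time constant $RC$ (50 h with these values); stiffer multi-node circuits usually call for exact discretization or smaller steps.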