Comparative Field Deployment of Reinforcement Learning and Model Predictive Control for Residential HVAC
Ozan Baris Mulayim, Elias N. Pergantis, Levi D. Reyes Premer, Bingqing Chen, Guannan Qu, Kevin J. Kircher, Mario Bergés
TL;DR
The paper reports a head-to-head field comparison of model-based RL (Ibex-RL) and MPC for residential HVAC in an occupied home. It finds that RL can achieve energy savings comparable to MPC (approximately 22% vs. 20% relative to the home's existing controller), but with higher variability and modestly worse comfort-normalized efficiency due to modeling and online adaptation challenges. MPC, while more labor-intensive to set up, delivers better comfort adherence and more robust, constraint-compliant performance once calibrated. The work highlights the practical trade-offs between deployment labor and control performance, and emphasizes the need for robust validation of online learning, better alignment between the controller's state and the actual actuator behavior, and hybrid strategies that bring RL's automation into the constraint-aware MPC framework for scalable residential HVAC control.
Abstract
Advanced control strategies like Model Predictive Control (MPC) offer significant energy savings for HVAC systems but often require substantial engineering effort, limiting scalability. Reinforcement Learning (RL) promises greater automation and adaptability, yet its practical application in real-world residential settings remains largely undemonstrated, facing challenges related to safety, interpretability, and sample efficiency. To investigate these practical issues, we performed a direct comparison of an MPC controller and a model-based RL controller, each deployed for one month in an occupied house with a heat pump system in West Lafayette, Indiana. This investigation aimed to explore the scalability of the chosen RL and MPC implementations while ensuring safety and comparability. The advanced controllers were evaluated against each other and against the existing controller. RL achieved substantial energy savings (22% relative to the existing controller), slightly exceeding MPC's savings (20%), albeit with modestly higher occupant discomfort. However, when energy savings were normalized for the level of comfort provided, MPC demonstrated superior performance. This study's empirical results show that while RL reduces engineering overhead, it introduces practical trade-offs in model accuracy and operational robustness. The key lessons learned concern the difficulty of safe controller initialization, the mismatch between control actions and their practical implementation, and maintaining the integrity of online learning in a live environment. These insights pinpoint the essential research directions needed to advance RL from a promising concept to a truly scalable HVAC control solution.
