A Receding Horizon Reinforcement Learning Framework for Campus Chiller Energy Management: A Case Study from an Australian University
Laura Musgrave, Arnab Bhattacharjee, Tapan Kumar Saha
TL;DR
This study tackles campus chiller energy management by formulating it as a receding-horizon reinforcement learning problem for optimally scheduling multiple heterogeneous chillers. It couples a Proximal Policy Optimization (PPO) agent that plans 24 hours ahead with a transformer-based TimeXer forecaster that predicts building cooling demand, and it uses a prioritized reward to enforce hard physical constraints. A physics-informed chiller power model, together with part-load ratio (PLR) relationships, drives energy minimization over the horizon. Experiments on a nine-building Australian campus show electricity savings of up to 28% over a rule-based baseline, along with an improved coefficient of performance (COP) and better constraint satisfaction. These results demonstrate the practical potential of data-driven, horizon-aware HVAC control, while highlighting limitations related to pipe losses, reward automation, and online retraining.
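To make the link between mass flow, part-load ratio and electricity use concrete, here is a minimal sketch of a PLR-based chiller power model. It is an illustration under stated assumptions, not the authors' model: it assumes the standard water-side energy balance $Q = \dot{m} c_p (T_{return} - T_{supply})$, a quadratic COP curve $\mathrm{COP}(\mathrm{PLR}) = a_0 + a_1 \mathrm{PLR} + a_2 \mathrm{PLR}^2$, and electrical power $P = Q / \mathrm{COP}$. The class `Chiller`, the helper `plant_power_kw`, the coefficients `a0`, `a1`, `a2` and all numbers in the example are hypothetical.

```python
from dataclasses import dataclass

# Specific heat of water in kJ/(kg*K); kg/s * kJ/(kg*K) * K gives kW.
CP_WATER = 4.186


@dataclass
class Chiller:
    """Hypothetical chiller model with an assumed quadratic COP curve in PLR."""

    rated_capacity_kw: float  # rated cooling capacity of the chiller
    a0: float  # hypothetical COP-curve coefficients (fitted from data in practice)
    a1: float
    a2: float

    def cooling_load_kw(self, m_dot_kg_s: float, t_return_c: float, t_supply_c: float) -> float:
        # Water-side energy balance: Q = m_dot * c_p * (T_return - T_supply).
        return m_dot_kg_s * CP_WATER * (t_return_c - t_supply_c)

    def cop(self, plr: float) -> float:
        # Assumed curve shape: COP(PLR) = a0 + a1*PLR + a2*PLR^2.
        return self.a0 + self.a1 * plr + self.a2 * plr ** 2

    def power_kw(self, m_dot_kg_s: float, t_return_c: float, t_supply_c: float) -> float:
        q = self.cooling_load_kw(m_dot_kg_s, t_return_c, t_supply_c)
        if q <= 0.0:
            return 0.0  # chiller off, or no useful cooling delivered
        plr = q / self.rated_capacity_kw
        if plr > 1.0:
            raise ValueError(f"load exceeds rated capacity (PLR={plr:.2f})")
        # Electrical power = cooling delivered / COP at the current part load.
        return q / self.cop(plr)


def plant_power_kw(chillers, flows_kg_s, t_return_c, t_supply_c):
    """Total electrical power of a bank of heterogeneous chillers."""
    return sum(
        ch.power_kw(m, t_return_c, t_supply_c) for ch, m in zip(chillers, flows_kg_s)
    )


if __name__ == "__main__":
    # Two hypothetical chillers of different sizes sharing common water temperatures.
    big = Chiller(rated_capacity_kw=1000.0, a0=2.0, a1=6.0, a2=-3.0)
    small = Chiller(rated_capacity_kw=500.0, a0=1.5, a1=5.0, a2=-2.5)
    print(plant_power_kw([big, small], [20.0, 10.0], t_return_c=12.0, t_supply_c=6.0))
```

Because COP varies with part load, splitting the same total flow differently across chillers changes total power, which is what makes flow-rate scheduling a meaningful optimization lever. A real fitted curve would also need checking that COP stays positive over the whole operating range.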
Abstract
This work presents a case study of optimal energy management for a large Heating, Ventilation and Air Conditioning (HVAC) system on a university campus in Australia using Reinforcement Learning (RL). The HVAC system supplies cooling to nine university buildings and has an annual average electricity consumption of $\sim2$ GWh. Updated chiller Coefficient of Performance (COP) curves are identified, and a predictive building cooling demand model is developed, both from historical data of the HVAC system. Based on these inputs, a Proximal Policy Optimization (PPO)-based RL model is trained to schedule the chillers optimally within a receding horizon control framework, using a priority reward function for constraint satisfaction. Compared with the conventional reactive rule-based control of the HVAC system, the proposed controller reduces electricity consumption by up to 28\% solely by controlling the mass flow rates of the chiller banks, with minimal constraint violations.
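One way to realize a priority reward of the kind described above is a lexicographic scheme in which any constraint violation scores strictly below every feasible outcome, so the agent is pushed toward feasibility first and energy savings second. The sketch below is an assumption-based illustration, not the authors' reward function: the functions `constraint_violation`, `priority_reward` and `horizon_return`, the normalization of violations, and the per-step energy cap `energy_cap_kwh` are all hypothetical.

```python
from typing import Sequence


def constraint_violation(
    flows_kg_s: Sequence[float],
    flow_min: Sequence[float],
    flow_max: Sequence[float],
    supplied_kw: float,
    demand_kw: float,
) -> float:
    """Hypothetical normalized violation measure; 0.0 means all constraints hold."""
    violation = 0.0
    for f, lo, hi in zip(flows_kg_s, flow_min, flow_max):
        if f > 0.0:  # a running chiller must stay inside its allowed flow band
            band = hi - lo
            violation += (max(0.0, lo - f) + max(0.0, f - hi)) / band
    # Unmet cooling demand, expressed as a fraction of demand.
    if demand_kw > 0.0:
        violation += max(0.0, demand_kw - supplied_kw) / demand_kw
    return violation


def priority_reward(energy_kwh: float, violation: float, energy_cap_kwh: float) -> float:
    """Lexicographic reward: infeasible steps always score below feasible ones.

    Assumes energy_kwh <= energy_cap_kwh on every feasible step, so a feasible
    reward (-energy_kwh) is never lower than -energy_cap_kwh.
    """
    if violation > 0.0:
        return -energy_cap_kwh - violation
    return -energy_kwh


def horizon_return(energies, violations, energy_cap_kwh):
    """Sum of priority rewards over a planning horizon (e.g. 24 hourly steps)."""
    return sum(
        priority_reward(e, v, energy_cap_kwh) for e, v in zip(energies, violations)
    )


if __name__ == "__main__":
    # Chiller 1 is off, chiller 2 is inside its band, 10% of demand is unmet.
    v = constraint_violation([0.0, 30.0], [10.0, 10.0], [40.0, 40.0], 900.0, 1000.0)
    print(v, priority_reward(750.0, v, energy_cap_kwh=2000.0))
```

The fixed offset is the design choice that matters: with a plain weighted-sum penalty, a large enough energy saving can outweigh a small violation, whereas here no amount of saved energy compensates for infeasibility. In a receding-horizon loop, such a return would be evaluated over the 24-hour forecast window, only the first scheduled action applied, and the window then shifted forward by one step.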
