Impact of data for forecasting on performance of model predictive control in buildings with smart energy storage
Max Langtry, Vijja Wichitwechkarn, Rebecca Ward, Chaoqun Zhuang, Monika J. Kreitmair, Nikolas Makasis, Zack Xuereb Conti, Ruchi Choudhary
TL;DR
This work addresses how data quantity and quality affect forecast accuracy and the resulting MPC performance in buildings with distributed storage. It compares simple linear neural models against state-of-the-art predictors within a CityLearn-based multi-building simulation, examining data-efficiency measures including model reuse, training duration, feature selection, and online updating. Key findings show that a simple linear model matches complex models in forecast accuracy with far greater data efficiency and generalisability, that more than about 2 years of training data offers limited gains, and that change-point screening and online retraining can substantially improve performance. The results provide practical guidance for cost-effective data collection and model maintenance in MPC-enabled building energy systems, with implications for storage-enabled demand management and grid impact reduction.
Abstract
Data is required to develop forecasting models for use in Model Predictive Control (MPC) schemes in building energy systems. However, data is costly to both collect and exploit. Determining cost-optimal data usage strategies requires an understanding of the forecast accuracy, and the resulting MPC operational performance, that the data enables. This study investigates the performance of both simple and state-of-the-art machine learning prediction models for MPC in multi-building energy systems using a simulated case study with historic building energy data. The impact on forecast accuracy of measures to improve model data efficiency is quantified, specifically for: reuse of prediction models, reduction of training data duration, reduction of model data features, and online model training. A simple linear multi-layer perceptron model is shown to provide equivalent forecast accuracy to state-of-the-art models, with greater data efficiency and generalisability. The use of more than 2 years of training data for load prediction models provided no significant improvement in forecast accuracy. Forecast accuracy and data efficiency were improved simultaneously by using change-point analysis to screen training data. Reused models and those trained with 3 months of data had on average 10% higher error than the baseline, indicating that deploying MPC systems without prior data collection may be economic.
