Combining Physics-based and Data-driven Modeling for Building Energy Systems
Leandro Von Krannichfeldt, Kristina Orehounig, Olga Fink
TL;DR
The paper addresses the challenge of predicting building indoor temperatures by comparing four hybrid approaches that combine physics-based and data-driven modeling (assistant, residual, surrogate, augmentation) across three realistic scenarios with varying levels of building documentation and sensor data. Each hybrid pairs a high-fidelity physics model (EnergyPlus) with a data-driven learner (LR, FFNN, or RF) and is evaluated using MAE, MAPE, and RMSE, with explainability provided by hierarchical Shapley values. The residual-FFNN emerges as the most effective hybrid on average across rooms, particularly in highly documented settings. The explainability analyses show how physics inputs influence the learned corrections and expose biases in the physics model at high outdoor temperatures. The study demonstrates that hybrid models can outperform pure physics-based simulations, especially when sensor data is rich, and highlights the value of hierarchical Shapley values for transparent, data-informed model refinement and bias detection.
Abstract
Building energy modeling plays a vital role in optimizing the operation of building energy systems by providing accurate predictions of the building's real-world conditions. In this context, various techniques have been explored, ranging from traditional physics-based models to data-driven models. Recently, researchers have begun combining physics-based and data-driven models into hybrid approaches. These include using the physics-based model output as an additional data-driven input, learning the residual between the physics-based model and real data, learning a surrogate of the physics-based model, and fine-tuning a surrogate model with real data. However, a comprehensive comparison of the inherent advantages of these hybrid approaches is still missing. The primary objective of this work is to evaluate four predominant hybrid approaches in building energy modeling through a real-world case study, with a focus on indoor thermodynamics. To achieve this, we devise three scenarios reflecting common levels of building documentation and sensor availability, assess the performance of the hybrid approaches in each scenario, and analyze their explainability using hierarchical Shapley values. The real-world study reveals three notable findings. First, greater building documentation and sensor availability lead to higher prediction accuracy for hybrid approaches. Second, the performance of hybrid approaches depends on the type of building room, but the residual approach using a Feedforward Neural Network as the data-driven sub-model performs best on average across all rooms. This hybrid approach also demonstrates a superior ability to leverage the simulation from the physics-based sub-model. Third, hierarchical Shapley values prove to be an effective tool for explaining and improving hybrid models while accounting for input correlations.
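To make the four hybrid strategies concrete, the following is a minimal sketch rather than the authors' implementation. It assumes synthetic data in place of EnergyPlus output and real measurements, uses an ordinary least-squares model as the data-driven learner, and approximates "fine-tuning" by a few gradient steps. The mapping of the names assistant, residual, surrogate, and augmentation onto the four descriptions in the abstract is inferred from the order in which they are listed. All variable and function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data (assumption; not the paper's dataset).
n = 200
X = rng.normal(size=(n, 2))                        # hypothetical input features
y_phys = 21.0 + X[:, 0] + 0.5 * np.tanh(X[:, 1])   # stand-in for the physics simulation
# "Measured" temperature: simulation plus a systematic bias and sensor noise.
y_real = y_phys + 0.8 * X[:, 0] + 0.3 + rng.normal(scale=0.05, size=n)

train, test = slice(0, 150), slice(150, n)


def design(A):
    """Append an intercept column to the feature matrix."""
    return np.column_stack([A, np.ones(len(A))])


def fit(A, target):
    """Least-squares linear model (stand-in for any data-driven learner)."""
    w, *_ = np.linalg.lstsq(design(A), target, rcond=None)
    return w


def predict(w, A):
    return design(A) @ w


def mae(pred, target):
    return float(np.mean(np.abs(pred - target)))


# Assistant: the physics output is an additional input to the learner.
X_assist = np.column_stack([X, y_phys])
w_assist = fit(X_assist[train], y_real[train])
pred_assist = predict(w_assist, X_assist[test])

# Residual: learn the gap between measurement and physics, then add it back.
w_res = fit(X[train], y_real[train] - y_phys[train])
pred_res = y_phys[test] + predict(w_res, X[test])

# Surrogate: learn to imitate the physics model itself.
w_sur = fit(X[train], y_phys[train])
pred_sur = predict(w_sur, X[test])

# Augmentation: start from the surrogate weights and fine-tune on real data
# with a few gradient-descent steps on the mean squared error.
w_aug = w_sur.copy()
D = design(X[train])
for _ in range(200):
    grad = (2.0 / len(D)) * D.T @ (D @ w_aug - y_real[train])
    w_aug -= 0.1 * grad
pred_aug = predict(w_aug, X[test])

# Test-set MAE of each variant against the "measured" values.
scores = {
    "physics": mae(y_phys[test], y_real[test]),
    "assistant": mae(pred_assist, y_real[test]),
    "residual": mae(pred_res, y_real[test]),
    "surrogate": mae(pred_sur, y_real[test]),
    "augmentation": mae(pred_aug, y_real[test]),
}
surrogate_fidelity = mae(pred_sur, y_phys[test])  # how well the surrogate mimics physics
print(scores, surrogate_fidelity)
```

In this toy setup the physics stand-in carries a deliberate bias, so the variants that see real data can correct it. The ranking of the approaches here says nothing about the ranking reported in the study.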
