Combining Physics-based and Data-driven Modeling for Building Energy Systems

Leandro Von Krannichfeldt, Kristina Orehounig, Olga Fink

TL;DR

The paper addresses the challenge of predicting building indoor temperatures by comparing four hybrid physics-based and data-driven approaches (assistant, residual, surrogate, augmentation) across realistic scenarios with varying building documentation and sensor data. It combines a high-fidelity physics model (EnergyPlus) with data-driven learners (LR, FFNN, RF), evaluates them using MAE, MAPE, and RMSE, and explains them with hierarchical Shapley values. The residual approach with an FFNN emerges as the most effective hybrid across rooms, particularly in highly documented settings. The explainability analyses show how physics inputs influence the corrections and expose biases in the physics model at high outdoor temperatures. The study demonstrates that hybrid models can outperform pure physics-based simulations, especially when sensor data is rich, and highlights the value of hierarchical Shapley values for transparent, data-informed model refinement and bias detection.
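As a quick reference for the three error metrics named above, the following is a minimal sketch using their standard textbook definitions. The summary does not state the paper's exact formulas, so the percentage scaling of MAPE, the function names, and the sample temperatures are all assumptions made for illustration.

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error, in the unit of the target (here degrees Celsius)."""
    return float(np.mean(np.abs(y_true - y_pred)))

def mape(y_true, y_pred):
    """Mean absolute percentage error in percent.

    Assumes y_true contains no zeros, which is reasonable for indoor
    temperatures expressed in degrees Celsius.
    """
    return float(100.0 * np.mean(np.abs((y_true - y_pred) / y_true)))

def rmse(y_true, y_pred):
    """Root mean squared error; penalizes large deviations more than MAE."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Made-up indoor temperatures, purely illustrative.
measured = np.array([20.0, 22.0, 25.0, 24.0])
predicted = np.array([21.0, 22.0, 24.0, 26.0])

print(mae(measured, predicted), mape(measured, predicted), rmse(measured, predicted))
```

Because RMSE squares the errors before averaging, a single large miss (the last sample here) raises it above the MAE for the same predictions.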

Abstract

Building energy modeling plays a vital role in optimizing the operation of building energy systems by providing accurate predictions of the building's real-world conditions. In this context, various techniques have been explored, ranging from traditional physics-based models to data-driven models. Recently, researchers have begun combining physics-based and data-driven models into hybrid approaches. This includes using the physics-based model output as additional data-driven input, learning the residual between the physics-based model and real data, learning a surrogate of the physics-based model, or fine-tuning a surrogate model with real data. However, a comprehensive comparison of the inherent advantages of these hybrid approaches is still missing. The primary objective of this work is to evaluate four predominant hybrid approaches in building energy modeling through a real-world case study, with a focus on indoor thermodynamics. To achieve this, we devise three scenarios reflecting common levels of building documentation and sensor availability, assess their performance, and analyze their explainability using hierarchical Shapley values. The real-world study reveals three notable findings. First, greater building documentation and sensor availability lead to higher prediction accuracy for hybrid approaches. Second, the performance of hybrid approaches depends on the type of building room, but the residual approach using a Feedforward Neural Network as the data-driven sub-model performs best on average across all rooms. This hybrid approach also demonstrates a superior ability to leverage the simulation from the physics-based sub-model. Third, hierarchical Shapley values prove to be an effective tool for explaining and improving hybrid models while accounting for input correlations.
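The four hybrid approaches listed in the abstract differ mainly in what the data-driven model takes as input and what it is trained to predict. The sketch below illustrates that difference on synthetic data; it is not the authors' implementation. The `physics_sim` function is a hypothetical stand-in for EnergyPlus, the two input features and all coefficients are invented, and a least-squares linear model with a few gradient steps for fine-tuning replaces the learners used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins (assumptions, not the paper's data): two standardized
# exogenous inputs, e.g. outdoor temperature and solar gain.
n = 400
X = rng.normal(size=(n, 2))
y_real = 22.0 + 1.5 * X[:, 0] + 0.8 * np.tanh(X[:, 1]) + rng.normal(scale=0.1, size=n)

def physics_sim(X):
    """Hypothetical, deliberately biased simulator standing in for EnergyPlus."""
    return 22.5 + 1.2 * X[:, 0] + 0.8 * np.tanh(X[:, 1])

def design(F):
    """Append an intercept column to a feature matrix."""
    return np.column_stack([F, np.ones(len(F))])

def fit(F, y):
    """Ordinary least squares; returns weights including the intercept."""
    coef, *_ = np.linalg.lstsq(design(F), y, rcond=None)
    return coef

def predict(coef, F):
    return design(F) @ coef

def rmse(y, y_hat):
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))

y_sim = physics_sim(X)
tr, te = slice(0, 300), slice(300, n)  # simple chronological-style split

# Assistant: the simulation output is an additional input feature.
F = np.column_stack([X, y_sim])
assistant = predict(fit(F[tr], y_real[tr]), F[te])

# Residual: learn (real - simulated), then add the correction to the simulation.
residual = y_sim[te] + predict(fit(X[tr], y_real[tr] - y_sim[tr]), X[te])

# Surrogate: learn to imitate the simulator from simulated data only.
w_sur = fit(X[tr], y_sim[tr])
surrogate = predict(w_sur, X[te])

# Augmentation: start from the surrogate weights (step 1) and fine-tune them
# on real indoor temperature with a few gradient steps on the MSE (step 2).
w_aug = w_sur.copy()
A = design(X[tr])
for _ in range(20):
    grad = 2.0 * A.T @ (A @ w_aug - y_real[tr]) / len(A)
    w_aug -= 0.1 * grad
augmentation = predict(w_aug, X[te])

scores = {
    "physics": rmse(y_real[te], y_sim[te]),
    "assistant": rmse(y_real[te], assistant),
    "residual": rmse(y_real[te], residual),
    "surrogate": rmse(y_real[te], surrogate),
    "augmentation": rmse(y_real[te], augmentation),
}
for name, value in scores.items():
    print(f"{name:>12}: RMSE = {value:.3f}")
```

In this toy setup the surrogate inherits the simulator's bias because it never sees real measurements, whereas the residual and augmentation variants can correct it. The ranking here is an artifact of the synthetic data and says nothing about the paper's results.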

Paper Structure

This paper contains 20 sections, 13 equations, 25 figures, 5 tables.

Figures (25)

  • Figure 1: Overview of the different hybrid approaches combining the physics-based EnergyPlus model with a data-driven model. The data sources are indicated as building documentation and sensors with various sensor groups. The corresponding data flows are indicated by colored arrows, whose meanings are given in the legend at the bottom of the figure. The green exogenous data arrow indicates all the sensor measurements apart from the indoor room temperature. $X$ and $Y$ denote the input features and target variables of the data-driven model for learning purposes. $\hat{Y}$ represents the hybrid model's indoor temperature prediction. Note that the augmentation approach involves a two-step learning procedure: step 1 denotes learning on simulated data, and step 2 denotes fine-tuning on real indoor temperature.
  • Figure 2: Overview of the EnergyPlus workflow with modules, adapted from energyplus.
  • Figure 3: UMAR unit at Empa with two bedrooms, one living room and two bathrooms.
  • Figure 4: MAPE boxplot for the three scenarios W, WB and WBR between real indoor temperature and model prediction. The green triangle and line denote the mean and median of the error distribution, respectively.
  • Figure 5: MAPE for all rooms, for the hybrid approaches as well as the pure physics-based and data-driven models, in the WBR scenario. The grey dotted line indicates the mean MAPE across all rooms.
  • ...and 20 more figures