Opening the Black Box: Towards inherently interpretable energy data imputation models using building physics insight

Antonio Liguori; Matias Quintana; Chun Fu; Clayton Miller; Jérôme Frisch; Christoph van Treeck

TL;DR

This work tackles missing data in building energy datasets by introducing PI-DAE, a physics-informed denoising autoencoder (DAE) that embeds a simple resistance-capacitance (RC) building thermal balance equation as a soft constraint. The method combines a multivariate DAE with an approximated ordinary differential equation (ODE) loss, so that imputations are both accurate and physically consistent, and its optimized coefficients are inherently interpretable, reflecting the correlations among indoor air temperature, cooling, and heating. In an ablation study on a Berkeley office building dataset, PI-DAE shows improved robustness to varying rates of missing data, and the physics-based coefficients offer actionable insights into unmodeled disturbances and the strength of inter-variable relationships. Although RMSE gains are data-dependent and no significant overall improvement in reconstruction error is observed, the approach advances practical imputation for retrofit analysis and supports broader application to time-series data in building systems, with a computational footprint comparable to standard deep-learning baselines.
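To make the idea of an ODE soft constraint concrete, the NumPy snippet below is a minimal sketch, not the authors' implementation. It assumes a generic first-order RC balance, $dT_{ra_{avg}}/dt \approx a(T_{oa_{avg}} - T_{ra_{avg}}) + b\dot{Q}_{hw} - c\dot{Q}_{cool_{tot}} + d$, discretized with forward differences; the coefficients `a`, `b`, `c`, `d`, the weight `lam`, and all function names are hypothetical. The loss adds the reconstruction error on corrupted entries to the mean squared residual of that assumed equation evaluated on the reconstruction.

```python
import numpy as np


def rc_residual(t_ra, t_oa, q_hw, q_cool, coeffs, dt=1.0):
    """Residual of an ASSUMED first-order RC balance, using forward differences.

    Assumed form: dT_ra/dt = a*(T_oa - T_ra) + b*Q_hw - c*Q_cool + d
    (heating adds heat, cooling removes it, d is a constant offset).
    """
    a, b, c, d = coeffs
    dT_dt = np.diff(t_ra) / dt  # length n-1
    rhs = a * (t_oa[:-1] - t_ra[:-1]) + b * q_hw[:-1] - c * q_cool[:-1] + d
    return dT_dt - rhs


def pi_dae_loss(x_true, x_hat, mask, t_oa, coeffs, lam=0.1, dt=1.0):
    """Hypothetical composite loss: masked reconstruction MSE + lam * physics MSE.

    x_true, x_hat: shape (n, 3), columns [T_ra, Q_hw, Q_cool]
    mask: boolean (n, 3), True where the input was corrupted
    t_oa: outdoor air temperature, treated as a known input
    """
    recon = np.mean((x_hat[mask] - x_true[mask]) ** 2)
    r = rc_residual(x_hat[:, 0], t_oa, x_hat[:, 1], x_hat[:, 2], coeffs, dt)
    return recon + lam * np.mean(r ** 2)


def simulate(t_oa, q_hw, q_cool, coeffs, t0=22.0, dt=1.0):
    """Forward-Euler simulation of the same assumed RC balance (synthetic data)."""
    a, b, c, d = coeffs
    t_ra = np.empty_like(t_oa)
    t_ra[0] = t0
    for k in range(len(t_oa) - 1):
        t_ra[k + 1] = t_ra[k] + dt * (
            a * (t_oa[k] - t_ra[k]) + b * q_hw[k] - c * q_cool[k] + d
        )
    return t_ra


# Usage on synthetic data (all values are illustrative, not from the paper).
rng = np.random.default_rng(0)
n = 48
t_oa = 25.0 + 5.0 * np.sin(np.linspace(0.0, 2.0 * np.pi, n))
q_hw = rng.uniform(0.0, 1.0, n)
q_cool = rng.uniform(0.0, 2.0, n)
coeffs = (0.1, 0.5, 0.3, 0.0)
x_true = np.column_stack([simulate(t_oa, q_hw, q_cool, coeffs), q_hw, q_cool])
mask = rng.random(x_true.shape) < 0.2  # about 20% of entries treated as corrupted

loss_exact = pi_dae_loss(x_true, x_true, mask, t_oa, coeffs)
x_noisy = x_true + rng.normal(0.0, 0.1, x_true.shape)
loss_noisy = pi_dae_loss(x_true, x_noisy, mask, t_oa, coeffs)
```

A reconstruction that exactly satisfies the assumed balance yields a loss of essentially zero, while noise raises both the reconstruction and the physics terms; `lam` sets how strongly physical consistency is traded against reconstruction accuracy.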

Abstract

Missing data are frequently encountered by practitioners and researchers in the building energy modeling community. Advanced data-driven solutions, such as Deep Learning methods, are typically required to capture the non-linear behavior of these anomalies. An ongoing research question in Deep Learning is a model's applicability to limited data settings, which can be explored by introducing prior knowledge into the network. The same strategy can also make predictions more interpretable, thereby facilitating field application of the approach. To this end, this paper proposes Physics-informed Denoising Autoencoders (PI-DAE) for missing data imputation in commercial buildings. In particular, the presented method adds physics-inspired soft constraints to the loss function of a Denoising Autoencoder (DAE). To quantify the benefits of the physical component, an ablation study across different DAE configurations is conducted. First, three univariate DAEs are optimized separately on indoor air temperature, heating, and cooling data. Then, two multivariate DAEs are derived from these configurations. Finally, a building thermal balance equation is coupled to the last multivariate configuration to obtain PI-DAE. Two commonly used benchmarks are also employed to support the findings. The results show that introducing physical knowledge into a multivariate Denoising Autoencoder enhances the inherent model interpretability through the optimized physics-based coefficients. While PI-DAE yields no significant improvement in reconstruction error, its enhanced robustness to varying rates of missing data and the insights derived from its physics-based coefficients create opportunities for wider applications within building systems and the built environment.
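The abstract attributes interpretability to the optimized physics-based coefficients. The paper optimizes these jointly with the network; the snippet below does not. As a simplified stand-in, it fits the same hypothetical RC form used for illustration ($dT_{ra_{avg}}/dt \approx a(T_{oa_{avg}} - T_{ra_{avg}}) + b\dot{Q}_{hw} - c\dot{Q}_{cool_{tot}} + d$) by ordinary least squares on complete synthetic data, to show how each coefficient maps to a physical term: `a` to heat exchange with outdoor air, `b` to heating, `c` to cooling, and `d` to a constant offset that could absorb unmodeled gains. The model form, function name, and data are assumptions.

```python
import numpy as np


def fit_rc_coefficients(t_ra, t_oa, q_hw, q_cool, dt=1.0):
    """Estimate (a, b, c, d) of an ASSUMED RC balance by ordinary least squares.

    dT_ra/dt ~ a*(T_oa - T_ra) + b*Q_hw - c*Q_cool + d
    This is a stand-in for joint optimization inside a network, not the paper's method.
    """
    y = np.diff(t_ra) / dt  # forward-difference temperature derivative
    X = np.column_stack([
        t_oa[:-1] - t_ra[:-1],  # outdoor-indoor temperature difference -> a
        q_hw[:-1],              # heating supplied -> b
        -q_cool[:-1],           # cooling removes heat, so c comes out positive
        np.ones(len(y)),        # constant offset for unmodeled gains -> d
    ])
    coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coeffs


# Synthetic check: simulate the assumed model with known coefficients, then recover them.
rng = np.random.default_rng(1)
n = 96
true_coeffs = np.array([0.08, 0.6, 0.4, 0.05])
t_oa = 28.0 + 4.0 * np.sin(np.linspace(0.0, 4.0 * np.pi, n))
q_hw = rng.uniform(0.0, 1.0, n)
q_cool = rng.uniform(0.0, 2.0, n)

t_ra = np.empty(n)
t_ra[0] = 23.0
a, b, c, d = true_coeffs
for k in range(n - 1):
    t_ra[k + 1] = t_ra[k] + a * (t_oa[k] - t_ra[k]) + b * q_hw[k] - c * q_cool[k] + d

estimated = fit_rc_coefficients(t_ra, t_oa, q_hw, q_cool)
```

On noise-free data generated from the assumed model, the fit recovers the coefficients up to floating-point error; on real measurements, the size and sign of each estimate would indicate how strongly that term drives indoor temperature.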

Paper Structure (21 sections, 6 equations, 8 figures, 4 tables)

Introduction
Building data imputation
Combining physics-informed learning with building data imputation
Bridging the gap between academic research and industry through inherent physical interpretability
Contribution
Literature review
Methodology
Dataset description
Building thermal balance
From DAE to PI-DAE
Model optimization
Building monitoring periods
Experimental design
Experiments
Discussion
...and 6 more sections

Figures (8)

Figure 1: Working principle of a generic univariate DAE for imputing missing indoor air temperature data with a 20% corruption rate (CR).
Figure 2: Representation of the proposed PI-DAE with the approximated building energy balance ODE. The analyzed variables are the total cooling flow rate ($\dot{Q}_{cool_{tot}}$), the reheat water flow rate ($\dot{Q}_{hw}$), the average indoor air temperature ($T_{ra_{avg}}$) and the average outdoor air temperature ($T_{oa_{avg}}$). Figure partially adapted from Jagtap et al. [jagtap2020adaptive].
Figure 3: Learning curves of the models at varied training rates for both cases. For Case 1, the depicted training rates (from 0.1 to 0.5) correspond to training set sizes of 36, 72, 108, 145, and 181 days, respectively. For Case 2, the same training rates correspond to training set sizes of 1, 3, 5, 7, and 9 days, respectively. The analyzed variables are the total cooling flow rate ($\dot{Q}_{cool_{tot}}$), the reheat water flow rate ($\dot{Q}_{hw}$) and the average indoor air temperature ($T_{ra_{avg}}$).
Figure 4: Learning curves of the models with standard deviation for Case 1 at varied training rates. For Case 1, the depicted training rates (from 0.1 to 0.5) correspond to training set sizes of 36, 72, 108, 145, and 181 days, respectively. For Case 2, the same training rates correspond to training set sizes of 1, 3, 5, 7, and 9 days, respectively. The analyzed variables are the total cooling flow rate ($\dot{Q}_{cool_{tot}}$), the reheat water flow rate ($\dot{Q}_{hw}$) and the average indoor air temperature ($T_{ra_{avg}}$).
Figure 5: Learning curves of the models with standard deviation for Case 2 at varied training rates. For Case 1, the depicted training rates (from 0.1 to 0.5) correspond to training set sizes of 36, 72, 108, 145, and 181 days, respectively. For Case 2, the same training rates correspond to training set sizes of 1, 3, 5, 7, and 9 days, respectively. The analyzed variables are the total cooling flow rate ($\dot{Q}_{cool_{tot}}$), the reheat water flow rate ($\dot{Q}_{hw}$) and the average indoor air temperature ($T_{ra_{avg}}$).
...and 3 more figures
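Figure 1 describes a univariate DAE for indoor air temperature trained with a 20% corruption rate. A minimal sketch of that corruption step follows, assuming point-wise random masking with a fixed fill value; the paper's exact missingness pattern and fill value are not stated here, and the function name `corrupt` is hypothetical. A DAE would be trained to map the corrupted series back to the original. Linear interpolation appears only as an illustrative baseline for scoring RMSE on the masked entries; it is not claimed to be one of the paper's benchmarks.

```python
import numpy as np


def corrupt(x, corruption_rate=0.2, fill_value=0.0, rng=None):
    """Hide a random fraction of time steps in a 1-D series (ASSUMED point-wise masking).

    Returns the corrupted copy and a boolean mask that is True where values were hidden.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=float)
    n_missing = int(round(corruption_rate * x.size))
    idx = rng.choice(x.size, size=n_missing, replace=False)  # distinct positions
    mask = np.zeros(x.size, dtype=bool)
    mask[idx] = True
    x_corrupted = x.copy()
    x_corrupted[mask] = fill_value
    return x_corrupted, mask


# Illustrative daily indoor temperature profile with 96 steps (synthetic, not from the paper).
x = 22.0 + np.sin(2.0 * np.pi * np.arange(96) / 96)
x_corrupted, mask = corrupt(x, corruption_rate=0.2, rng=np.random.default_rng(42))

# Simple baseline: linearly interpolate hidden steps from the observed ones.
t = np.arange(x.size)
x_filled = x_corrupted.copy()
x_filled[mask] = np.interp(t[mask], t[~mask], x_corrupted[~mask])

# Score only the hidden entries, as is usual for imputation.
rmse = np.sqrt(np.mean((x_filled[mask] - x[mask]) ** 2))
```

Scoring only the masked positions keeps the metric focused on imputation quality rather than on copying observed values; varying `corruption_rate` reproduces the kind of sweep over missing-data rates discussed in the abstract.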
