Opening the Black Box: Towards inherently interpretable energy data imputation models using building physics insight
Antonio Liguori, Matias Quintana, Chun Fu, Clayton Miller, Jérôme Frisch, Christoph van Treeck
TL;DR
This work tackles missing data in building energy datasets by introducing PI-DAE, a physics-informed denoising autoencoder that embeds a simple RC-based building thermal-balance equation as a soft constraint. The method combines a multivariate DAE with an approximated ODE loss, so that imputations are physically consistent as well as accurate and the learned physics coefficients are inherently interpretable, reflecting the correlations among indoor temperature, cooling, and heating. Across an ablation study and a Berkeley office dataset, PI-DAE demonstrates improved robustness to varying missing data patterns, and its physics coefficients offer actionable insights into unmodeled disturbances and the strength of inter-variable relationships. Although RMSE gains are data-dependent and at best modest, the approach advances practical imputation for retrofit analysis and supports broader application to time-series data in building systems, with a computational footprint comparable to standard deep-learning baselines.
Abstract
Missing data are frequently encountered by practitioners and researchers in the building energy modeling community. Advanced data-driven solutions, such as Deep Learning methods, are typically required to capture the non-linear behavior of the affected data. An open research question in Deep Learning is how well a model applies to limited-data settings, which can be explored by introducing prior knowledge into the network. The same strategy can also lead to more interpretable predictions, thereby facilitating the field application of the approach. To that end, this paper proposes the use of Physics-informed Denoising Autoencoders (PI-DAE) for missing data imputation in commercial buildings. In particular, the presented method adds physics-inspired soft constraints to the loss function of a Denoising Autoencoder (DAE). To quantify the benefits of the physical component, an ablation study between different DAE configurations is conducted. First, three univariate DAEs are optimized separately on indoor air temperature, heating, and cooling data. Then, two multivariate DAEs are derived from the previous configurations. Finally, a building thermal balance equation is coupled to the last multivariate configuration to obtain PI-DAE. Additionally, two commonly used benchmarks are employed to support the findings. The results show that introducing physical knowledge in a multivariate Denoising Autoencoder can enhance the inherent model interpretability through the optimized physics-based coefficients. While no significant improvement is observed in terms of reconstruction error with the proposed PI-DAE, its enhanced robustness to varying rates of missing data and the valuable insights derived from the physics-based coefficients create opportunities for wider applications within building systems and the built environment.
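
To make the idea of a physics-inspired soft constraint concrete, the following is a minimal NumPy sketch of how a reconstruction loss and a thermal-balance residual might be combined. It is an illustration under stated assumptions, not the implementation used in this work: the first-order form dT/dt ≈ a·Q_heat − b·Q_cool + c, the coefficient names `a`, `b`, `c`, the weight `lam`, the forward-difference approximation, and the function names `physics_residual` and `pi_dae_loss` are all hypothetical choices made for the example.

```python
import numpy as np

def physics_residual(temp, heat, cool, coeffs, dt=1.0):
    """Residual of an ASSUMED first-order thermal balance.

    Assumed form (hypothetical, for illustration only):
        dT/dt = a * heat - b * cool + c
    where c lumps together unmodeled disturbances.
    """
    a, b, c = coeffs
    # Forward finite difference approximates the time derivative.
    dT_dt = np.diff(temp) / dt
    rhs = a * heat[:-1] - b * cool[:-1] + c
    return dT_dt - rhs

def pi_dae_loss(recon, target, known_mask, coeffs, lam=0.1, dt=1.0):
    """Reconstruction loss plus a weighted physics penalty (soft constraint).

    recon, target: arrays of shape (T, 3), columns = [temp, heat, cool].
    known_mask:    same shape, 1 where a ground-truth value is available.
    lam:           hypothetical weight of the physics term.
    """
    # Mean squared error over entries with known ground truth.
    sq_err = (recon - target) ** 2
    data_loss = np.sum(sq_err * known_mask) / max(np.sum(known_mask), 1)
    # Mean squared ODE residual evaluated on the reconstructed signals.
    res = physics_residual(recon[:, 0], recon[:, 1], recon[:, 2], coeffs, dt)
    physics_loss = np.mean(res ** 2)
    return data_loss + lam * physics_loss
```

In an actual training setup the coefficients would be optimized jointly with the network weights in an automatic-differentiation framework; the sketch only evaluates the loss for fixed values. Because the physics term is a penalty rather than a hard constraint, a reconstruction that violates the assumed balance is discouraged but not forbidden.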
