A Conditional Diffusion Model for Building Energy Modeling Workflows
Saumya Sinha, Alexandre Cortiella, Rawad El Kontar, Andrew Glaws, Ryan King, Patrick Emami
TL;DR
This work addresses the data-gap challenge in building energy modeling by introducing Conditional TabDDPM, a diffusion-based method for conditional generation of mixed-type tabular data. Trained on the ResStock dataset of 2.2 million buildings, the model learns multivariate conditional distributions and imputes unknown building attributes conditioned on observed features, yielding complete inputs for urban building energy modeling (UBEM) workflows. The approach builds on TabDDPM by incorporating conditional diffusion, dynamic masking, and a single MLP denoiser that handles both numerical and categorical features. Evaluation includes univariate and bivariate distribution comparisons, reconstruction accuracy, and an end-to-end Baltimore case study in which URBANopt energy outputs closely match reference profiles, demonstrating practical utility and generalization to unseen locations. Overall, the method offers a scalable, end-to-end way to accelerate building energy modeling by producing realistic, complete, building-level datasets from partial information.
Abstract
Understanding current energy consumption behavior in communities is critical for informing future energy use decisions and enabling efficient energy management. Urban energy models, which are used to simulate these energy use patterns, require large datasets with detailed building characteristics to produce accurate outcomes. However, such detailed characteristics at the individual building level are often unknown, unavailable, or costly to acquire. In this work, we propose a generative modeling approach that produces realistic building attributes to fill these data gaps and provide complete characteristics as inputs to energy models. Our model learns complex, building-level patterns by training on a large-scale residential building stock model containing 2.2 million buildings. We employ a tabular diffusion-based framework designed to handle heterogeneous (discrete and continuous) features in tabular building data, such as occupancy, floor area, heating, cooling, and other equipment details. We develop a capability for conditional diffusion, enabling the imputation of missing building characteristics conditioned on known attributes. We conduct a comprehensive validation of our conditional diffusion model, first by comparing the generated conditional distributions against the underlying data distribution, and second by performing a case study for a residential region of Baltimore that shows the practical utility of our approach. Our work is one of the first to demonstrate the potential of generative modeling to accelerate building energy modeling workflows.
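To make the idea of conditioning on known attributes concrete, the following is a minimal sketch of how a random conditioning mask could be combined with Gaussian forward noising for numerical features. It is an illustration under stated assumptions, not the paper's implementation: the function names, the per-row masking rule, and the value of `alpha_bar_t` are all hypothetical, and categorical features are omitted for brevity.

```python
import numpy as np

def sample_condition_mask(n_rows, n_features, rng):
    """Boolean mask; True marks a feature treated as observed (conditioning).

    Assumption: each row draws its own observed fraction, so the set of
    conditioning features varies from row to row.
    """
    keep_prob = rng.uniform(0.0, 1.0, size=(n_rows, 1))
    return rng.uniform(size=(n_rows, n_features)) < keep_prob

def noise_unobserved(x_num, mask, alpha_bar_t, rng):
    """Apply Gaussian forward diffusion only to unobserved numerical features."""
    eps = rng.standard_normal(x_num.shape)
    x_t = np.sqrt(alpha_bar_t) * x_num + np.sqrt(1.0 - alpha_bar_t) * eps
    # Observed entries stay clean so a denoiser could condition on them.
    return np.where(mask, x_num, x_t), eps

rng = np.random.default_rng(0)
x_num = rng.standard_normal((4, 3))   # toy stand-in for numerical building features
mask = sample_condition_mask(4, 3, rng)
x_t, eps = noise_unobserved(x_num, mask, alpha_bar_t=0.5, rng=rng)
```

In this sketch a denoiser would receive `x_t` together with `mask` and be trained to recover the noise on the unobserved entries only. At inference time the mask would instead be fixed by whichever attributes are actually known for a building.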
