A Conditional Diffusion Model for Building Energy Modeling Workflows

Saumya Sinha, Alexandre Cortiella, Rawad El Kontar, Andrew Glaws, Ryan King, Patrick Emami

TL;DR

This work tackles the data-gap challenge in building energy modeling by introducing Conditional TabDDPM, a diffusion-based method that performs conditional generation for mixed-type tabular data. Trained on the ResStock dataset of $2.2$ million buildings, the model learns multivariate conditional distributions to impute unknown building attributes conditioned on observed features, enabling complete inputs for urban building energy modeling (UBEM) workflows. The approach builds on TabDDPM by incorporating conditional diffusion, dynamic masking, and a single MLP denoiser to handle both numerical and categorical features. Evaluation includes univariate and bivariate distribution comparisons, reconstruction accuracy, and an end-to-end Baltimore case study where URBANopt energy outputs closely match reference profiles, demonstrating practical utility and generalization to unseen locations. Overall, the method offers a scalable, end-to-end solution to accelerate building energy modeling by producing realistic, complete, building-level datasets from partial information.

Abstract

Understanding current energy consumption behavior in communities is critical for informing future energy use decisions and enabling efficient energy management. Urban energy models, which are used to simulate these energy use patterns, require large datasets with detailed building characteristics for accurate outcomes. However, such detailed characteristics at the individual building level are often unknown, unavailable, or costly to acquire. In this work, we propose a generative modeling approach that generates realistic building attributes to fill in the data gaps and provide complete characteristics as inputs to energy models. Our model learns complex, building-level patterns from training on a large-scale residential building stock model containing 2.2 million buildings. We employ a tabular diffusion-based framework that is designed to handle heterogeneous (discrete and continuous) features in tabular building data, such as occupancy, floor area, heating, cooling, and other equipment details. We develop a capability for conditional diffusion, enabling the imputation of missing building characteristics conditioned on known attributes. We conduct a comprehensive validation of our conditional diffusion model, first by comparing the generated conditional distributions against the underlying data distribution, and second by performing a case study for a Baltimore residential region, showing the practical utility of our approach. Our work is one of the first to demonstrate the potential of generative modeling to accelerate building energy modeling workflows.

Paper Structure

This paper contains 16 sections, 5 equations, 8 figures, 2 tables.

Figures (8)

  • Figure 1: Training overview of our conditional generative diffusion model for a mixed-type tabular dataset. The Conditional TabDDPM model learns to generate the missing (unknown) building characteristics conditioned on observed building attributes. We show details of the denoiser training pipeline for a single timestep in this figure, where the known (condition) features are used to guide the denoising of the unknown (target) features. A training batch of building characteristics consists of numerical ($x_{num}$) and categorical features ($x_{cat}$), which are encoded with their respective transforms. A random subset of features is then masked, and these masks ($mask_{num}$ and $mask_{cat}$) are used to create the target and condition components. The (noisy) target and (observed) condition components are then concatenated and passed as inputs to the MLP denoiser model, along with the current diffusion timestep (embedded as in Kotelnikov et al., 2023). The MLP is trained to predict the noise for numerical and logits for categorical target variables, corresponding to this timestep.
  • Figure 2: Comparison of the true vs generated univariate conditional distribution for a set of numerical and categorical features. We plot the distributions corresponding to the specific dependency combination (included in the sub-captions) that resulted in the minimum distance between the distributions. The results shown are obtained with the mixed imputation model.
  • Figure 3: Comparison of the true vs generated univariate conditional distribution for a set of categorical features, evaluated on the OOD test dataset. We plot the distributions corresponding to the specific dependency combination (included in the sub-captions) that resulted in the minimum distance between the distributions. The results shown are obtained with the mixed imputation model.
  • Figure 4: Comparison of the true vs generated bivariate conditional distributions. The central panel shows a heatmap of the error in the joint bivariate distribution, and the top and side panels compare the generated and true marginal distributions. We plot the distributions corresponding to the specific dependency combination that resulted in the minimum distance between the joint distribution. The results shown are obtained with the categorical-only imputation model trained using a masking ratio of $0.05$.
  • Figure 5: A residential community in Baltimore is used in our case study (based on el5264131ai). There are $77$ single-family attached buildings. Building characteristics such as built year, number of stories, heating fuel, attic type, and building area (square footage) are obtained for these buildings from sources like OpenStreetMap and Zillow/Redfin.
  • ...and 3 more figures
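
The masking and concatenation step described in the Figure 1 caption can be illustrated with a minimal sketch. It is not the authors' implementation: it covers numerical features only (the categorical branch and the MLP itself are omitted), and the function names `make_denoiser_input` and `masked_mse`, the per-feature Bernoulli mask, the `[target, condition]` input layout, and the use of the standard DDPM forward step are all assumptions made for illustration.

```python
import math
import random


def make_denoiser_input(x_num, mask_ratio, alpha_bar_t, rng):
    """Build one hypothetical denoiser input for a row of numerical features."""
    # mask[i] == 1 marks a target (unknown) feature, 0 a condition (known) one.
    # Sampling the mask independently per feature is an assumption.
    mask = [1 if rng.random() < mask_ratio else 0 for _ in x_num]
    # Gaussian noise the denoiser would be trained to predict on target entries.
    eps = [rng.gauss(0.0, 1.0) for _ in x_num]
    # Standard DDPM forward step: x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps.
    a = math.sqrt(alpha_bar_t)
    b = math.sqrt(1.0 - alpha_bar_t)
    x_t = [a * x + b * e for x, e in zip(x_num, eps)]
    # Target component: noisy values, zeroed where the feature is a condition.
    target = [m * xt for m, xt in zip(mask, x_t)]
    # Condition component: clean values, zeroed where the feature is a target.
    condition = [(1 - m) * x for m, x in zip(mask, x_num)]
    # Concatenate the two components (assumed layout of the MLP input).
    return target + condition, mask, eps


def masked_mse(eps_pred, eps, mask):
    """Mean squared error over target (masked) entries only."""
    n_target = sum(mask)
    if n_target == 0:
        return 0.0
    total = sum(m * (p - e) ** 2 for p, e, m in zip(eps_pred, eps, mask))
    return total / n_target
```

Calling `make_denoiser_input([0.5, -1.0, 2.0, 0.0], 0.5, 0.9, random.Random(0))` returns an input of length 8 in which each feature appears either as a noisy target or as a clean condition, never both. Restricting the loss to masked entries reflects the caption's statement that the network predicts noise for the target variables.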