Creating synthetic energy meter data using conditional diffusion and building metadata

Chun Fu, Hussain Kazmi, Matias Quintana, Clayton Miller

TL;DR

The study tackles data scarcity and privacy barriers in building energy analytics by developing a meta-driven conditional diffusion model that generates long-term energy data (annual profiles at hourly resolution) conditioned on building and meter-type metadata. It compares the diffusion model against CVAE and CGAN baselines on the Building Data Genome 2.0 (BDG2.0) dataset and reports better fidelity and diversity as measured by Fréchet Inception Distance (FID) and Kullback-Leibler (KL) divergence: FID is 36% lower and KL divergence 13% lower than for the next-best method. The approach enables targeted, privacy-preserving synthetic data generation that captures complex temporal patterns across meters and buildings, with open-source code to foster broader adoption and extension. This work paves the way for reliable, context-aware energy data synthesis at building and potentially district scales, facilitating benchmarking, forecasting, and planning without requiring sensitive real-world data.

Abstract

Advances in machine learning and increased computational power have driven progress in energy-related research. However, limited access to private energy data from buildings hinders traditional regression models, which rely on historical data. While generative models offer a solution, previous studies have primarily focused on short-term generation periods (e.g., daily profiles) and a limited number of meters. This study therefore proposes a conditional diffusion model for generating high-quality synthetic energy data using relevant metadata. Using a dataset comprising 1,828 power meters from various buildings and countries, the model is compared with traditional methods such as Conditional Generative Adversarial Networks (CGAN) and Conditional Variational Auto-Encoders (CVAE). It explicitly handles long-term annual consumption profiles, harnessing metadata such as location, weather, building, and meter type to produce coherent synthetic data that closely resembles real-world energy consumption patterns. The results demonstrate the proposed diffusion model's superior performance, with a 36% reduction in Fréchet Inception Distance (FID) score and a 13% decrease in Kullback-Leibler divergence (KL divergence) compared to the next-best method. The proposed method successfully generates high-quality energy data through metadata, and its code will be open-sourced, establishing a foundation for a broader array of energy data generation models in the future.
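
To make the KL divergence metric concrete, the following is a minimal sketch of one common way to compare real and synthetic meter readings: bin both sets of values into a shared histogram and compute the discrete KL divergence between the two. The abstract does not state how the paper computes this metric, so the function names (`histogram`, `kl_divergence`), the binning, and the smoothing constant `eps` are all assumptions made for illustration.

```python
import math

def histogram(values, n_bins, lo, hi):
    """Count values into n_bins equal-width bins over [lo, hi] (hypothetical helper)."""
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for v in values:
        # Clamp so that out-of-range values land in the first or last bin.
        idx = min(max(int((v - lo) / width), 0), n_bins - 1)
        counts[idx] += 1
    return counts

def kl_divergence(p_counts, q_counts, eps=1e-10):
    """Discrete KL(P || Q) from two histograms over the same bins.

    eps is an assumed smoothing constant that keeps empty bins from
    producing log(0) or division by zero.
    """
    n = len(p_counts)
    p_total = sum(p_counts) + eps * n
    q_total = sum(q_counts) + eps * n
    kl = 0.0
    for pc, qc in zip(p_counts, q_counts):
        p = (pc + eps) / p_total
        q = (qc + eps) / q_total
        kl += p * math.log(p / q)
    return kl

# Toy data standing in for normalized real and synthetic hourly readings.
real = [0.1, 0.2, 0.2, 0.3, 0.5, 0.5, 0.6, 0.9]
synthetic = [0.1, 0.1, 0.2, 0.4, 0.5, 0.7, 0.8, 0.9]

p = histogram(real, n_bins=5, lo=0.0, hi=1.0)
q = histogram(synthetic, n_bins=5, lo=0.0, hi=1.0)
kl_real_vs_synth = kl_divergence(p, q)
kl_real_vs_real = kl_divergence(p, p)
```

A lower value means the synthetic distribution is closer to the real one; comparing a histogram with itself gives zero. Note that KL divergence is asymmetric, so the order of the two arguments matters.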

Paper Structure (29 sections, 11 figures, 3 tables)

This paper contains 29 sections, 11 figures, 3 tables.

Figures (11)

  • Figure 1: Comparison between traditional methods and our proposed meta-driven generative model.
  • Figure 2: Counts and categories of building-metadata features in the BDG2 dataset [miller2020building].
  • Figure 3: Original 1D time series energy data were reshaped into 2D data to capture weekly usage patterns and then integrated into the image-based generative model.
  • Figure 4: Illustration of the CVAE model coupled with metadata. The trained decoder takes latent variables and metadata as input, enabling the generation of conditional energy data.
  • Figure 5: Illustration of the CGAN model combined with metadata. The discriminator aids the generator in producing realistic samples, while the trained generator can generate synthetic data conditioned on the provided metadata.
  • ...and 6 more figures
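
The reshaping described in the Figure 3 caption can be sketched as follows. This is an illustrative example only: the caption says the 1D series is folded into 2D to capture weekly patterns, but the exact layout is not given here, so the one-row-per-week shape (168 hourly columns), the dropping of leftover hours, and the helper name `reshape_to_weeks` are assumptions.

```python
HOURS_PER_WEEK = 7 * 24  # 168 hourly readings per row (assumed layout)

def reshape_to_weeks(hourly, hours_per_week=HOURS_PER_WEEK):
    """Fold a 1D hourly series into a 2D list with one row per full week.

    Trailing hours that do not fill a whole week are dropped. This is an
    assumption; padding the last row would be another option.
    """
    n_weeks = len(hourly) // hours_per_week
    return [
        hourly[w * hours_per_week:(w + 1) * hours_per_week]
        for w in range(n_weeks)
    ]

# Toy series: one non-leap year of hourly readings, value = hour of day.
year = [h % 24 for h in range(365 * 24)]
image = reshape_to_weeks(year)
print(len(image), len(image[0]))  # 52 168
```

With this layout, the same hour of the same weekday lines up in one column across all rows, which is what lets an image-based model see weekly periodicity as vertical structure.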