Creating synthetic energy meter data using conditional diffusion and building metadata

Chun Fu; Hussain Kazmi; Matias Quintana; Clayton Miller

TL;DR

The study tackles data scarcity and privacy barriers in building energy analytics by developing a metadata-driven conditional diffusion model that generates annual hourly energy data conditioned on building- and meter-type metadata. It compares the diffusion model against CVAE and CGAN baselines on the Building Data Genome 2.0 (BDG2.0) dataset and reports higher fidelity and diversity, measured by Fréchet Inception Distance (FID) and Kullback-Leibler (KL) divergence: the diffusion model reduces FID by 36% and KL divergence by 13% relative to the next-best method. The approach enables targeted, privacy-preserving synthetic data generation that captures complex temporal patterns across meters and buildings, and the code is to be open-sourced to support adoption and extension. This work lays the groundwork for context-aware energy data synthesis at building and potentially district scales, supporting benchmarking, forecasting, and planning without requiring sensitive real-world data.

Abstract

Advances in machine learning and increased computational power have driven progress in energy-related research. However, limited access to private energy data from buildings hinders traditional regression models that rely on historical data. While generative models offer a solution, previous studies have primarily focused on short-term generation periods (e.g., daily profiles) and a limited number of meters. This study therefore proposes a conditional diffusion model for generating high-quality synthetic energy data using relevant metadata. Using a dataset comprising 1,828 power meters from various buildings and countries, this model is compared with traditional methods such as Conditional Generative Adversarial Networks (CGAN) and Conditional Variational Auto-Encoders (CVAE). It explicitly handles long-term annual consumption profiles, harnessing metadata such as location, weather, building, and meter type to produce coherent synthetic data that closely resembles real-world energy consumption patterns. The results demonstrate the proposed diffusion model's superior performance, with a 36% reduction in Fréchet Inception Distance (FID) score and a 13% decrease in Kullback-Leibler divergence (KL divergence) compared to the next-best method. The proposed method successfully generates high-quality energy data through metadata, and its code will be open-sourced, establishing a foundation for a broader array of energy data generation models in the future.
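
To make the KL divergence metric mentioned in the abstract concrete, the following minimal sketch computes KL(P || Q) between histograms of real and synthetic load values. It is an illustration only: the bin count, the smoothing constant `eps`, the toy data, and the helper names `histogram` and `kl_divergence` are assumptions, not the evaluation procedure used in the paper.

```python
import math


def histogram(values, bins, lo, hi):
    """Return a normalized histogram with `bins` equal-width bins on [lo, hi]."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        # Clamp the index so values at or beyond the edges land in a valid bin.
        idx = min(max(int((v - lo) / width), 0), bins - 1)
        counts[idx] += 1
    total = len(values)
    return [c / total for c in counts]


def kl_divergence(p, q, eps=1e-10):
    """KL(P || Q) for two discrete distributions of equal length.

    A small eps is added to every bin (an assumption of this sketch) so that
    empty bins do not cause division by zero or log(0).
    """
    p_s = [x + eps for x in p]
    q_s = [x + eps for x in q]
    zp, zq = sum(p_s), sum(q_s)
    return sum((a / zp) * math.log((a / zp) / (b / zq)) for a, b in zip(p_s, q_s))


# Toy normalized load values standing in for real and synthetic meter data.
real = [0.1, 0.2, 0.2, 0.5, 0.9, 0.4, 0.3, 0.7]
synthetic = [0.15, 0.25, 0.2, 0.45, 0.8, 0.5, 0.3, 0.6]

p = histogram(real, bins=4, lo=0.0, hi=1.0)
q = histogram(synthetic, bins=4, lo=0.0, hi=1.0)
kl_value = kl_divergence(p, q)
print(f"KL(real || synthetic) = {kl_value:.4f}")
```

A lower value indicates that the synthetic distribution is closer to the real one; the divergence is zero only when the two histograms coincide.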

Paper Structure (29 sections, 11 figures, 3 tables)

Introduction
Conventional regression-based energy model and building performance simulation
Emergence of generative models and their application in the energy field
Integration of meta-information into generative models
Research Objectives and Novelty
Methodology
Dataset: Building Data Genome 2.0 (BDG2.0)
Data preprocessing
Data cleaning
Data normalization
Data splitting
Modeling
Conditional Variational Autoencoder (CVAE)
Conditional Generative Adversarial Networks (CGANs)
Conditional diffusion model
...and 14 more sections

Figures (11)

Figure 1: Comparison between traditional methods and our proposed meta-driven generative model.
Figure 2: Counts and categories of building-metadata features in the BDG2.0 dataset (Miller et al., 2020).
Figure 3: Original 1D time series energy data were reshaped into 2D data to capture weekly usage patterns and then integrated into the image-based generative model.
Figure 4: Illustration of the CVAE model coupled with metadata. The trained decoder takes latent variables and metadata as input, enabling the generation of conditional energy data.
Figure 5: Illustration of the CGAN model combined with metadata. The discriminator aids the generator in producing realistic samples, while the trained generator can generate synthetic data conditioned on the provided metadata.
...and 6 more figures
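
The reshaping described in the Figure 3 caption, where a 1D hourly series becomes a 2D array that exposes weekly patterns, might look roughly like the sketch below. The layout of one row per week with 168 hourly columns, the dropping of the trailing partial week, and the function name `reshape_to_weeks` are assumptions made for illustration; the paper's exact image dimensions are not stated in this summary.

```python
def reshape_to_weeks(hourly, hours_per_week=168):
    """Reshape a 1D hourly series into rows of one week each.

    Any trailing hours that do not fill a complete week are dropped
    (an assumption of this sketch).
    """
    n_weeks = len(hourly) // hours_per_week
    return [
        hourly[w * hours_per_week:(w + 1) * hours_per_week]
        for w in range(n_weeks)
    ]


# Toy annual series: 8760 hourly readings with a simple daily cycle.
hourly = [float(h % 24) for h in range(8760)]

weekly_image = reshape_to_weeks(hourly)
print(len(weekly_image), len(weekly_image[0]))
```

With 8,760 hourly readings this yields 52 full weeks of 168 hours each, so the same hour of the same weekday lines up vertically across rows, which is what lets an image-based model pick up weekly structure.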
