A synthetic dataset of French electric load curves with temperature conditioning

Tahar Nabil, Ghislain Agoua, Pierre Cauchois, Anne De Moliner, Benoît Grossin

TL;DR

The paper tackles privacy barriers in access to granular electricity consumption by introducing a synthetic dataset of French residential load curves conditioned on temperature and static user attributes. It advances a conditional latent diffusion framework that combines a two-stage autoencoder–diffusion model with static and exogenous conditioning to generate realistic 30-minute load profiles. Comprehensive evaluation demonstrates high fidelity, predictive utility, and robust privacy properties, outperforming TimeGAN across multiple metrics and tasks. The work offers a practical, privacy-preserving data resource for energy modeling and forecasting, with implications for broader adoption and future extensions to diverse customer types and exogenous variables.

Abstract

The ongoing energy transition is causing behavioral changes in electricity use, e.g. with self-consumption of local generation or flexibility services for demand control. To better understand these changes and the challenges they induce, access to individual smart meter data is crucial. Yet such data is personal data under the European GDPR. Widespread use of it therefore requires creating synthetic samples that are both realistic and privacy-preserving. This paper introduces a new synthetic load curve dataset generated by conditional latent diffusion. We also provide the contracted power, time-of-use plan and local temperature used for generation. The fidelity, utility and privacy of the dataset are thoroughly evaluated, demonstrating its good quality and thereby supporting its value for energy modeling applications.

Paper Structure

This paper contains 46 sections, 12 figures, 6 tables.

Figures (12)

  • Figure 1: (a) t-SNE 2D projection of original (blue) and synthetic data across all categories for Latent Diffusion (orange) and TimeGAN (green). (b) to (d) are restricted to the night ToU, 6kVA category: (b) Density estimation of daily statistics and quantiles; (c) Example of a synthetic sample by Latent Diffusion; (d) Mean weekly profiles.
  • Figure 2: Latent diffusion (Rombach et al., 2022) with conditioning on exogenous variables. $\mathcal{E}$: image encoder, $\mathcal{D}$: image decoder, $Q,K,V$: cross-attention network between latent vector $\mathbf{z}$ and exogenous time series $\mathbf{u}$. $\mathbf{x}$ is the input data (load curve) and $\hat{\mathbf{x}}$ its reconstruction. Static labels are added by cross-attention to the diffusion UNet, or by concatenation to the latent codes $\mathbf{z}$.
  • Figure 3: Average one-year load curves of (left) Latent Diffusion and (right) TimeGAN.
  • Figure 4: Average weekly profiles of Latent Diffusion (orange) and TimeGAN (green) against test data (blue).
  • Figure 5: Average autocorrelation functions of Latent Diffusion (orange) and TimeGAN (green) against test data (blue). Shaded areas denote $\pm$ standard deviation across samples.
  • ...and 7 more figures
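
The Figure 2 caption describes a cross-attention network ($Q,K,V$) between the latent vector $\mathbf{z}$ and the exogenous time series $\mathbf{u}$. The sketch below illustrates the generic scaled dot-product form of that idea in NumPy. It is not the authors' implementation: the function names, the random projection matrices `W_q`, `W_k`, `W_v`, and all dimensions (16 latent tokens, 48 half-hourly temperature readings for one day) are assumptions chosen for illustration only.

```python
import numpy as np


def softmax(a, axis=-1):
    # Subtract the row maximum for numerical stability before exponentiating.
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(z, u, W_q, W_k, W_v):
    """Scaled dot-product cross-attention (illustrative, hypothetical helper).

    z: (n_latent, d_z) latent tokens, used to build the queries.
    u: (n_steps, d_u) exogenous series, used to build keys and values.
    """
    Q = z @ W_q  # (n_latent, d)
    K = u @ W_k  # (n_steps, d)
    V = u @ W_v  # (n_steps, d)
    scores = Q @ K.T / np.sqrt(Q.shape[-1])  # (n_latent, n_steps)
    weights = softmax(scores, axis=-1)       # each row sums to 1
    return weights @ V, weights


# Assumed toy dimensions, not taken from the paper.
rng = np.random.default_rng(0)
n_latent, d_z = 16, 8
n_steps, d_u = 48, 1  # one day of 30-minute temperature readings
d = 4

z = rng.normal(size=(n_latent, d_z))
u = rng.normal(size=(n_steps, d_u))
W_q = rng.normal(size=(d_z, d))
W_k = rng.normal(size=(d_u, d))
W_v = rng.normal(size=(d_u, d))

out, weights = cross_attention(z, u, W_q, W_k, W_v)
print(out.shape, weights.shape)
```

Each latent token ends up as a weighted average of projected temperature readings, which is how the exogenous signal can inform every position of the latent code. In a trained model the projections would be learned rather than random.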