A synthetic dataset of French electric load curves with temperature conditioning
Tahar Nabil, Ghislain Agoua, Pierre Cauchois, Anne De Moliner, Benoît Grossin
TL;DR
The paper addresses the privacy barriers that limit access to granular electricity consumption data by introducing a synthetic dataset of French residential load curves conditioned on temperature and static user attributes. It proposes a conditional latent diffusion framework, a two-stage autoencoder–diffusion model with static and exogenous conditioning, to generate realistic 30-minute load profiles. The evaluation shows high fidelity, predictive utility, and robust privacy properties, with the approach outperforming TimeGAN across multiple metrics and tasks. The work offers a practical, privacy-preserving data resource for energy modeling and forecasting, with possible future extensions to other customer types and exogenous variables.
Abstract
The ongoing energy transition is changing how electricity is used, e.g. through self-consumption of local generation or flexibility services for demand control. To better understand these changes and the challenges they induce, access to individual smart meter data is crucial. Yet such data are personal data under the European GDPR. Widespread use therefore requires creating realistic, privacy-preserving synthetic samples. This paper introduces a new synthetic load curve dataset generated by conditional latent diffusion. We also provide the contracted power, time-of-use plan and local temperature used for generation. The fidelity, utility and privacy of the dataset are thoroughly evaluated, demonstrating its good quality and thereby supporting its value for energy modeling applications.
