Learning and Generating Diverse Residential Load Patterns Using GAN with Weakly-Supervised Training and Weight Selection

Xinyu Liang, Hao Wang

TL;DR

This work tackles the scarcity of high-quality residential load data by introducing RLP-GAN, a weakly-supervised GAN that integrates an over-complete autoencoder and Bi-LSTM components to learn diverse, temporally-rich household load patterns. It employs a three-stage training regime (autoencoder, supervisor, and joint adversarial training) and a Fréchet-distance-based model weight selection to mitigate mode collapse. Evaluations on real-world data from 417 households show that RLP-GAN outperforms four strong baselines (ACGAN, WGAN, C-RNN-GAN, DDPM) in terms of diversity and distribution fidelity, and a public synthetic dataset of one million load-pattern profiles is released. The approach enables scalable generation of realistic residential load data, with practical implications for energy management systems, grid planning, and decarbonization efforts, while highlighting avenues for regional transfer, anomaly generation, and robustness enhancements.

Abstract

The scarcity of high-quality residential load data can pose obstacles for decarbonizing the residential sector as well as for effective grid planning and operation. These challenges have motivated research into generating synthetic load data, but existing methods face limitations in terms of scalability, diversity, and similarity. This paper proposes a Generative Adversarial Network-based Synthetic Residential Load Pattern (RLP-GAN) generation model, a novel weakly-supervised GAN framework that leverages an over-complete autoencoder to capture dependencies within complex and diverse load patterns and learn the household-level data distribution at scale. We incorporate a model weight selection method to address the mode collapse problem and generate load patterns with high diversity. We develop a holistic evaluation method to validate the effectiveness of RLP-GAN using real-world data from 417 households. The results demonstrate that RLP-GAN outperforms state-of-the-art models in capturing temporal dependencies and generating load patterns with higher similarity to real data. Furthermore, we have publicly released the RLP-GAN-generated synthetic dataset, which comprises one million synthetic residential load pattern profiles.
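The weight selection idea mentioned above can be illustrated with a minimal sketch. It assumes the Fréchet distance is computed between Gaussians fitted to the real and generated profiles (the form commonly used for FID) and that the checkpoint with the lowest distance is kept. The summary does not state the paper's exact feature space or selection rule, so the function names `frechet_distance` and `select_checkpoint` and the `samples_by_epoch` dictionary are hypothetical.

```python
import numpy as np

def frechet_distance(real, synth):
    """Squared Frechet distance between Gaussians fitted to two sample sets.

    real, synth: arrays of shape (n_samples, n_features), n_features >= 2.
    Assumption: distance is taken directly on the profiles, not on learned features.
    """
    mu_r, mu_s = real.mean(axis=0), synth.mean(axis=0)
    cov_r = np.cov(real, rowvar=False)
    cov_s = np.cov(synth, rowvar=False)
    # Tr((cov_r cov_s)^(1/2)) equals the sum of the square roots of the
    # eigenvalues of cov_r @ cov_s; clip tiny negative values from round-off.
    eigvals = np.linalg.eigvals(cov_r @ cov_s)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    diff = mu_r - mu_s
    return float(diff @ diff + np.trace(cov_r) + np.trace(cov_s) - 2.0 * tr_sqrt)

def select_checkpoint(real, samples_by_epoch):
    """Return the epoch whose generated samples are closest to the real data.

    samples_by_epoch: hypothetical dict mapping epoch number -> generated samples.
    """
    scores = {epoch: frechet_distance(real, s) for epoch, s in samples_by_epoch.items()}
    best_epoch = min(scores, key=scores.get)
    return best_epoch, scores
```

In use, samples would be drawn from each saved generator checkpoint and passed as `samples_by_epoch`; the weights saved at the returned epoch are the ones retained.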

Paper Structure

This paper contains 33 sections, 13 equations, 5 figures, 5 tables.

Figures (5)

  • Figure 1: Overview of the RLP-GAN framework. The generator creates synthetic load patterns from noise vectors, which are refined by the supervisor. The encoder processes real data into hidden representations. The discriminator evaluates these representations, and the decoder reconstructs the load patterns.
  • Figure 2: Fréchet distance and loss values during RLP-GAN training, shown alongside visualizations of the dimension-reduced results. The epoch number is denoted as $n_{epoch}$.
  • Figure 3: Similarity comparison of selected samples from our testing dataset against samples from the generated data. Each sample pair is selected by calculating Euclidean distances and choosing the pair with the minimum distance. We also provide autocorrelation plots of the selected sample pairs to verify that the generated samples capture temporal correlations.
  • Figure 4: Similarity and diversity comparison using high-dimensional load data visualization, in which dimension reduction methods are applied to compare synthetic data against the original data. Synthetic data are generated by RLP-GAN and four benchmark models, each displayed in a separate plot. The top five plots use PCA for dimension reduction, and the bottom five plots use t-SNE.
  • Figure 5: Similarity comparison of the Cumulative Distribution Function (CDF) of aggregate time-series sequences from the original and synthetic datasets. Synthetic datasets are generated using RLP-GAN and four benchmark models, with the aim of matching the CDF of the original dataset.
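
The pairing procedure described for Figure 3 can be sketched as follows. This is an illustrative reading of the caption, not the authors' code: `nearest_synthetic` and `autocorrelation` are hypothetical helper names, profiles are assumed to be equal-length 1-D arrays, and the autocorrelation uses the standard biased sample estimator.

```python
import numpy as np

def nearest_synthetic(test_profile, synthetic):
    """Find the synthetic profile closest to a test profile in Euclidean distance.

    test_profile: array of shape (n_steps,)
    synthetic:    array of shape (n_samples, n_steps)
    Returns (index of the closest synthetic profile, its distance).
    """
    dists = np.linalg.norm(synthetic - test_profile, axis=1)
    idx = int(np.argmin(dists))
    return idx, float(dists[idx])

def autocorrelation(x, max_lag):
    """Sample autocorrelation of a non-constant 1-D series for lags 0..max_lag."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = float(x @ x)  # lag-0 autocovariance (unnormalized)
    return np.array([float(x[: len(x) - k] @ x[k:]) / denom for k in range(max_lag + 1)])
```

Comparing `autocorrelation` of a test profile with that of its nearest synthetic match gives a per-pair check of whether daily periodicity is reproduced.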