Exploring Variational Graph Autoencoders for Distribution Grid Data Generation

Syed Zain Abbas, Ehimare Okoyomon

TL;DR

Public distribution grid data are scarce, hindering ML research in power systems; synthetic grid generation offers a privacy-preserving alternative. This work evaluates variational graph autoencoders (VGAEs) with four decoders on two open datasets, ENGAGE and DINGO, using average degree and normalized Laplacian spectrum (via Wasserstein distance) to quantify topology fidelity. Results show that simple decoders underfit realistic topologies, while GCN-based decoders perform well on ENGAGE but struggle with DINGO's diversity, with artifacts such as disconnected components and repeated motifs; the Iterative-GCN decoder provides the best overall fidelity. The authors release their models and analysis as open source to accelerate benchmarking and ML-driven grid research.
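The two topology-fidelity measures named above can be sketched as follows. This is a minimal illustration, not the authors' evaluation code. It assumes graphs are given as dense symmetric 0/1 adjacency matrices and compares the normalized Laplacian eigenvalues of two graphs as 1-D empirical distributions. The function names and the toy `ring`/`line` graphs are hypothetical. The Wasserstein-1 distance is computed as the integral of the absolute CDF difference, which for unweighted samples equals what `scipy.stats.wasserstein_distance` returns; it is written in NumPy to avoid the extra dependency. How the paper pools spectra across many grids is not stated here and is not modeled.

```python
import numpy as np

def average_degree(adj: np.ndarray) -> float:
    """Average node degree 2|E|/|V| of an undirected graph."""
    return float(adj.sum(axis=1).mean())

def normalized_laplacian_spectrum(adj: np.ndarray) -> np.ndarray:
    """Eigenvalues of L = D^{-1/2} (D - A) D^{-1/2}; isolated nodes get a zero row."""
    deg = adj.sum(axis=1).astype(float)
    inv_sqrt = np.zeros_like(deg)
    nz = deg > 0
    inv_sqrt[nz] = 1.0 / np.sqrt(deg[nz])
    lap = inv_sqrt[:, None] * (np.diag(deg) - adj) * inv_sqrt[None, :]
    return np.linalg.eigvalsh(lap)  # symmetric, so eigenvalues are real and lie in [0, 2]

def wasserstein_1d(u: np.ndarray, v: np.ndarray) -> float:
    """W1 distance between two 1-D empirical distributions: integral of |F_u - F_v|."""
    u, v = np.sort(u), np.sort(v)
    support = np.sort(np.concatenate([u, v]))
    widths = np.diff(support)
    cdf_u = np.searchsorted(u, support[:-1], side="right") / u.size
    cdf_v = np.searchsorted(v, support[:-1], side="right") / v.size
    return float(np.sum(np.abs(cdf_u - cdf_v) * widths))

def ring(n: int) -> np.ndarray:  # hypothetical toy graph: cycle C_n
    adj = np.zeros((n, n))
    for i in range(n):
        adj[i, (i + 1) % n] = adj[(i + 1) % n, i] = 1.0
    return adj

def line(n: int) -> np.ndarray:  # hypothetical toy graph: path P_n (a radial chain)
    adj = np.zeros((n, n))
    for i in range(n - 1):
        adj[i, i + 1] = adj[i + 1, i] = 1.0
    return adj

real, synthetic = line(4), ring(4)
print(average_degree(real), average_degree(synthetic))  # 1.5 2.0
print(wasserstein_1d(normalized_laplacian_spectrum(real),
                     normalized_laplacian_spectrum(synthetic)))  # approximately 0.25
```

Comparing spectra rather than raw adjacency matrices makes the metric independent of node ordering. A lower Wasserstein distance means the synthetic grid's spectrum more closely matches that of the real one.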

Abstract

To address the lack of public power system data for machine learning research in energy networks, we investigate the use of variational graph autoencoders (VGAEs) for synthetic distribution grid generation. Using two open-source datasets, ENGAGE and DINGO, we evaluate four decoder variants and compare generated networks against the original grids using structural and spectral metrics. Results indicate that simple decoders fail to capture realistic topologies, while GCN-based approaches achieve strong fidelity on ENGAGE but struggle on the more complex DINGO dataset, producing artifacts such as disconnected components and repeated motifs. These findings highlight both the promise and limitations of VGAEs for grid synthesis, underscoring the need for more expressive generative models and robust evaluation. We release our models and analysis as open source to support benchmarking and accelerate progress in ML-driven power system research.
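For reference, the standard VGAE formulation (Kipf & Welling, 2016) is written out below. This summary does not give the paper's exact loss. The generic pairwise decoder g_θ is therefore a placeholder, and the unweighted sum of the reconstruction and KL terms is an assumption. In the original VGAE, g_θ is the inner product σ(z_iᵀ z_j); in the MLP variant from Figure 5, it is an MLP applied to the concatenation [z_i ‖ z_j].

```latex
% GCN encoder: Gaussian posterior per node
q(\mathbf{Z}\mid\mathbf{X},\mathbf{A}) = \prod_{i=1}^{N} \mathcal{N}\!\left(\mathbf{z}_i \mid \boldsymbol{\mu}_i,\ \operatorname{diag}(\boldsymbol{\sigma}_i^2)\right),
\qquad \boldsymbol{\mu} = \mathrm{GCN}_{\mu}(\mathbf{X},\mathbf{A}),\quad \log\boldsymbol{\sigma}^2 = \mathrm{GCN}_{\sigma}(\mathbf{X},\mathbf{A})

% Pairwise decoder (placeholder g_\theta)
\hat{A}_{ij} = g_\theta(\mathbf{z}_i, \mathbf{z}_j) \in (0,1)

% Negative ELBO: edge reconstruction (binary cross-entropy) + KL to the prior N(0, I)
\mathcal{L} = -\sum_{i,j}\Big[A_{ij}\log\hat{A}_{ij} + (1-A_{ij})\log\big(1-\hat{A}_{ij}\big)\Big]
  - \frac{1}{2}\sum_{i=1}^{N}\sum_{k=1}^{d}\Big(1 + \log\sigma_{ik}^2 - \mu_{ik}^2 - \sigma_{ik}^2\Big)
```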

Paper Structure

This paper contains 22 sections, 7 equations, 7 figures, 2 tables.

Figures (7)

  • Figure 1: Training loss curves for the four decoder architectures on the DINGO dataset.
  • Figure 2: Topological comparison of real and synthetic networks, trained on the ENGAGE dataset.
  • Figure 3: Topological comparison of real and synthetic networks, trained on the DINGO dataset.
  • Figure 4: Comparison of dataset characteristics between DINGO and ENGAGE. The DINGO dataset exhibits a wide distribution of network sizes, with up to 40,000 nodes per grid, while ENGAGE shows discrete clustering around specific node counts, reflecting its origins as a manually created benchmark dataset.
  • Figure 5: Architecture of the encoder and MLP decoder. The encoder transforms the input graph into a latent representation through GCN layers, normalization, and regularization. The MLP decoder processes concatenated node embeddings through fully connected layers to predict edge probabilities.
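The encoder/MLP-decoder pipeline described in the Figure 5 caption can be illustrated with a minimal NumPy forward pass. This is a sketch under stated assumptions, not the authors' implementation. It assumes a shared GCN layer followed by separate GCN heads for the mean and log-variance (the standard VGAE setup), and a one-hidden-layer MLP decoder. The layer sizes, the random untrained weights, the omission of the normalization and regularization layers, and the averaging of both pair orderings are all hypothetical choices. The class and function names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)  # untrained, randomly initialised weights

def gcn_norm(adj):
    """Symmetric GCN propagation matrix D^{-1/2} (A + I) D^{-1/2}."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))  # self-loops keep every degree >= 1
    return d_inv_sqrt[:, None] * a_hat * d_inv_sqrt[None, :]

def relu(x):
    return np.maximum(x, 0.0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class GCNEncoder:
    """Shared GCN layer, then separate GCN heads for the mean and log-variance."""

    def __init__(self, in_dim, hid_dim, lat_dim):
        self.w_hid = rng.normal(0.0, 0.1, (in_dim, hid_dim))
        self.w_mu = rng.normal(0.0, 0.1, (hid_dim, lat_dim))
        self.w_logvar = rng.normal(0.0, 0.1, (hid_dim, lat_dim))

    def __call__(self, adj, x):
        prop = gcn_norm(adj)
        h = relu(prop @ x @ self.w_hid)  # normalization/dropout layers omitted here
        return prop @ h @ self.w_mu, prop @ h @ self.w_logvar

def reparameterize(mu, logvar):
    """z = mu + sigma * eps with eps ~ N(0, I)."""
    return mu + np.exp(0.5 * logvar) * rng.standard_normal(mu.shape)

class MLPEdgeDecoder:
    """Scores every node pair from the concatenation [z_i, z_j]."""

    def __init__(self, lat_dim, hid_dim):
        self.w1 = rng.normal(0.0, 0.1, (2 * lat_dim, hid_dim))
        self.b1 = np.zeros(hid_dim)
        self.w2 = rng.normal(0.0, 0.1, (hid_dim, 1))
        self.b2 = np.zeros(1)

    def __call__(self, z):
        n, d = z.shape
        z_i = np.broadcast_to(z[:, None, :], (n, n, d))
        z_j = np.broadcast_to(z[None, :, :], (n, n, d))
        pairs = np.concatenate([z_i, z_j], axis=-1)  # (n, n, 2d)
        h = relu(pairs @ self.w1 + self.b1)
        probs = sigmoid((h @ self.w2 + self.b2)[..., 0])  # (n, n) edge probabilities
        # [z_i, z_j] differs from [z_j, z_i], so average both orderings for an undirected grid
        return 0.5 * (probs + probs.T)

# Usage on a toy 6-node ring with one-hot node features
n = 6
adj = np.zeros((n, n))
for i in range(n):
    adj[i, (i + 1) % n] = adj[(i + 1) % n, i] = 1.0

encoder, decoder = GCNEncoder(n, 16, 8), MLPEdgeDecoder(8, 16)
mu, logvar = encoder(adj, np.eye(n))
probs = decoder(reparameterize(mu, logvar))

# Sample a synthetic adjacency: one Bernoulli draw per unordered pair, no self-loops
upper = np.triu(rng.random((n, n)) < probs, k=1)
sampled_adj = (upper | upper.T).astype(int)
print(probs.shape, sampled_adj.sum() // 2, "edges sampled")
```

Sampling each pair independently, as in this sketch, places no constraint on connectivity. That is one plausible source of the disconnected components the summary reports on DINGO.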