Data Enrichment Opportunities for Distribution Grid Cable Networks using Variational Autoencoders

Konrad Sundsgaard, Kutay Bölat, Guangya Yang

TL;DR

The paper tackles data scarcity in reliability modeling for Danish medium-voltage cable networks by using variational autoencoders (VAEs) to enrich data, generate synthetic samples, and impute missing installation ages. It introduces VAEs and conditional VAEs (CVAEs), along with imputation via pseudo-Gibbs sampling and semi-supervised learning, and demonstrates a proof-of-concept case focusing on installation age imputation. Results show that the VAE can reproduce key data distributions and outperform simple imputation baselines while remaining competitive with some advanced methods, which highlights both the potential and the current limitations of the approach. The work points toward broader data integration, network-aware modeling, and extension to additional grid components to enable data-driven maintenance decisions in distribution utilities.
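
To make the pseudo-Gibbs imputation idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes an already trained model exposed through two hypothetical callables, `encode` (returning a latent mean and log-variance) and `decode` (returning a reconstruction). A toy linear encoder and decoder stand in for a real VAE purely so that the loop can run.

```python
import numpy as np

def pseudo_gibbs_impute(x, missing_mask, encode, decode, n_iters=20, rng=None):
    """Fill entries where missing_mask is True; observed entries are never changed.

    encode and decode are hypothetical stand-ins for a trained VAE.
    """
    rng = np.random.default_rng() if rng is None else rng
    x_filled = np.array(x, dtype=float)  # work on a copy
    # Initialise the missing entries with the column means of the observed values.
    observed = np.where(missing_mask, np.nan, x_filled)
    col_means = np.nanmean(observed, axis=0)
    x_filled = np.where(missing_mask, col_means, x_filled)
    for _ in range(n_iters):
        mu, log_var = encode(x_filled)                       # approximate posterior
        z = mu + np.exp(0.5 * log_var) * rng.standard_normal(mu.shape)
        x_recon = decode(z)                                  # reconstruction
        # Overwrite only the missing entries, then repeat with the updated input.
        x_filled = np.where(missing_mask, x_recon, x_filled)
    return x_filled

# Toy stand-in model (assumption): a random linear encoder and decoder.
rng = np.random.default_rng(0)
W = rng.standard_normal((3, 2))

def toy_encode(x):
    return x @ W, np.full((x.shape[0], 2), -4.0)

def toy_decode(z):
    return z @ W.T

x_true = rng.standard_normal((12, 3))
mask = np.zeros_like(x_true, dtype=bool)
mask[::3, 2] = True                      # hide every third value of column 2
x_with_gaps = np.where(mask, np.nan, x_true)

imputed = pseudo_gibbs_impute(x_with_gaps, mask, toy_encode, toy_decode, rng=rng)
```

In a real setting the third column would correspond to a feature such as installation age, and the toy functions would be replaced by the trained encoder and decoder networks.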

Abstract

Electricity distribution cable networks suffer from incomplete and unbalanced data, hindering the effectiveness of machine learning models for predictive maintenance and reliability evaluation. Features such as the installation date of the cables are frequently missing. To address data scarcity, this study investigates the application of Variational Autoencoders (VAEs) for data enrichment, synthetic data generation, imbalanced data handling, and outlier detection. Based on a proof-of-concept case study for Denmark, targeting the imputation of missing age information in cable network asset registers, the analysis underlines the potential of generative models to support data-driven maintenance. However, the study also highlights several areas for improvement, including enhanced feature importance analysis, incorporating network characteristics and external features, and handling biases in missing data. Future initiatives should expand the application of VAEs by incorporating semi-supervised learning, advanced sampling techniques, and additional distribution grid elements, including low-voltage networks, into the analysis.

Paper Structure (19 sections, 3 equations, 8 figures, 3 tables)

Figures (8)

  • Figure 1: Data availability for selected features in failure reports, sorted by availability.
  • Figure 2: Computational diagram of a conditional VAE (CVAE). Embedding layers are applied to categorical inputs and conditions (Parra2023).
  • Figure 3: Training and Validation Loss of the VAE (asset data) over different training epochs.
  • Figure 4: Empirical Cumulative Distribution Function (ECDF) for the continuous feature: cable section length. Computed during sample-level testing.
  • Figure 5: Empirical Cumulative Distribution Function (ECDF) for the continuous feature: installation age. Computed during sample-level testing.
  • ...and 3 more figures
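
The ECDF comparisons named in the captions of Figures 4 and 5 can be illustrated with a short sketch. It shows how an empirical CDF is computed and how the largest vertical gap between two ECDFs (the two-sample Kolmogorov-Smirnov statistic) can be measured. The function names are hypothetical, and using this gap as a summary metric is an assumption for illustration, not something the source states the authors did.

```python
import numpy as np

def ecdf(samples):
    """Return sorted values and the fraction of samples at or below each value."""
    x = np.sort(np.asarray(samples, dtype=float))
    y = np.arange(1, x.size + 1) / x.size
    return x, y

def max_ecdf_gap(real, synthetic):
    """Largest absolute difference between the two ECDFs, evaluated at all sample points."""
    real = np.sort(np.asarray(real, dtype=float))
    synthetic = np.sort(np.asarray(synthetic, dtype=float))
    grid = np.concatenate([real, synthetic])
    # searchsorted with side="right" counts the samples that are <= each grid value.
    f_real = np.searchsorted(real, grid, side="right") / real.size
    f_syn = np.searchsorted(synthetic, grid, side="right") / synthetic.size
    return float(np.max(np.abs(f_real - f_syn)))
```

A gap near 0 means the synthetic samples follow the real distribution closely for that feature, while a gap near 1 means the two samples barely overlap.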