Data Enrichment Opportunities for Distribution Grid Cable Networks using Variational Autoencoders
Konrad Sundsgaard, Kutay Bölat, Guangya Yang
TL;DR
The paper tackles data scarcity in reliability modeling for Danish medium-voltage cable networks by using variational autoencoders (VAEs) to enrich data, generate synthetic samples, and impute missing installation ages. It introduces VAEs and conditional VAEs (CVAEs), along with imputation via pseudo-Gibbs sampling and semi-supervised learning, and presents a proof-of-concept case study focused on installation age imputation. Results show that the VAE can reproduce key data distributions and outperform simple imputation baselines while remaining competitive with some advanced methods, highlighting both the potential and the current limitations of the approach. The work points toward broader data integration, network-aware modeling, and extension to additional grid components to enable data-driven maintenance decisions in distribution utilities.
Abstract
Electricity distribution cable networks suffer from incomplete and unbalanced data, hindering the effectiveness of machine learning models for predictive maintenance and reliability evaluation. Features such as the installation date of the cables are frequently missing. To address data scarcity, this study investigates the application of Variational Autoencoders (VAEs) for data enrichment, synthetic data generation, imbalanced data handling, and outlier detection. Based on a proof-of-concept case study for Denmark that targets the imputation of missing age information in cable network asset registers, the analysis underlines the potential of generative models to support data-driven maintenance. However, the study also highlights several areas for improvement, including enhanced feature importance analysis, the incorporation of network characteristics and external features, and the handling of biases in missing data. Future initiatives should expand the application of VAEs by adding semi-supervised learning, advanced sampling techniques, and additional distribution grid elements, including low-voltage networks, to the analysis.
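
To make the pseudo-Gibbs imputation named in the TL;DR more concrete, the following is a minimal sketch of the general idea, not the authors' implementation. It alternates between sampling a latent code from the encoder and resampling only the missing entries from the decoder, while observed entries stay fixed. The `encode` and `decode` functions, the weight matrix `W`, and the noise level `SIGMA` are hypothetical: they define a toy linear-Gaussian model with one latent dimension that stands in for a trained VAE, and the masked column merely plays the role of a standardised installation age.

```python
import numpy as np


def pseudo_gibbs_impute(x, missing_mask, encode, decode, n_iters=50, rng=None):
    """Impute missing entries by alternating z ~ q(z|x) and x_miss ~ p(x|z).

    encode(x) -> (mu_z, std_z) and decode(z) -> (mu_x, std_x) are assumed
    callables standing in for a trained VAE encoder and decoder.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.array(x, dtype=float)  # work on a copy
    x[missing_mask] = 0.0  # initial guess (data assumed standardised)
    for _ in range(n_iters):
        mu_z, std_z = encode(x)
        z = mu_z + std_z * rng.standard_normal(mu_z.shape)
        mu_x, std_x = decode(z)
        x_new = mu_x + std_x * rng.standard_normal(mu_x.shape)
        # Overwrite only the missing entries; observed values stay clamped.
        x[missing_mask] = x_new[missing_mask]
    return x


# Toy stand-in for a trained VAE (assumption): x = W z + noise, z ~ N(0, 1).
W = np.array([[1.0], [2.0], [-1.0]])  # 3 features, 1 latent dimension
SIGMA = 0.1  # observation noise standard deviation


def encode(x):
    # Exact Gaussian posterior of z given x for the linear-Gaussian model.
    precision = np.sum(W ** 2) + SIGMA ** 2
    return (x @ W) / precision, SIGMA / np.sqrt(precision)


def decode(z):
    return z @ W.T, SIGMA


rng = np.random.default_rng(0)
z_true = rng.standard_normal((5, 1))
x_full = z_true @ W.T + SIGMA * rng.standard_normal((5, 3))

# Pretend the second feature (e.g. a standardised installation age) is missing.
mask = np.zeros_like(x_full, dtype=bool)
mask[:, 1] = True
x_obs = x_full.copy()
x_obs[mask] = np.nan

x_imputed = pseudo_gibbs_impute(x_obs, mask, encode, decode, n_iters=50, rng=rng)
```

Because the imputed values are samples rather than point estimates, repeating the loop with different seeds would give a spread of plausible values for each missing entry, which is one reason a sampling-based scheme can be attractive for uncertain quantities such as cable age.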
