Impact of Data Sparsity on Machine Learning for Fault Detection in Power System Protection

Julian Oelhaf, Georg Kordowich, Changhun Kim, Paula Andrea Perez-Toro, Andreas Maier, Johann Jager, Siming Bayer

TL;DR

This work addresses how data sparsity—from sensor failures, reduced sampling, and communication disruptions—affects ML-based fault detection and fault line identification in power grids. It introduces a validation framework built on simulated grid data (Double Line topology) and a Random Forest pipeline, evaluating FD (binary) and FLI (four-class) under diverse sparsity scenarios. The findings show FD is robust to data sparsity, even with substantially reduced sampling, while FLI is more sensitive to missing voltage data, Bus 2 information loss, and extended communication outages, highlighting different reliability needs for these protection tasks. The framework enables targeted improvements in protection schemes and lays groundwork for extending validation to other fault types and topologies.

Abstract

Germany's transition to a renewable energy-based power system is reshaping grid operations, requiring advanced monitoring and control to manage decentralized generation. Machine learning (ML) has emerged as a powerful tool for power system protection, particularly for fault detection (FD) and fault line identification (FLI) in transmission grids. However, ML model reliability depends on data quality and availability. Data sparsity resulting from sensor failures, communication disruptions, or reduced sampling rates poses a challenge to ML-based FD and FLI. Yet, its impact has not been systematically validated prior to this work. In response, we propose a framework to assess the impact of data sparsity on ML-based FD and FLI performance. We simulate realistic data sparsity scenarios, evaluate their impact, derive quantitative insights, and demonstrate the effectiveness of this evaluation strategy by applying it to an existing ML-based framework. Results show the ML model remains robust for FD, maintaining an F1-score of 0.999 $\pm$ 0.000 even after a 50x data reduction. In contrast, FLI is more sensitive, with performance decreasing by 55.61% for missing voltage measurements and 9.73% due to communication failures at critical network points. These findings offer actionable insights for optimizing ML models for real-world grid protection. This enables more efficient FD and supports targeted improvements in FLI.
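The three sources of sparsity named above (reduced sampling rates, missing measurements from sensor failures, and communication disruptions) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the channel names, and the choice to fill lost samples with 0.0 are all assumptions made for illustration only.

```python
def downsample(signal, factor):
    """Reduced sampling rate: keep every `factor`-th sample."""
    return signal[::factor]


def drop_channels(window, missing):
    """Sensor failure: mark whole channels (e.g. all voltages) as unavailable."""
    return {name: (None if name in missing else values)
            for name, values in window.items()}


def communication_loss(signal, start, length, fill=0.0):
    """Communication disruption: overwrite a contiguous block of samples.

    The fill value 0.0 is an assumption; the source does not say how
    lost samples are represented.
    """
    out = list(signal)
    for i in range(start, min(start + length, len(out))):
        out[i] = fill
    return out


# Hypothetical measurement window with one voltage and one current channel.
window = {
    "v_bus2": [float(i) for i in range(100)],
    "i_bus2": [0.5] * 100,
}

reduced = downsample(window["v_bus2"], 50)      # 50x data reduction
no_voltage = drop_channels(window, {"v_bus2"})  # missing voltage measurements
outage = communication_loss(window["i_bus2"], start=10, length=20)
```

Each degraded window would then be passed to the same feature extraction and classifier as the clean data, so that any change in F1-score can be attributed to the sparsity scenario alone.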

Paper Structure

This paper contains 9 sections, 2 equations, 2 figures, 3 tables.

Figures (2)

  • Figure 1: Schematic of the "Double Line" grid topology. The diagram illustrates the transmission lines, protection relays (PRs), and communication devices. Measurements are recorded at the PRs and transmitted to the substations at each bus. From there, the data is relayed to the control center for FD and FLI.
  • Figure 2: Heatmap of the Fault Line Identification task F1-scores under cases of temporal communication loss.