Unsupervised Clustering for Fault Analysis in High-Voltage Power Systems Using Voltage and Current Signals

Julian Oelhaf, Georg Kordowich, Andreas Maier, Johann Jager, Siming Bayer

TL;DR

This work tackles fault analysis in high-voltage power systems with unlabeled voltage-current waveforms by applying FFT-based frequency-domain features, dimensionality reduction via PCA and t-SNE, and clustering with K-Means on a real RTE dataset. The approach yields a scalable, data-driven workflow that partially aligns clusters with known fault types via expert labeling, despite the absence of extensive ground truth. Key contributions include a pronounced reduction of labeling burden, a comparative assessment of PCA- versus t-SNE-driven clustering, and insights into the suitability of unsupervised methods for fault-pattern discovery in large sensor datasets. The findings underscore the potential of unsupervised fault analysis to streamline inspection and support grid protection decisions, while acknowledging limitations and opportunities for improved validation and clustering methods.

Abstract

The widespread use of sensors in modern power grids has led to the accumulation of large amounts of voltage and current waveform data, especially during fault events. However, the lack of labeled datasets poses a significant challenge for fault classification and analysis. This paper explores the application of unsupervised clustering techniques for fault diagnosis in high-voltage power systems. A dataset provided by the Réseau de Transport d'Électricité (RTE) is analyzed, with frequency-domain features extracted using the Fast Fourier Transform (FFT). The K-Means algorithm is then applied to identify underlying patterns in the data, enabling automated fault categorization without the need for labeled training samples. The resulting clusters are evaluated in collaboration with power system experts to assess their alignment with real-world fault characteristics. The results demonstrate the potential of unsupervised learning for scalable and data-driven fault analysis, providing a robust approach to detecting and classifying power system faults with minimal prior assumptions.
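The pipeline described in the abstract (FFT features, dimensionality reduction, K-Means) can be illustrated with a minimal NumPy sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the synthetic 50 Hz waveforms, the 1 kHz sampling rate, the number of frequency bins, the two-cluster setting, and the deterministic farthest-point initialization used in place of the usual random K-Means seeding. PCA is computed via an SVD, and t-SNE is left out; in practice a library such as scikit-learn (`PCA`, `KMeans`) would likely be used instead of these hand-written helpers.

```python
import numpy as np

def fft_features(waveforms, n_bins):
    """Magnitude spectrum of each row, truncated to the first n_bins bins."""
    spectrum = np.abs(np.fft.rfft(waveforms, axis=1))
    return spectrum[:, :n_bins]

def pca_project(features, n_components):
    """Project mean-centered features onto the leading principal axes."""
    centered = features - features.mean(axis=0)
    # Rows of vt are the principal axes, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

def kmeans(points, k, n_iter=100):
    """Lloyd's algorithm with a deterministic farthest-point initialization
    (an illustrative simplification, not the paper's configuration)."""
    centers = [points[0]]
    while len(centers) < k:
        dist_to_nearest = np.min(
            [np.linalg.norm(points - c, axis=1) for c in centers], axis=0
        )
        centers.append(points[dist_to_nearest.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        # Assign every point to its nearest center.
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each center to the mean of its points (keep it if empty).
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Synthetic stand-in data (hypothetical): 20 clean 50 Hz signals and
# 20 signals with an added 150 Hz harmonic, sampled at 1 kHz for 0.2 s.
rng = np.random.default_rng(0)
fs, n_samples = 1000, 200
t = np.arange(n_samples) / fs

def make_group(harmonic_amplitude, n=20):
    phases = rng.uniform(0, 2 * np.pi, size=(n, 1))
    clean = np.sin(2 * np.pi * 50 * t + phases)
    clean = clean + harmonic_amplitude * np.sin(2 * np.pi * 150 * t + 3 * phases)
    return clean + 0.05 * rng.standard_normal((n, n_samples))

waveforms = np.vstack([make_group(0.0), make_group(0.8)])
features = fft_features(waveforms, n_bins=40)   # 5 Hz per bin, up to 195 Hz
embedded = pca_project(features, n_components=2)
labels, _ = kmeans(embedded, k=2)
print(labels)
```

Because the magnitude spectrum discards phase, the two synthetic groups differ mainly in the 150 Hz bin, which is why they separate cleanly after projection. Real fault recordings would be far less tidy, and the number of clusters would need to be chosen rather than assumed.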

Paper Structure

This paper contains 19 sections, 11 equations, 9 figures, 6 tables.

Figures (9)

  • Figure 1: t-SNE visualization of clustering results in two-dimensional space. Each point represents a sample, with colors indicating clusters identified by K-Means. This visualization provides insight into the cluster distribution and separability of fault events.
  • Figure 2: Distribution of the event types in the manually labeled subset ($N=204$).
  • Figure 3: Percentage distribution of event types per K-Means PCA cluster in the manually labeled subset ($N=204$). Each cluster sums to $100\%$.
  • Figure 4: Percentage distribution of event types per K-Means t-SNE cluster in the manually labeled subset ($N=204$). Each cluster sums to $100\%$.
  • Figure 5: Percentage distribution of fault classes per K-Means PCA cluster in the manually labeled subset ($N=200$). Each cluster sums to $100\%$.
  • ...and 4 more figures
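
The per-cluster percentage distributions described in the captions of Figures 3 to 5 amount to a row-normalized contingency table between cluster assignments and expert labels. The sketch below shows one way such a table could be computed with the Python standard library. The function name `cluster_composition`, the cluster IDs, and the event-type names are hypothetical placeholders, not values from the paper.

```python
from collections import Counter, defaultdict

def cluster_composition(cluster_ids, event_types):
    """Return {cluster: {event_type: percent}}; each cluster sums to 100%."""
    counts = defaultdict(Counter)
    for cluster, event in zip(cluster_ids, event_types):
        counts[cluster][event] += 1
    composition = {}
    for cluster, counter in counts.items():
        total = sum(counter.values())
        composition[cluster] = {
            event: 100.0 * n / total for event, n in counter.items()
        }
    return composition

# Toy labeled subset (hypothetical): six samples in two clusters.
cluster_ids = [0, 0, 0, 0, 1, 1]
event_types = ["fault", "fault", "fault", "switching", "switching", "switching"]

composition = cluster_composition(cluster_ids, event_types)
for cluster in sorted(composition):
    print(cluster, composition[cluster])
```

A cluster dominated by a single event type suggests that the clustering has captured that pattern, while a mixed cluster indicates only partial alignment with the expert labels.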