Anomaly Detection with Machine Learning Algorithms in Large-Scale Power Grids

Marc Gillioz, Guillaume Dubuis, Étienne Voutaz, Philippe Jacquod

TL;DR

The study tackles fast and reliable anomaly detection in multivariate time series from large high-voltage grids, focusing on contextual anomalies, that is, values that only look anomalous in light of the grid-wide context. It compares nine ML algorithms, spanning supervised (false-data-injection) and unsupervised (forecast-based) approaches, and evaluates them on an open model of the continental European grid (Switzerland, Spain, Germany) with hourly data spanning 20 years. Neural networks such as LSTMs, together with gradient-boosted trees, consistently outperform classical methods, while the unsupervised predictors achieve high predictive accuracy ($R^2$ typically $\gtrsim 0.95$) and competitive $F_2$-scores, even under multiple concurrent attacks. The results suggest that combining a modest history (≈24 time steps) with a compact contemporaneous context yields robust anomaly detection at manageable computational cost, supporting practical deployment for real-time grid security.
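
The summary does not spell out the two metrics it quotes. Assuming the standard definitions (the symbols $TP$, $FP$, $FN$ for true positives, false positives and false negatives, $y_t$ for the observed value and $\hat{y}_t$ for the prediction are introduced here for illustration), they read:

$$
F_2 = \frac{5\,TP}{5\,TP + 4\,FN + FP},
\qquad
R^2 = 1 - \frac{\sum_t \left(y_t - \hat{y}_t\right)^2}{\sum_t \left(y_t - \bar{y}\right)^2}.
$$

$F_2$ is the $\beta = 2$ case of the $F_\beta$ score, so a missed anomaly (false negative) is penalized four times as heavily as a false alarm.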

Abstract

We apply several machine learning algorithms to the problem of anomaly detection in operational data for large-scale, high-voltage electric power grids. We observe important differences in the performance of the algorithms. Neural networks typically outperform classical algorithms such as k-nearest neighbors and support vector machines, which we explain by the strong contextual nature of the anomalies. We show that unsupervised learning algorithms work remarkably well and that their predictions are robust against simultaneous, concurrent anomalies.
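
The unsupervised, prediction-based idea can be illustrated with a minimal sketch. This is not the authors' implementation: a plain least-squares linear model stands in for the neural-network regressors, the data are synthetic, and the names `HISTORY`, `windowed_features`, `f_beta`, `r_squared`, as well as the 5-sigma threshold, are assumptions made for the example. A predictor is fitted on clean data from the other plants (the context), and a time step is flagged when the observed output of the target plant deviates too far from the prediction.

```python
import numpy as np

HISTORY = 24  # context time steps per sample (assumed, following the summary)

def windowed_features(context, history=HISTORY):
    """Flatten the last `history` rows of `context` into one feature row per step."""
    win = np.lib.stride_tricks.sliding_window_view(context, history, axis=0)
    feats = win.reshape(win.shape[0], -1)
    return np.column_stack([feats, np.ones(len(feats))])  # append a bias column

def f_beta(truth, flagged, beta=2.0):
    """F-beta score from boolean arrays; beta=2 weights recall above precision."""
    tp = np.sum(truth & flagged)
    fp = np.sum(~truth & flagged)
    fn = np.sum(truth & ~flagged)
    denom = (1 + beta**2) * tp + beta**2 * fn + fp
    return float((1 + beta**2) * tp / denom) if denom else 0.0

def r_squared(y, y_hat):
    """Coefficient of determination of predictions y_hat against y."""
    return float(1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - np.mean(y)) ** 2))

rng = np.random.default_rng(0)
T = 3000
hours = np.arange(T)
# Synthetic "other plants": daily cycles with different phases plus noise.
context = np.stack(
    [30 + 10 * np.sin(2 * np.pi * (hours + shift) / 24) for shift in (0, 5, 11)],
    axis=1,
) + rng.normal(0, 0.5, size=(T, 3))
# Synthetic target plant: a linear mix of the context plus measurement noise.
target = 20 + context @ np.array([0.8, 0.5, 0.3]) + rng.normal(0, 1.0, size=T)

X = windowed_features(context)   # row i covers context[i : i + HISTORY]
y = target[HISTORY - 1:]         # aligned with the last step of each window
split = 2000

# Fit on clean training data only; no anomaly labels are used.
coef, *_ = np.linalg.lstsq(X[:split], y[:split], rcond=None)
sigma = np.std(y[:split] - X[:split] @ coef)  # scale of normal residuals

# Inject on/off anomalies (output forced to zero) into the test period.
y_hat = X[split:] @ coef
observed = y[split:].copy()
attacked = np.zeros(len(observed), dtype=bool)
for start, stop in [(100, 130), (400, 460), (800, 812)]:
    attacked[start:stop] = True
observed[attacked] = 0.0

flagged = np.abs(observed - y_hat) > 5 * sigma  # assumed threshold
f2 = f_beta(attacked, flagged)
r2 = r_squared(y[split:], y_hat)  # accuracy on the clean test signal
print(f"R^2 = {r2:.3f}, F_2 = {f2:.3f}")
```

Because the features come only from the other plants, an attack on the target plant does not contaminate its own prediction; the threshold then trades false alarms against missed anomalies and would need tuning on real data.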

Paper Structure (13 sections, 1 equation, 9 figures, 1 table)

This paper contains 13 sections, 1 equation, 9 figures, 1 table.

Figures (9)

  • Figure 1: The three transmission power grids considered in this work: (a) Switzerland, (b) Spain, and (c) Germany. The dots represent power plants directly connected to the transmission grid, among which the blue ones are those on which we study anomalies.
  • Figure 2: Examples of synthetic time series for two hydroelectric power plants, taken from the dataset published on Zenodo. On/off anomalies added to the data are shown with a red, dotted line. The prediction of the unsupervised MLPR algorithm is shown with the green, dashed line. The green band surrounding the prediction indicates how much it can deviate if a concurrent attack happens on another generator in the same country.
  • Figure 3: $F_2$ score for the 7 classifier algorithms on the test set. In each case the median value over all the selected power plants is represented by the red line, surrounded by a box spanning the first to third quartiles, with whiskers at the minimum and maximum. Isolated outliers are denoted with circles.
  • Figure 4: $F_2$ score for the 5 best algorithms on the test set, including the two unsupervised algorithms. Boxes and whiskers are as in Fig. 3.
  • Figure 5: Precision (one minus rate of false positives) and recall (one minus rate of false negatives) for the five best performing algorithms on all 33 selected power plants of the three national grids, in each case with the best choice of hyperparameters. Circles and crosses correspond to supervised and unsupervised algorithms, respectively.
  • ...and 4 more figures