A large synthetic dataset for machine learning applications in power transmission grids

Marc Gillioz, Guillaume Dubuis, Philippe Jacquod

TL;DR

This manuscript presents a large synthetic dataset of power injections in an electric transmission grid model of continental Europe, and describes the algorithm developed for its generation, which allows one to generate arbitrarily large time series.

Abstract

With the ongoing energy transition, power grids are evolving fast. They operate more and more often close to their technical limits, under increasingly volatile conditions. Fast, essentially real-time computational approaches to evaluate their operational safety, stability and reliability are therefore highly desirable. Machine Learning methods have been advocated to solve this challenge; however, they are heavy consumers of training and testing data, while historical operational data for real-world power grids are hard, if not impossible, to access. This manuscript presents a large synthetic dataset of power injections in an electric transmission grid model of continental Europe, and describes the algorithm developed for its generation. The method allows one to generate arbitrarily large time series from the knowledge of the grid -- the admittance of its lines as well as the location, type and capacity of its power generators -- and aggregated power consumption data, such as the national load data given by ENTSO-E. The obtained datasets are statistically validated against real-world data.

Paper Structure (14 sections, 7 equations, 8 figures, 3 tables)

Figures (8)

  • Figure 1: The PanTaGruEl model of the transmission grid of continental Europe. Lines are colored by voltage. Buses with generators attached are indicated with a square.
  • Figure 2: Historical time series for the three active Swiss nuclear power plants in 2020, obtained from the ENTSO-E transparency platform. Nuclear generators mostly work at their nominal regime, except for periods of planned maintenance (mostly in summer, and alternating between plants) and rare unexpected events. The Gösgen and Leibstadt power plants have a single reactor, while the Beznau power plant has two reactors, which explains why the power output is about half the nominal value during maintenance.
  • Figure 3: Distribution of Pearson correlation coefficients between pairs of synthetic time series, compared with real-world measurements for the Swiss transmission grid during the month of July 2015.
  • Figure 4: Total load for Switzerland during the fourth week of the year, with hourly resolution. The top panel shows historical data for the years 2019 to 2022. The lower panel shows synthetic series generated using a statistical model built from historical data. The same pattern is clearly visible in both panels: the five working days have a higher load, with two peaks in the morning and evening, whereas the weekend is characterized by lower consumption.
  • Figure 5: Time covariance matrix for the historical series shown in the top panel of Fig. 4 (left), and for their Fourier transform (right). In the time covariance matrix, off-diagonal elements are large and show clear structures reflecting periodicities; in the covariance matrix of the Fourier transform, they are strongly suppressed.
  • ...and 3 more figures
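
The contrast described in the caption of Figure 5 can be reproduced on toy data. The following is a minimal sketch, not the authors' statistical model: the series are hypothetical periodic signals with random amplitudes and phases plus white noise, and the helper `offdiag_fraction`, the number of samples and the chosen harmonics are all assumptions made for illustration. It only shows why the covariance of a roughly stationary, periodic signal has strong off-diagonal structure in the time domain and is close to diagonal after a Fourier transform.

```python
import numpy as np


def offdiag_fraction(cov):
    """Share of the total squared magnitude of `cov` that lies off the diagonal."""
    power = np.abs(cov) ** 2
    return 1.0 - np.trace(power) / power.sum()


rng = np.random.default_rng(0)
n_samples, n_steps = 2000, 168  # hypothetical: 2000 one-week series, hourly steps

# Toy "load" series (an assumption, not real data): white noise plus a few
# harmonics whose amplitude and phase are drawn independently per sample.
t = np.arange(n_steps)
series = rng.normal(scale=0.1, size=(n_samples, n_steps))
for cycles_per_week in (1, 7, 14):
    amplitude = rng.normal(size=(n_samples, 1))
    phase = rng.uniform(0.0, 2.0 * np.pi, size=(n_samples, 1))
    series += amplitude * np.cos(2.0 * np.pi * cycles_per_week * t / n_steps + phase)

# Covariance between time steps, estimated across samples: shape (168, 168).
cov_time = np.cov(series, rowvar=False)

# Covariance between Fourier coefficients of the same samples: shape (85, 85).
spectrum = np.fft.rfft(series, axis=1)
cov_freq = np.cov(spectrum, rowvar=False)

print(f"off-diagonal share, time domain:    {offdiag_fraction(cov_time):.3f}")
print(f"off-diagonal share, Fourier domain: {offdiag_fraction(cov_freq):.3f}")
```

On this toy data most of the covariance weight sits off the diagonal in the time domain, while almost all of it sits on the diagonal in the Fourier domain. Real load data are only approximately stationary, so the suppression seen there would be less complete than in this idealized example.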