PowerGraph: A power grid benchmark dataset for graph neural networks

Anna Varbella, Kenza Amara, Blazhe Gjorgiev, Mennatallah El-Assady, Giovanni Sansavini

TL;DR

PowerGraph is a multifaceted GNN dataset that covers power flow and fault scenarios with real-world explanations. It provides a resource for developing improved GNN models for node-level and graph-level tasks, and for evaluating explainability methods, in power system modeling.

Abstract

Power grids are critical infrastructures of paramount importance to modern society and are therefore engineered to operate under diverse conditions and failures. The ongoing energy transition poses new challenges for decision-makers and system operators, so developing grid analysis algorithms is important for supporting reliable operations. These key tools include power flow analysis and system security analysis, both needed for effective operational and strategic planning. The literature shows a growing trend of machine learning (ML) models that perform these analyses effectively. In particular, Graph Neural Networks (GNNs) stand out in such applications because of the graph-based structure of power grids. However, there is a lack of publicly available graph datasets for training and benchmarking ML models in electrical power grid applications. First, we present PowerGraph, which comprises GNN-tailored datasets for i) power flows, ii) optimal power flows, and iii) cascading failure analyses of power grids. Second, we provide ground-truth explanations for the cascading failure analysis. Finally, we perform a complete benchmarking of GNN methods for node-level and graph-level tasks and for explainability. Overall, PowerGraph is a multifaceted GNN dataset for diverse tasks that includes power flow and fault scenarios with real-world explanations, providing a valuable resource for developing improved GNN models for node-level and graph-level tasks and for explainability methods in power system modeling. The dataset is available at https://figshare.com/articles/dataset/PowerGraph/22820534 and the code at https://github.com/PowerGraph-Datasets.

Paper Structure (42 sections, 5 equations, 8 figures, 9 tables)

Figures (8)

  • Figure 1: Instance of the PowerGraph dataset for power flow and optimal power flow. Input node features are shown in red and output node-level predictions in green. The known input quantities depend on the node type; unknown quantities are set to zero to preserve the dataset's dimensionality (indicated by an empty cell in the picture). Similarly, the output quantities depend on the node type; if a variable is known, we mask it during training, and masked values are indicated with grey cells. The node-level quantities are: active power generation $P_{g}$, reactive power generation $Q_{g}$, active power demand $P_{d}$, reactive power demand $Q_{d}$, voltage magnitude $V$, voltage angle $\theta$, number of loads $N_{loads}$, and number of generators $N_{gen}$. The edge-level features are branch conductance $G_{ij}$ and branch susceptance $B_{ij}$.
  • Figure 2: Instance of the PowerGraph dataset for cascading failure analysis. The initial outage is highlighted with the red dotted line; it is removed from the graph connectivity matrix and from the edge feature matrix. The cascading edge is in bold and encoded in the Boolean vector M (0: the edge has not tripped during cascading development; 1: otherwise). The input node features are net active power $P_{net} = P_{gen}-P_{load}$, net apparent power $S_{net} = S_{gen}-S_{load}$, and voltage magnitude $V_i$, where $P_{gen}$ and $P_{load}$ are the active power generation and demand, respectively, and $S_{gen}$ and $S_{load}$ are the apparent power generation and demand, respectively. The input edge features are active power flow $P_{i,j}$, reactive power flow $Q_{i,j}$, line reactance $X_{i,j}$, and line rating $lr_{i,j}$.
  • Figure 3: Node-averaged Mean Absolute Errors on the predicted physical quantities for the power flow and optimal power flow problems, for the best-performing models reported in Table \ref{tab:resnodelevel}.
  • Figure 4: Balanced accuracy of the explanations with $topk^{*}$ edges. The balanced accuracy is computed on explanatory edge masks that contain the $topk^{*}$ edges contributing the most to the model predictions, where $topk^{*}$ is the number of edges in the corresponding ground-truth explanations, i.e., the maximum number of cascading edges for each dataset.
  • Figure 5: Faithfulness of the explanations with $topk^{*}$ edges. The faithfulness score is measured with the $fid+^{acc}$ metric as defined in Equation \ref{eq:fidelity1} in Appendix \ref{apx:fid}. The optimal number $topk^{*}$ of edges kept for the explanations corresponds to the maximum number of expected cascading edges (i.e., ground-truth explanations) and depends on the dataset.
  • ...and 3 more figures
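
The evaluation described in the Figure 4 caption can be illustrated with a small sketch. It assumes an explainer returns one importance score per edge and that the ground truth is the Boolean cascading-edge vector M from Figure 2. The function names `topk_edge_mask` and `balanced_accuracy` and the toy values are hypothetical, and balanced accuracy is taken here as the mean of the true-positive and true-negative rates; the authors' implementation may differ.

```python
import numpy as np

def topk_edge_mask(scores, k):
    """Boolean mask selecting the k highest-scoring edges (hypothetical helper)."""
    scores = np.asarray(scores, dtype=float)
    mask = np.zeros(scores.shape[0], dtype=bool)
    if k > 0:
        # argsort sorts ascending, so the last k indices are the top-k edges
        mask[np.argsort(scores, kind="stable")[-k:]] = True
    return mask

def balanced_accuracy(pred, truth):
    """Mean of true-positive rate and true-negative rate for Boolean edge masks."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    pos, neg = truth.sum(), (~truth).sum()
    # A rate is defined as 0 when its class is absent (an assumption of this sketch)
    tpr = np.sum(pred & truth) / pos if pos else 0.0
    tnr = np.sum(~pred & ~truth) / neg if neg else 0.0
    return 0.5 * (tpr + tnr)

# Toy example: 5 edges, 2 of which tripped during the cascade (M = 1)
edge_scores = [0.9, 0.1, 0.8, 0.3, 0.2]   # assumed explainer output
M = [1, 0, 0, 1, 0]                        # ground-truth cascading edges
k_star = 2                                 # number of edges kept in the explanation
pred_mask = topk_edge_mask(edge_scores, k_star)
score = balanced_accuracy(pred_mask, M)
```

In this toy case the mask keeps edges 0 and 2, so one of the two cascading edges is recovered and two of the three non-cascading edges are correctly excluded.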