Proximal Policy Optimization with Graph Neural Networks for Optimal Power Flow

Ángela López-Cardona; Guillermo Bernárdez; Pere Barlet-Ros; Albert Cabellos-Aparicio

TL;DR

This work tackles real-time ACOPF in large, nonlinear power systems by integrating Graph Neural Networks with Proximal Policy Optimization to form a PPO+GNN agent. The Actor-GNN selects generator adjustments while the Critic-GNN evaluates states, operating on a graph representation of the power grid where edges carry line parameters and nodes carry bus data. Trained on the IEEE 30-bus system, the approach demonstrates strong generalization to unseen topologies and achieves generation-cost improvements comparable to or better than DCOPF, with reductions up to around 30% in some scenarios. This combination offers a scalable, topology-aware DRL-based OPF solver that can adapt to topology changes common in real-world grids, signaling significant practical impact for real-time grid optimization and planning.

Abstract

Optimal Power Flow (OPF) is a long-standing research area within the power systems field that seeks the optimal operating point of electric power plants, and it must be solved every few minutes in real-world scenarios. However, due to the nonconvexities that arise in power generation systems, there is not yet a fast, robust solution technique for the full Alternating Current Optimal Power Flow (ACOPF). In recent decades, power grids have evolved into a typical dynamic, non-linear, large-scale control system, known as the power system, so the search for better and faster ACOPF solutions is becoming crucial. The emergence of Graph Neural Networks (GNN) has allowed the natural use of Machine Learning (ML) algorithms on graph data, such as power networks. Meanwhile, Deep Reinforcement Learning (DRL) is known for its powerful capability to solve complex decision-making problems. Although solutions that use these two methods separately are beginning to appear in the literature, none has yet combined the advantages of both. We propose a novel architecture based on the Proximal Policy Optimization algorithm with Graph Neural Networks to solve the Optimal Power Flow. The objective is to design an architecture that learns how to solve the optimization problem and is at the same time able to generalize to unseen scenarios. We compare our solution with the DCOPF in terms of cost, after training our DRL agent on the IEEE 30-bus system and then computing the OPF on that base network with topology changes.

Paper Structure (11 sections, 3 equations, 3 figures, 3 tables)

INTRODUCTION
RELATED WORK
PROBLEM STATEMENT
BACKGROUND
Graph Neural Networks
Deep Reinforcement Learning
PROPOSED METHOD
PERFORMANCE EVALUATION
Experimental Setting
Experimental Results
DISCUSSION

Figures (3)

Figure 1: Overview of the PPO-based architecture for power grid optimization. The system consists of an environment and an agent. The environment simulates a power grid case. The agent, implemented using PPO with GNN, has an actor-critic structure: the Actor-GNN selects actions, while the Critic-GNN evaluates state values. The agent interacts with the environment by receiving state information and rewards based on the computed cost, and by sending actions (changes in generation).
Figure 2: GNN architecture. Both the critic and the actor employ the same GNN architecture, differing only in their readout layers. The process starts by preparing the initial node representations from both node and edge features; specifically, the GNN's input comprises the electrical parameters of the grid. During the message-passing phase (repeated k times), each node generates a message from its features, and every node aggregates the messages received from its neighbors. The aggregated messages then refine the node representations via an update function. Finally, in the readout phase, the actor uses the refined representations to compute the action, while the critic uses them to estimate the value function.
Figure 3: IEEE 30-bus system [pandapower2].
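
The message-passing and readout phases described in the Figure 2 caption can be illustrated with a minimal pure-Python sketch. Everything here is an assumption made for illustration, not the authors' implementation: the function names (`message_passing`, `actor_readout`, `critic_readout`) are hypothetical, each edge carries a single scalar weight standing in for the line parameters, messages are neighbor states scaled by that weight, aggregation is a plain sum, and the update and readout functions are fixed arithmetic in place of the learned neural networks a real GNN would use.

```python
from typing import Dict, List, Tuple

Vector = List[float]


def message_passing(
    node_feats: Dict[int, Vector],
    edges: Dict[Tuple[int, int], float],
    k: int,
) -> Dict[int, Vector]:
    """Run k rounds of sum-aggregation message passing on an undirected graph."""
    # Build neighbor lists; each undirected edge is visible from both endpoints.
    neighbors: Dict[int, List[Tuple[int, float]]] = {n: [] for n in node_feats}
    for (u, v), weight in edges.items():
        neighbors[u].append((v, weight))
        neighbors[v].append((u, weight))

    # Initial node representations are copies of the raw node features.
    h = {n: list(f) for n, f in node_feats.items()}

    for _ in range(k):
        new_h: Dict[int, Vector] = {}
        for n, nbrs in neighbors.items():
            dim = len(h[n])
            agg = [0.0] * dim
            for m, weight in nbrs:
                for i in range(dim):
                    # Message: neighbor state scaled by the edge weight (assumed form).
                    agg[i] += weight * h[m][i]
            # Update: average of own state and aggregate (stand-in for a learned update).
            new_h[n] = [0.5 * (h[n][i] + agg[i]) for i in range(dim)]
        h = new_h
    return h


def actor_readout(h: Dict[int, Vector], generator_nodes: List[int]) -> Dict[int, float]:
    """Toy actor readout: one score per generator node (sum of its state)."""
    return {g: sum(h[g]) for g in generator_nodes}


def critic_readout(h: Dict[int, Vector]) -> float:
    """Toy critic readout: a single scalar value pooled over all nodes."""
    return sum(sum(state) for state in h.values())


if __name__ == "__main__":
    # Three buses in a line: 0 - 1 - 2, with unit edge weights.
    feats = {0: [1.0], 1: [0.0], 2: [0.0]}
    lines = {(0, 1): 1.0, (1, 2): 1.0}
    states = message_passing(feats, lines, k=2)
    print(states)
    print(actor_readout(states, [0]), critic_readout(states))
```

After two rounds, information from bus 0 has reached bus 2, which is two hops away. This shows why the number of iterations k bounds how far grid information can propagate. The two readouts share the same node states and differ only in how they pool them, mirroring the shared-architecture design the caption describes.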
