
PGLearn -- An Open-Source Learning Toolkit for Optimal Power Flow

Michael Klamkin, Mathieu Tanneau, Pascal Van Hentenryck

TL;DR

The paper addresses the lack of standardized datasets and evaluation metrics for applying machine learning to Optimal Power Flow (OPF). It introduces PGLearn, a comprehensive open-source toolkit that provides large-scale, time-series OPF datasets across AC, second-order cone (SOC), and DC formulations, along with complete primal and dual solutions and benchmarks. The authors also release PGLearn.jl and ML4OPF to generate data, train models, and benchmark performance, accompanied by a principled set of accuracy and computational metrics. By democratizing access to realistic, diverse OPF data and standardizing evaluation, PGLearn aims to accelerate robust ML-based approaches for real-world power systems. Together, these contributions enable fair cross-method comparisons, support research that uses dual information, and bridge the gap between academic ML developments and practical grid operation needs. The emphasis on large-scale, correlated demand sampling and time-series data aligns ML benchmarks with operational realities, potentially enhancing real-time decision-making and market analyses in modern grids.

Abstract

Machine Learning (ML) techniques for Optimal Power Flow (OPF) problems have recently garnered significant attention, reflecting a broader trend of leveraging ML to approximate and/or accelerate the resolution of complex optimization problems. These developments are driven by the increased volatility and scale of energy production in modern and future grids. However, progress in ML for OPF is hindered by the lack of standardized datasets and evaluation metrics, from generating and solving OPF instances to training and benchmarking machine learning models. To address this challenge, this paper introduces PGLearn, a comprehensive suite of standardized datasets and evaluation tools for ML and OPF. PGLearn provides datasets that are representative of real-life operating conditions by explicitly capturing both global and local variability in the data generation and by, for the first time, including time series data for several large-scale systems. In addition, it supports multiple OPF formulations, including AC, DC, and second-order cone formulations. Standardized datasets are made publicly available to democratize access to this field, reduce the burden of data generation, and enable the fair comparison of various methodologies. PGLearn also includes a robust toolkit for training, evaluating, and benchmarking machine learning models for OPF, with the goal of standardizing performance evaluation across the field. By promoting open, standardized datasets and evaluation metrics, PGLearn aims to democratize and accelerate research and innovation in machine learning applications for optimal power flow problems. Datasets are available for download at https://www.huggingface.co/PGLearn.
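The abstract refers to standardized evaluation metrics without defining them here. As a hedged illustration only, two quantities commonly reported when benchmarking ML proxies for OPF are the relative optimality gap of the predicted objective and the magnitude of constraint violations. The NumPy sketch below assumes box constraints such as generator limits `pmin <= pg <= pmax`; the function names are hypothetical, not part of ML4OPF's API, and PGLearn's exact metric definitions may differ.

```python
import numpy as np

def optimality_gap(pred_obj, opt_obj):
    """Relative gap |f(x_pred) - f(x*)| / |f(x*)| for each sample (hypothetical helper)."""
    pred_obj = np.asarray(pred_obj, dtype=float)
    opt_obj = np.asarray(opt_obj, dtype=float)
    return np.abs(pred_obj - opt_obj) / np.abs(opt_obj)

def max_bound_violation(x, lower, upper):
    """Largest violation of lower <= x <= upper along the last axis, per sample (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    below = np.maximum(lower - x, 0.0)  # amount by which x falls under its lower bound
    above = np.maximum(x - upper, 0.0)  # amount by which x exceeds its upper bound
    return np.max(below + above, axis=-1)

# Usage: two samples, each with two generator outputs bounded in [0, 1] (per-unit, illustrative).
gaps = optimality_gap([101.0, 198.0], [100.0, 200.0])
viol = max_bound_violation(np.array([[0.5, 1.2], [0.0, 0.3]]), 0.0, 1.0)
print(gaps, viol)
```

Reporting the maximum violation per sample, rather than an average, is one design choice among several; a benchmark could equally report mean or norm-based violations.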

Paper Structure

This paper contains 43 sections, 14 equations, 2 figures, 7 tables, 1 algorithm.

Figures (2)

  • Figure 1: The PGLearn Toolkit: publicly available AC, DC, and SOC optimal power flow datasets, PGLearn.jl for data generation, and the ML4OPF ML toolkit.
  • Figure 2: Limitations of sampling strategies that do not consider correlations across individual loads. Left: histogram of total demand; without correlations, total demand spans only a narrow range. Right: active power flow on branch 200; without correlations in the input data, the resulting datasets have low variance and diversity.
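The left panel of Figure 2 follows from a simple variance argument: when each of n loads is perturbed independently, the fluctuations largely cancel in the sum, so the relative spread of total demand shrinks roughly like 1/sqrt(n) for loads of similar size, whereas a scaling factor shared by all loads does not cancel. The NumPy sketch below illustrates this under stated assumptions: the nominal loads, the uniform global factor, and the Gaussian local noise are illustrative choices, not necessarily the sampler used by PGLearn.jl.

```python
import numpy as np

rng = np.random.default_rng(0)
n_loads, n_samples = 200, 10_000
base = rng.uniform(10.0, 50.0, size=n_loads)  # hypothetical nominal loads (MW)

def sample_independent(width=0.2):
    # Each load gets its own multiplier, uniform in [1 - width, 1 + width]; no correlation.
    eps = rng.uniform(1 - width, 1 + width, size=(n_samples, n_loads))
    return base * eps

def sample_global_local(global_width=0.2, local_std=0.05):
    # Assumed scheme: one global factor per sample (shared by all loads)
    # times small independent per-load noise.
    alpha = rng.uniform(1 - global_width, 1 + global_width, size=(n_samples, 1))
    eta = rng.normal(1.0, local_std, size=(n_samples, n_loads))
    return base * alpha * eta

def relative_spread(loads):
    # Coefficient of variation of total demand across samples.
    total = loads.sum(axis=1)
    return total.std() / total.mean()

print(f"independent:  {relative_spread(sample_independent()):.4f}")
print(f"global+local: {relative_spread(sample_global_local()):.4f}")
```

With 200 loads, the independent scheme yields a total-demand spread roughly an order of magnitude smaller than the global-plus-local scheme, even though every individual load varies by up to ±20% in both cases.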