PGLearn -- An Open-Source Learning Toolkit for Optimal Power Flow
Michael Klamkin, Mathieu Tanneau, Pascal Van Hentenryck
TL;DR
The paper addresses the lack of standardized datasets and evaluation metrics for applying machine learning (ML) to Optimal Power Flow (OPF). It introduces PGLearn, an open-source toolkit that provides large-scale OPF datasets, including time series, for the AC, second-order cone (SOC), and DC formulations. The datasets come with complete primal and dual solutions and benchmarks. The authors also release PGLearn.jl and ML4OPF to generate data, train models, and benchmark performance, along with a principled set of accuracy and computational metrics. PGLearn makes realistic, diverse OPF data widely accessible and standardizes evaluation. Together, these contributions enable fair cross-method comparisons, support research that uses dual information, and narrow the gap between academic ML work and the needs of practical grid operation. The emphasis on large-scale, correlated demand sampling and time-series data aligns ML benchmarks with operational realities, which could improve real-time decision-making and market analysis in modern grids.
Abstract
Machine Learning (ML) techniques for Optimal Power Flow (OPF) problems have recently garnered significant attention, reflecting a broader trend of leveraging ML to approximate and/or accelerate the solution of complex optimization problems. These developments are driven by the increasing volatility and scale of energy production in modern and future grids. However, progress in ML for OPF is hindered by the lack of standardized datasets and evaluation metrics at every stage, from generating and solving OPF instances to training and benchmarking ML models. To address this challenge, this paper introduces PGLearn, a comprehensive suite of standardized datasets and evaluation tools for ML for OPF. PGLearn provides datasets that are representative of real-life operating conditions. It does so by explicitly capturing both global and local variability during data generation and, for the first time, by including time-series data for several large-scale systems. In addition, it supports multiple OPF formulations, including AC, DC, and second-order cone formulations. The standardized datasets are publicly available to broaden access to this field, reduce the burden of data generation, and enable fair comparison of different methodologies. PGLearn also includes a toolkit for training, evaluating, and benchmarking ML models for OPF, with the goal of standardizing performance evaluation across the field. By promoting open, standardized datasets and evaluation metrics, PGLearn aims to democratize and accelerate research and innovation in ML for optimal power flow. Datasets are available for download at https://www.huggingface.co/PGLearn.
