Scalable Bilevel Optimization for Generating Maximally Representative OPF Datasets
Ignasi Ventura Nadal, Samuel Chevalier
TL;DR
This work tackles the challenge of generating OPF datasets that accurately reflect the operating space near system limits, where active constraints are prevalent. It introduces RAMBO, a bilevel data-collection routine that deliberately selects OPF load inputs to maximize distance from previously collected solutions, thereby sampling near-boundary regions. The approach leverages a KKT-based reformulation with a relaxed complementary-slackness constraint and incorporates scalability and temperature controls via stochastic subsets and warm-start techniques. Empirical results on IEEE 30-, 57-, and 118-bus PGLib cases show RAMBO produces richer variable ranges and a markedly higher number of unique active-constraint sets than uniform sampling, improving data representativeness for downstream ML and validation tasks. This method, with a public repository for reproducibility, advances data-driven power-system modeling by enabling more robust learning and testing of constraint-aware behaviors.
Abstract
New generations of power systems, containing high shares of renewable energy resources, require improved data-driven tools which can swiftly adapt to changes in system operation. Many of these tools, such as ones using machine learning, rely on high-quality training datasets to construct probabilistic models. Such models should be able to accurately represent the system when operating at its limits (i.e., operating with a high degree of "active constraints"). However, generating training datasets that accurately represent the many possible combinations of these active constraints is a particularly challenging task, especially within the realm of nonlinear AC Optimal Power Flow (OPF), since most active constraints cannot be enforced explicitly. Using bilevel optimization, this paper introduces a data collection routine that sequentially solves for OPF solutions which are "optimally far" from previously acquired voltage, power, and load profile data points. The routine, termed RAMBO, samples critical data close to a system's boundaries much more effectively than a random sampling benchmark. Simulated test results are collected on the 30-, 57-, and 118-bus PGLib test cases.
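As a rough illustration of the idea summarized above, the single-level problem obtained by replacing the lower-level OPF with its KKT conditions might be sketched as follows. This is a sketch under stated assumptions, not the paper's exact formulation: the symbols $d$ (load input), $\mathcal{D}$ (admissible load set), $x$ (OPF decision variables), $z(x,d)$ (stacked voltage, power, and load profile), $z_i$ (the $k$ previously collected points), $f$, $h$, $g$ (OPF cost, equality, and inequality constraints), $\lambda$, $\mu$ (multipliers), and $\epsilon$ (relaxation tolerance) are all notation assumed here for illustration, as is the choice of a squared Euclidean max-min distance.

```latex
% Hypothetical notation: every symbol below is an assumption made for illustration.
\begin{align}
\max_{d \in \mathcal{D},\; x,\; \lambda,\; \mu} \quad
  & \min_{i = 1,\dots,k} \; \lVert z(x,d) - z_i \rVert_2^2
  && \text{(distance to collected data)} \\
\text{s.t.} \quad
  & \nabla_x f(x) + \nabla_x h(x,d)^{\top} \lambda + \nabla_x g(x,d)^{\top} \mu = 0
  && \text{(stationarity)} \\
  & h(x,d) = 0, \quad g(x,d) \le 0
  && \text{(primal feasibility)} \\
  & \mu \ge 0
  && \text{(dual feasibility)} \\
  & -\mu_j \, g_j(x,d) \le \epsilon, \quad j = 1,\dots,m
  && \text{(relaxed complementary slackness)}
\end{align}
```

Setting $\epsilon = 0$ recovers exact complementary slackness, $\mu_j \, g_j(x,d) = 0$; a small positive $\epsilon$ loosens that condition, which typically makes such problems easier for nonlinear solvers. Because AC OPF is nonconvex, the KKT conditions are only necessary for local optimality, so a point satisfying them is a candidate OPF solution rather than a guaranteed global one.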
