PolyFormer: learning efficient reformulations for scalable optimization under complex physical constraints

Yilin Wen; Yi Guo; Bo Zhao; Wei Qi; Zechun Hu; Colin Jones; Jian Sun

TL;DR

PolyFormer opens a new direction for physics-informed machine learning (PIML) in prescriptive optimization tasks: physical and geometric knowledge is used not merely to regularize learning models but to simplify the problems themselves, decoupling problem complexity from solution difficulty.

Abstract

Real-world optimization problems are often constrained by complex physical laws that limit computational scalability. These constraints are inherently tied to complex regions, and thus learning models that incorporate physical and geometric knowledge, i.e., physics-informed machine learning (PIML), offer a promising pathway for efficient solution. Here, we introduce PolyFormer, which opens a new direction for PIML in prescriptive optimization tasks, where physical and geometric knowledge is not merely used to regularize learning models, but to simplify the problems themselves. PolyFormer captures geometric structures behind constraints and transforms them into efficient polytopic reformulations, thereby decoupling problem complexity from solution difficulty and enabling off-the-shelf optimization solvers to efficiently produce feasible solutions with acceptable optimality loss. Through evaluations across three important problems (large-scale resource aggregation, network-constrained optimization, and optimization under uncertainty), PolyFormer achieves computational speedups up to 6,400-fold and memory reductions up to 99.87%, while maintaining solution quality competitive with or superior to state-of-the-art methods. These results demonstrate that PolyFormer provides an efficient and reliable solution for scalable constrained optimization, expanding the scope of PIML to prescriptive tasks in scientific discovery and engineering applications.
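To make the idea of a polytopic reformulation concrete, the following minimal sketch replaces a curved feasible region (the unit disk) with an inner polytope $\{\mathbf{x} : \mathbf{A}\mathbf{x} \le \mathbf{b}\}$ and optimizes a linear objective over it. This is a hand-built illustration, not PolyFormer's learned polytope: the region, the regular-polygon construction, and the function names `inscribed_polygon`, `is_feasible`, and `maximize_linear` are all assumptions made for the example.

```python
import math

def inscribed_polygon(m):
    """Build a regular m-gon inscribed in the unit disk.

    Returns (A, b, vertices) with the polygon written as {x : A x <= b}.
    Illustrative construction only; PolyFormer learns A and b instead.
    """
    # Vertices lie on the unit circle at angles 2*pi*k/m.
    vertices = [(math.cos(2 * math.pi * k / m), math.sin(2 * math.pi * k / m))
                for k in range(m)]
    # Facet normals point halfway between consecutive vertices.
    A = [(math.cos((2 * k + 1) * math.pi / m), math.sin((2 * k + 1) * math.pi / m))
         for k in range(m)]
    # Every facet is at distance cos(pi/m) from the origin.
    b = [math.cos(math.pi / m)] * m
    return A, b, vertices

def is_feasible(x, A, b, tol=1e-9):
    """Check A x <= b row by row."""
    return all(a[0] * x[0] + a[1] * x[1] <= bi + tol for a, bi in zip(A, b))

def maximize_linear(c, vertices):
    """A linear objective over a bounded polytope attains its maximum at a vertex."""
    return max(vertices, key=lambda v: c[0] * v[0] + c[1] * v[1])

A, b, V = inscribed_polygon(24)
c = (0.6, 0.8)                      # unit-norm cost direction
x_poly = maximize_linear(c, V)
value_poly = c[0] * x_poly[0] + c[1] * x_poly[1]
value_disk = math.hypot(*c)         # exact optimum over the unit disk
optimality_loss = value_disk - value_poly
```

Because the polygon lies inside the disk, every solution it yields is feasible for the original region, and the optimality loss is bounded by $1 - \cos(\pi/m)$ for a unit-norm objective, so it shrinks as facets are added.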

Paper Structure (15 sections, 16 equations, 6 figures)

Introduction
Results
Overview of PolyFormer
Aggregation of large-scale resources
Network-constrained optimization
Optimization under uncertainty
Discussion
Methods
Error metrics
Loss function and gradients
Implementation remarks
Typical geometries
Apply PolyFormer to resource aggregation
Apply PolyFormer to the two-layer power system optimization
Apply PolyFormer to DRCC portfolio optimization

Figures (6)

Figure 1: Capability and training procedure of PolyFormer. a, PolyFormer solutions under three classes of complex physical constraints: large-scale per-individual constraints, network constraints, and uncertainty-related constraints. b, Training pipeline of PolyFormer. The procedure comprises six steps: (1) random sampling of a direction $\mathbf{v}$; (2) computation of the directional feasibility and optimality errors, $e_{\mathrm{feas}}(\mathbf{v})$ and $e_{\mathrm{opt}}(\mathbf{v})$, together with the associated boundary points $\mathbf{x}^{\star}$, $\mathbf{z}'$, $\mathbf{z}^{\star}$ and $\mathbf{x}'$ (see "Error metrics" section in the Methods); (3) identification of active constraints in the current polytope at $\mathbf{x}^{\star}$ and $\mathbf{x}'$; (4) evaluation of the loss based on the distance from $\mathbf{z}$ to the hyperplanes defined by the active constraints; (5) gradient computation via automatic differentiation (see "Loss function and gradients" section in the Methods); and (6) parameter updates of $\mathbf{A}$ and $\mathbf{b}$ using gradient-based optimization. c, Training of the parameterized PolyFormer additionally involves sampling a parameter $\boldsymbol{\theta}$ and mapping it to $\mathbf{A}(\boldsymbol{\theta})$ and $\mathbf{b}(\boldsymbol{\theta})$ using two neural networks, denoted $\mathbf{A}$-net and $\mathbf{b}$-net, with trainable weights $\boldsymbol{w}_{\mathbf{A}}$ and $\boldsymbol{w}_{\mathbf{b}}$, respectively.
Figure 2: Training process and benchmark results for large-scale resource aggregation. a-b, Evolution of the feasibility and optimality error distributions during PolyFormer training for two scenarios: (a) 1,000 resources with continuous controls and (b) 105 resources with mixed continuous-discrete controls. The error distributions are obtained by randomly sampling 50 directions and computing the directional errors according to \ref{eq:dir_err_feas} and \ref{eq:dir_err_opt}. Training proceeds in three phases. In the balancing phase (iterations 1-500, $\lambda = 0.5$), both errors decrease as they compete with each other, and the polytope adapts to the true aggregated region. The refining phase (iterations 501-700, $\lambda = 0.9$) prioritizes feasibility while considering optimality to a lesser extent, reducing the average feasibility error to $4.3\times10^{-4}$ and $2.4\times10^{-3}$, respectively. The converging phase (iterations 701-800, $\lambda = 0.9999$) drives feasibility errors to numerical zero (average $7.1\times10^{-15}$ and $1.1\times10^{-5}$, respectively), enforcing an inner approximation. The total training time is 124 min (a) and 65 min (b). c, Modeling efficiency comparison among the full model, Box, Homothet, and PolyFormer in scenario (a), plotting optimality error (conservatism) versus number of constraints (complexity). The box plots show the distribution of optimality errors. Background shading from purple to yellow indicates decreasing efficiency in balancing complexity and conservatism, with the lower-left region most desirable. PolyFormer attains the best trade-off between complexity and conservatism. d, Complexity reduction in scenario (b): PolyFormer removes all 336 binary variables and reduces continuous variables from 5,184 to 24 and constraints from 6,291 to 96.
Figure 3: Benchmark results of the two-layer power system optimization problem. a, Schematic diagram of the two-layer power system. The upper-level transmission network includes generation units powered by various energy sources and transmits electricity to substations. The distribution network then delivers electricity to various users. b, Comparison of solver time and peak memory usage between optimization using the simplified PolyFormer model and optimization of the original full model solved by IPOPT. Each data point represents a transmission-distribution system case, with the size of the point indicating the case's complexity (the total number of transmission and distribution nodes). Each red-blue pair represents the same case, with the connection from the upper-right red points to the lower-left blue points illustrating that PolyFormer consistently reduces both computation time and memory usage compared to directly solving the full problem with IPOPT. The largest case is also annotated in the figure: solving the original model takes 1,476 seconds and 821 MB of memory, while the PolyFormer-simplified model only requires 0.23 seconds and 3.50 MB of memory. c, Maximum feasibility error and objective error of two PolyFormer variants. All error values are normalized by dividing by the sum of squares of the baseline active and reactive powers of each distribution system. "Moderate PolyFormer" ($\lambda = 0.625$) yields an average maximum feasibility error of $2.6 \times 10^{-6}$ and an average objective error of $3.7 \times 10^{-5}$. Two points show negative objective errors, indicating that the PolyFormer-simplified model yields lower costs than the original model. These points also have larger feasibility errors. "Feasible PolyFormer" ($\lambda = 0.99$) has all of the maximum feasibility errors below $10^{-8}$, with an average of $6.4 \times 10^{-10}$. However, the average objective error is $7.2 \times 10^{-4}$, larger than that of Moderate PolyFormer, suggesting that the Feasible PolyFormer approach is more conservative, as expected.
Figure 4: Benchmark results of DRCC portfolio optimization across four case settings. a, 50 assets, 2 groups, 150 samples. b, 150 assets, 3 groups, 300 samples. c, 300 assets, 5 groups, 900 samples. d, 400 assets, 8 groups, 1280 samples. Each scatter plot shows the return rate versus average constraint violation for 300 portfolio strategies solved using the DRCC-linear and PolyFormer models. Data points represent the average constraint violation and actual return of a portfolio strategy evaluated on a test dataset of 100 uncertain return samples (see Supplementary Note 4). The strategies are derived from varying parameters (risk level, Wasserstein ball radius, maximum group investment cap, and minimum acceptable return per group), as described in the "Apply PolyFormer to DRCC portfolio optimization" section in the Methods. Contours, averages, and Pareto fronts are annotated in the plots. Solutions with higher returns and lower errors (i.e., located towards the bottom-right) are considered better. Notably, in all four cases, PolyFormer consistently finds a large number of non-inferior solutions compared to DRCC-linear in the bottom-left area, while also identifying superior solutions in the bottom-right area, indicating the high quality of the solutions produced by PolyFormer. The bar charts in each subfigure compare the number of variables and constraints for the two methods in each case. PolyFormer significantly reduces the number of variables (by 99.67% to 99.99%) and constraints (by 98.69% to 99.85%) compared to DRCC-linear in all cases, leading to a substantial decrease in computational complexity.
Figure 5: Illustrative results for three 2D cases. a, Evolution of training errors and corresponding geometric shapes for polygonal, elliptical, and nonconvex regions, evaluated with all varying parameters fixed at their baseline values. As feasibility and optimality errors compete during training, the approximated regions progressively converge toward the original region. b-d, PolyFormer fitting results under varying parameters for the polygon (b), ellipse (c), and nonconvex (d) cases. Changes in the parameters induce deformations of the original regions, which are consistently tracked by the approximating polygons.
...and 1 more figures
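
The three-phase training schedule described in the Figure 2 caption can be sketched as a piecewise-constant function of the iteration number. The iteration boundaries and $\lambda$ values below are taken from that caption; the function names and the convex-combination form of the weighted error are assumptions for illustration, since the exact loss is defined in the Methods and is not reproduced here.

```python
def lambda_schedule(iteration):
    """Return the feasibility weight for a 1-based training iteration.

    Phase boundaries and values follow the Figure 2 caption:
    balancing (1-500), refining (501-700), converging (701-800).
    """
    if not 1 <= iteration <= 800:
        raise ValueError("iteration must be in 1..800")
    if iteration <= 500:
        return 0.5       # balancing phase
    if iteration <= 700:
        return 0.9       # refining phase
    return 0.9999        # converging phase

def weighted_error(e_feas, e_opt, lam):
    """ASSUMED form: convex combination of the two directional errors.

    The paper's actual loss may differ; this only illustrates how a larger
    lambda shifts the emphasis toward feasibility.
    """
    return lam * e_feas + (1.0 - lam) * e_opt

# Same pair of errors, weighted at the start and at the end of training.
early = weighted_error(1e-2, 1e-2, lambda_schedule(1))
late = weighted_error(0.0, 1e-2, lambda_schedule(800))
```

Under this assumed form, the converging phase leaves optimality with a weight of only $10^{-4}$, which matches the caption's description of feasibility errors being driven to numerical zero.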

Theorems & Definitions (3)

Remark 1: Initialization
Remark 2: Normalization
Remark 3: Setting $\lambda$
