PDHG-Unrolled Learning-to-Optimize Method for Large-Scale Linear Programming

Bingheng Li; Linxin Yang; Yupeng Chen; Senmiao Wang; Qian Chen; Haitao Mao; Yao Ma; Akang Wang; Tian Ding; Jiliang Tang; Ruoyu Sun

TL;DR

This work accelerates large-scale linear programming (LP) by unrolling the Primal-Dual Hybrid Gradient (PDHG) method into a neural network, PDHG-Net, and embedding it in a two-stage learning-to-optimize (L2O) framework: PDHG-Net predicts an approximate solution, which then warm-starts a PDLP solver for refinement. PDHG-Net replaces the proximal operators with ReLU activations and adds channel expansion, so it can replicate PDHG updates while remaining generalizable across problem sizes. Theoretical results show that its layers can align exactly with PDHG iterations and that an ε-approximate solution is attainable with O(1/ε) neurons. Empirically, the approach achieves up to a 3x speedup on large LPs (e.g., PageRank problems) and yields robust improvements on challenging LP relaxations (the IP and WA datasets). The findings highlight the potential of combining unrolled optimization networks with classical solvers to scale LP solving, and suggest applying similar L2O strategies to broader optimization problems such as mixed-integer programming (MIP).

Abstract

Solving large-scale linear programming (LP) problems is an important task in various areas such as communication networks, power systems, finance, and logistics. Recently, two distinct approaches have emerged to expedite LP solving: (i) first-order methods (FOMs) and (ii) learning to optimize (L2O). In this work, we propose an FOM-unrolled neural network (NN) called PDHG-Net, together with a two-stage L2O method for solving large-scale LP problems. The PDHG-Net architecture is designed by unrolling the recently emerged PDHG method into a neural network, combined with channel-expansion techniques borrowed from graph neural networks. We prove that PDHG-Net can recover the PDHG algorithm and can therefore approximate optimal solutions of LP instances with a polynomial number of neurons. We propose a two-stage inference approach: first use PDHG-Net to generate an approximate solution, and then apply the PDHG algorithm to further improve that solution. Experiments show that our approach can significantly accelerate LP solving, achieving up to a 3$\times$ speedup compared to FOMs for large-scale LP problems.
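As a rough illustration of what "unrolling PDHG into a network" can mean, the sketch below compares one classical PDHG step with a hypothetical ReLU-based layer. It assumes the LP form $\min c^\top x$ s.t. $Gx \geq h$, $l \leq x \leq u$ with finite bounds. The function `unrolled_layer`, the channel-mixing matrices `Ux`, `Uy`, `Vx`, `Vy`, and the ReLU form of the box projection are assumptions made for this example and are not taken from the paper's exact layer definition. With a single channel and identity mixing matrices, the layer reproduces the PDHG step.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def pdhg_step(x, y, G, c, h, l, u, tau, sigma):
    """One classical PDHG step for min c^T x s.t. Gx >= h, l <= x <= u."""
    x_new = np.clip(x - tau * (c - G.T @ y), l, u)                  # primal projection
    y_new = np.maximum(y + sigma * (h - G @ (2 * x_new - x)), 0.0)  # dual projection
    return x_new, y_new

def unrolled_layer(X, Y, G, c, h, l, u, tau, sigma, Ux, Uy, Vx, Vy):
    """Hypothetical unrolled layer (illustrative, not the paper's definition).

    X has shape (n, d) and Y has shape (m, d), where d is the channel width.
    Ux, Uy, Vx, Vy are (d, d) matrices that would be learned in practice.
    """
    ones = np.ones((1, X.shape[1]))
    Z = X @ Ux - tau * (c[:, None] @ ones - G.T @ (Y @ Uy))
    # Box projection onto [l, u] written with ReLUs (requires finite l and u).
    X_new = l[:, None] + relu(Z - l[:, None]) - relu(Z - u[:, None])
    W = Y @ Vy + sigma * (h[:, None] @ ones - G @ ((2 * X_new - X) @ Vx))
    Y_new = relu(W)  # projection onto the nonnegative orthant
    return X_new, Y_new

# Small random instance (made-up data, for illustration only).
rng = np.random.default_rng(0)
m, n = 3, 4
G = rng.normal(size=(m, n))
c = rng.normal(size=n)
h = rng.normal(size=m)
l, u = np.zeros(n), np.ones(n)
x0, y0 = rng.uniform(size=n), rng.uniform(size=m)
tau = sigma = 0.9 / np.linalg.norm(G, 2)  # so tau * sigma * ||G||_2^2 = 0.81 < 1

I1 = np.eye(1)  # width-1 channels with identity mixing
x1, y1 = pdhg_step(x0, y0, G, c, h, l, u, tau, sigma)
X1, Y1 = unrolled_layer(x0[:, None], y0[:, None], G, c, h, l, u,
                        tau, sigma, I1, I1, I1, I1)
layer_matches_pdhg = np.allclose(X1[:, 0], x1) and np.allclose(Y1[:, 0], y1)
print("layer reproduces the PDHG step:", layer_matches_pdhg)
```

In a trained network the mixing matrices and step sizes would be learnable and the channel width larger than one. The point of the sketch is only that the classical update is a special case of such a layer.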

Paper Structure (27 sections, 4 theorems, 39 equations, 4 figures, 10 tables)

Introduction
PDHG-Net for large-scale LP
Standard PDHG Algorithm
Design of PDHG-Net
Proximal operator -- ReLU activation.
Channel expansion.
Network Training
Two-Stage Framework
Implementation Settings
Numerical Experiments
Comparing against vanilla PDLP on large-scale LP problems
Comparing against PDLP on difficult linear relaxations
Understandings of why PDHG-Net works
Generalization on different sizes
Scalability of PDHG-Net
...and 12 more sections

Key Result

Proposition 2.1

Let $\{(x^k, y^k)\}_{k \geq 0}$ be the primal-dual variables generated by the PDHG algorithm for the LP problem $\mathcal{M} = (G; l,u,c; h)$. If the step sizes $\tau, \sigma$ satisfy $\tau \sigma \|G\|_2^2 < 1$, then for any $(x,y) \in \mathbb{R}^n \times \mathbb{R}_{\geq 0}^m$ satisfying $l \leq x \leq u$, [...] where $\bar{x}^k = (\sum_{j=1}^k x^j) / k$, $\bar{y}^k = (\sum_{j=1}^k y^j) / k$, and $L$ is the Lagrangian [...]
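The quantities named in the proposition (the iterates $x^k, y^k$, the step-size condition $\tau \sigma \|G\|_2^2 < 1$, and the averaged iterates $\bar{x}^k, \bar{y}^k$) can be made concrete with a minimal NumPy sketch. It assumes the LP form $\min c^\top x$ s.t. $Gx \geq h$, $l \leq x \leq u$. The function name `pdhg` and the two-variable toy instance are invented for illustration, and the sketch does not evaluate the bound from the proposition itself.

```python
import numpy as np

def pdhg(G, c, h, l, u, tau, sigma, num_iters):
    """Plain PDHG for min c^T x s.t. Gx >= h, l <= x <= u.

    Returns the last iterates and the averages
    x_bar = (x^1 + ... + x^k) / k and y_bar = (y^1 + ... + y^k) / k.
    """
    # Step-size condition from the proposition.
    assert tau * sigma * np.linalg.norm(G, 2) ** 2 < 1.0
    m, n = G.shape
    x, y = np.zeros(n), np.zeros(m)
    x_sum, y_sum = np.zeros(n), np.zeros(m)
    for _ in range(num_iters):
        x_new = np.clip(x - tau * (c - G.T @ y), l, u)              # primal step
        y = np.maximum(y + sigma * (h - G @ (2 * x_new - x)), 0.0)  # dual step with extrapolation
        x = x_new
        x_sum += x
        y_sum += y
    return x, y, x_sum / num_iters, y_sum / num_iters

# Toy instance (made up): min x1 + 2*x2  s.t.  x1 + x2 >= 1,  0 <= x <= 2.
# Its optimal primal solution is x = (1, 0) with dual multiplier y = 1.
G = np.array([[1.0, 1.0]])
c = np.array([1.0, 2.0])
h = np.array([1.0])
l, u = np.zeros(2), 2.0 * np.ones(2)
tau = sigma = 0.9 / np.linalg.norm(G, 2)

x_last, y_last, x_avg, y_avg = pdhg(G, c, h, l, u, tau, sigma, num_iters=2000)
print("last iterate:", x_last, y_last)
print("averaged iterate:", x_avg, y_avg)
```

On this toy problem the last iterate settles quickly, while the averaged iterate approaches the same point more slowly because it keeps the early iterates in the mean.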

Figures (4)

Figure 1: Overview of how each layer in PDHG-Net corresponds to each iteration of the traditional PDHG algorithm, along with the overall architecture of PDHG-Net
Figure 2: The proposed post-processing procedure warm-starts the PDLP solver using the prediction of PDHG-Net as initial solutions to ensure optimality.
Figure 3: The distance between the predicted solution of PDHG-Net and optimal solution in PageRank training and validation instances with (a) $5\times10^3$, (b) $1\times10^4$, (c) $2\times10^4$, (d) $4\times10^4$ variable sizes.
Figure 4: We present the improvement ratio in both solving time and the number of iterations for solutions extrapolated at varying distances from the optimal solution. Each blue dot symbolizes an extrapolated solution, while the yellow line represents the trend line fitted through these points. Results demonstrate a strong correlation.

Theorems & Definitions (6)

Proposition 2.1 (Chambolle & Pock, 2016)
Theorem 2.2
Theorem 2.3
Corollary 2.4
proof
proof
