DOGE-Train: Discrete Optimization on GPU with End-to-end Training

Ahmed Abbas; Paul Swoboda

TL;DR

This work addresses solving LP relaxations of 0-1 ILPs by making the FastDOG dual-optimization solver differentiable and training a graph neural network to predict its update parameters. The authors introduce non-parametric updates to escape suboptimal fixed points and preserve dual feasibility, trained with an unsupervised dual-objective loss. Their approach yields a problem-agnostic, GPU-accelerated solver (DOGE) that generalizes from small to large instances and achieves faster convergence and tighter dual bounds than non-learned baselines, sometimes surpassing specialized solvers. The results demonstrate strong performance across varied structured prediction and combinatorial tasks, with competitive anytime performance and practical potential for large-scale ILP relaxations.

Abstract

We present a fast, scalable, data-driven approach for solving relaxations of 0-1 integer linear programs. We combine graph neural networks (GNNs) with the Lagrange-decomposition-based algorithm FastDOG (Abbas and Swoboda 2022b). We make the latter differentiable for end-to-end training and use GNNs to predict its algorithmic parameters. This allows us to retain the algorithm's theoretical properties, including dual feasibility and a guaranteed non-decrease in the lower bound, while improving it via training. We overcome suboptimal fixed points of the basic solver with additional non-parametric GNN update steps that maintain dual feasibility. For training we use an unsupervised loss. We train on smaller problems and test on larger ones, showing strong generalization performance with a GNN comprising only around $10k$ parameters. Our solver is significantly faster and reaches better dual objectives than its non-learned version, achieving close-to-optimal objective values for LP relaxations of very large structured prediction problems and of selected combinatorial ones. In particular, we achieve better objective values than specialized approximate solvers for specific problem classes while retaining their efficiency. Our solver has better any-time performance over a large time period than a commercial solver. Code is available at https://github.com/LPMP/BDD

Paper Structure (40 sections, 2 theorems, 22 equations, 4 figures, 8 tables, 3 algorithms)

Introduction
Contributions
Related Work
Learning to solve combinatorial optimization
Unrolling algorithms for parameter learning
Method
Lagrange Decomposition
Optimization of Lagrangean dual
Backpropagation through dual optimization
Efficient Implementation
Non-Parametric Update Steps
Graph Neural Network
Graph convolution
Loss
Overall pipeline
...and 25 more sections

Key Result

Proposition 1

For any $\alpha_{ij} \geq 0$ with $\sum_{j \in \mathcal{J}_i} \alpha_{ij} = 1$ and $\omega_{ij} \in [0,1]$, the min-marginal averaging step (line \ref{alg:lambda-update} of Algorithm \ref{alg:parallel-mma}) retains dual feasibility and is non-decreasing in the dual lower bound.
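The two properties in Proposition 1 can be checked numerically on a toy instance. The sketch below is an illustration under stated assumptions, not the paper's BDD-based GPU implementation: it assumes the update has the form $\lambda_{ij} \leftarrow \lambda_{ij} - \omega_{ij} m_{ij} + \alpha_{ij} \sum_{k \in \mathcal{J}_i} \omega_{ik} m_{ik}$, where $m_{ij}$ is the min-marginal difference of variable $i$ in subproblem $j$; it processes one variable at a time; it enumerates each subproblem's feasible set by brute force; and it reuses one $\alpha_j$, $\omega_j$ per subproblem for every variable. All function names and the instance itself are hypothetical.

```python
from itertools import product

def cost(lam, x):
    """Inner product <lam, x> for a 0/1 assignment x."""
    return sum(l * xi for l, xi in zip(lam, x))

def min_marginal_diff(feasible, lam, i):
    """min over x with x_i = 1 minus min over x with x_i = 0 (brute force)."""
    best = [float("inf"), float("inf")]
    for x in feasible:
        best[x[i]] = min(best[x[i]], cost(lam, x))
    return best[1] - best[0]

def dual_lower_bound(subproblems, lams):
    """Sum of the independent subproblem minima."""
    return sum(min(cost(lam, x) for x in X) for X, lam in zip(subproblems, lams))

def averaging_step(subproblems, lams, i, alpha, omega):
    """Assumed generalized min-marginal averaging for variable i (in place)."""
    m = [min_marginal_diff(X, lam, i) for X, lam in zip(subproblems, lams)]
    total = sum(w * mj for w, mj in zip(omega, m))
    for j, lam in enumerate(lams):
        lam[i] += -omega[j] * m[j] + alpha[j] * total

# Toy instance (hypothetical): 3 binary variables, 2 subproblems.
n = 3
all_x = list(product((0, 1), repeat=n))
subproblems = [
    [x for x in all_x if sum(x) == 1],        # exactly one variable is 1
    [x for x in all_x if x[0] + x[1] <= 1],   # x_0 and x_1 exclude each other
]
lams = [[1.0, -2.0, 0.5], [-3.0, 1.0, -0.5]]  # one multiplier vector per subproblem
c = [sum(col) for col in zip(*lams)]          # original costs c_i = sum_j lam_ij
alpha = [0.3, 0.7]                            # non-negative, sums to 1
omega = [0.5, 1.0]                            # damping factors in [0, 1]

bounds = [dual_lower_bound(subproblems, lams)]
for i in range(n):
    averaging_step(subproblems, lams, i, alpha, omega)
    bounds.append(dual_lower_bound(subproblems, lams))
```

Because the $\alpha_j$ sum to one, the amount subtracted across subproblems is redistributed exactly, so the per-variable multipliers still sum to $c_i$; the list `bounds` records the lower bound after each step and should never decrease.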

Figures (4)

Figure 1: Our method for optimizing the Lagrangean dual \ref{eq:dual-problem}. The dual problem is encoded on a bipartite graph containing features $f_\mathcal{I}$, $f_\mathcal{J}$ and $f_\mathcal{E}$ for primal variables, subproblems and dual variables, respectively. A graph neural network (GNN) predicts $\theta$, $\alpha$, $\omega$ for the dual updates. In one dual update block (right), the current Lagrange multipliers $\lambda$ are first updated by the non-parametric update using $\theta$. Afterwards, the parametric update is performed via Alg. \ref{alg:parallel-mma} using $\alpha$, $\omega$. The updated solver features $f$ and LSTM cell states $s_\mathcal{I}$ are sent to the GNN in the next optimization round. See Sec. \ref{sec:pipeline} for further details.
Figure 2: Convergence plots for $g(t)$, the relative dual gap to the optimum (or to the maximum suboptimal objective among all methods) of the relaxation \ref{eq:dual-problem}. The x-axis indicates wall-clock time and both axes are logarithmic. The value of $g(t)$ is averaged over all test instances in each dataset.
Figure 3: Computational graph of BlockUpdate in Alg. \ref{alg:parallel-mma}.
Figure 4: Convergence plots of smaller test instances of QAPLib ($\leq$ 40 nodes).

Theorems & Definitions (7)

Definition 1: Binary Program [lange2021efficient]
Definition 2: Lagrangean dual problem [lange2021efficient]
Remark
Proposition 1: Dual Feasibility and Monotonicity of Generalized Min-marginal Averaging
Proposition 2
proof
proof
