Differentiating Through Integer Linear Programs with Quadratic Regularization and Davis-Yin Splitting

Daniel McKenzie; Samy Wu Fung; Howard Heaton

TL;DR

This work addresses end-to-end learning for ILPs with context-dependent costs by relaxing the ILP to a quadratically regularized LP and solving via a Davis–Yin three-operator splitting scheme. It introduces DYS-Net, enabling forward passes that are scalable for large problem sizes and backward passes that use Jacobian-free backpropagation to yield informative gradients without requiring Lagrange multipliers. The authors provide theoretical conditions ensuring descent directions in training and demonstrate that the combined forward/backward approach scales to tens of thousands of variables, outperforming existing baselines on shortest path and knapsack problems, and extending to large-scale shortest-path settings. The practical impact lies in enabling efficient, differentiable optimization layers for complex combinatorial problems within neural networks, with open-source code to facilitate adoption and further research.

Abstract

In many applications, a combinatorial problem must be repeatedly solved with similar but distinct parameters. Yet, the parameters $w$ are not directly observed; only contextual data $d$ that correlates with $w$ is available. It is tempting to use a neural network to predict $w$ given $d$. However, training such a model requires reconciling the discrete nature of combinatorial optimization with the gradient-based frameworks used to train neural networks. We study the case where the problem in question is an Integer Linear Program (ILP). We propose applying a three-operator splitting technique, also known as Davis-Yin splitting (DYS), to the quadratically regularized continuous relaxation of the ILP. We prove that the resulting scheme is compatible with the recently introduced Jacobian-free backpropagation (JFB). Our experiments on two representative ILPs, the shortest path problem and the knapsack problem, demonstrate that this combination (DYS on the forward pass, JFB on the backward pass) yields a scheme that scales more effectively to high-dimensional problems than existing schemes. All code associated with this paper is available at github.com/mines-opt-ml/fpo-dys.
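
To make the forward pass described in the abstract concrete, here is a minimal NumPy sketch of Davis-Yin splitting applied to a quadratically regularized LP relaxation. It is an illustration under stated assumptions, not the authors' implementation (see the repository above for that). It assumes the relaxation takes the form $\min_x w^\top x + \tfrac{\gamma}{2}\|x\|_2^2$ subject to $Ax = b$, $x \geq 0$, with $A$ of full row rank. The names `project_affine` and `dys_regularized_lp`, the values of `gamma` and `alpha`, and the toy problem are all hypothetical. The JFB backward pass is not shown.

```python
import numpy as np

def project_affine(x, A, b):
    """Euclidean projection onto {x : A x = b}; assumes A has full row rank."""
    return x - A.T @ np.linalg.solve(A @ A.T, A @ x - b)

def dys_regularized_lp(w, A, b, gamma=0.1, alpha=1.0, max_iter=1000, tol=1e-10):
    """Approximately solve min_x w^T x + (gamma/2)||x||^2  s.t.  A x = b, x >= 0.

    Standard three-operator (Davis-Yin) iteration; the operator ordering and
    the scaling of the regularizer are assumptions, not taken from the paper.
    """
    # The smooth term has a gamma-Lipschitz gradient, so alpha must lie in (0, 2/gamma).
    assert 0.0 < alpha < 2.0 / gamma
    z = np.zeros_like(w, dtype=float)
    for _ in range(max_iter):
        x_pos = np.maximum(z, 0.0)                 # project onto {x >= 0}
        grad = w + gamma * x_pos                   # gradient of the smooth term
        x_aff = project_affine(2.0 * x_pos - z - alpha * grad, A, b)
        z = z + x_aff - x_pos                      # fixed-point update
        if np.linalg.norm(x_aff - x_pos) < tol:    # the two projections agree
            break
    return np.maximum(z, 0.0)

# Toy problem (hypothetical): choose exactly one of three items at minimum cost.
A = np.array([[1.0, 1.0, 1.0]])
b = np.array([1.0])
w = np.array([1.0, 2.0, 3.0])
x_relaxed = dys_regularized_lp(w, A, b)
```

On this toy problem the gaps between the costs are large relative to `gamma`, so the relaxed minimizer coincides with the integer solution that selects the cheapest item. In general the regularized relaxation need not be integral, and a rounding step may be required.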

Paper Structure (45 sections, 8 theorems, 70 equations, 7 figures, 2 tables, 1 algorithm)

Introduction
Contribution
Preliminaries
LP Reformulation
Losses and Training Data
Argmin Differentiation
Pitfalls of Relaxation
Related Works
Optimization Layers
Decision-focused learning for LPs
Deep Equilibrium Models
Learning-to-Optimize (L2O)
Computing the derivative of a minimizer with respect to parameters
DYS-Net
The Forward Pass
...and 30 more sections

Key Result

Lemma 1

If $\mathcal{C}_1, \mathcal{C}_2$ are as in equation \ref{eq:CanonicalPolytope} and $A$ is full-rank, then:

Figures (7)

Figure 1: The shortest path prediction problem [poganvcic2019differentiation]. The goal is to find the shortest path (from top-left to bottom-right) through a randomly generated terrain map from the Warcraft II tileset [guyomarchwarcraft]. The contextual data $d$, shown in (a), is an image sub-divided into 8-by-8 squares, each representing a vertex in a 12-by-12 grid graph. The cost of traversing each square, i.e. $w(d)$, is shown in (b), with darker shading representing lower cost. The true shortest path is shown in (c).
Figure 2: Results for the shortest path and knapsack prediction problems. Figures (a) and (b) show normalized regret and train time for the shortest path prediction problem, while Figures (c) and (d) show normalized regret and train time for the knapsack prediction problem. Note that the train time in Figures (b) and (d) is the time taken to reach the model with the best normalized regret on the validation set.
Figure 3: Accuracy (in percentage) of predicted paths on a 5-by-5 grid during training.
Figure 4: Results for the shortest path prediction problem. a) Test MSE loss (left), b) training time in minutes (middle), and c) regret values (right) vs. grid size for DYS-Net (proposed) and approaches using cvxpylayers [agrawal2019differentiable], labeled CVX; Perturbed Optimization [berthet2020learning], labeled PertOpt; and Blackbox Backpropagation [vlastelica2019differentiation], labeled BB. Note that CVX is unable to load or run problems with grid size over 30. Dimensions of the variables can be found in Table \ref{tab: num_variables}.
Figure 5: Left two figures: Sample cost matrices for the shortest path problem considered in Section \ref{sec:pyeopo_shortest_path}. Right two figures: Sample cost matrices for the Warcraft shortest path prediction problem considered in Section \ref{sec:warcraft_shortest_path}. Note that in Section \ref{sec:warcraft_shortest_path} the node-weighted shortest path prediction problem is considered, while in Section \ref{sec:pyeopo_shortest_path} the edge-weighted variant is solved. For ease of comparison, in the left two figures we have reshaped the edge cost vector into a node cost matrix.
...and 2 more figures

Theorems & Definitions (18)

Lemma 1
Theorem 2
Definition 3
Theorem 4
Definition 5: LICQ condition, specialized to our case
Theorem 6
Corollary 7
proof
Lemma 8
proof
...and 8 more
