Scalable Mixed-Integer Optimization with Neural Constraints via Dual Decomposition

Shuli Zeng, Sijia Zhang, Feng Wu, Shaojie Tang, Xiang-Yang Li

TL;DR

Embedding neural networks as constraints in mixed-integer programs traditionally relies on Big-M linearizations whose size explodes with the network, rendering large nets intractable. We introduce a dual-decomposition framework that duplicates the decision vector $x$ as a continuous copy $u$ and coordinates the MIP and NN blocks via an augmented Lagrangian with penalty parameter $\rho$: the two blocks are solved alternately, and the dual variables are updated to drive $u$ and $x$ into agreement. Theoretical results establish convergence to a KKT point and a per-iteration cost that scales linearly with NN size. Empirically, the method achieves up to $120\times$ speedups over exact Big-M baselines, works across CNN/LSTM backbones without reformulation, and demonstrates modularity by swapping NN sub-solvers with no code changes, confirming practical impact for scalable, verifiable optimization with learned constraints.
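
For orientation, the coordination scheme summarised above can be written in the standard augmented-Lagrangian form. This is a sketch under assumptions, not the paper's exact statement: the linear objective $c^\top x$, the MIP feasible set $\mathcal{X}$, and the NN-feasible set $\mathcal{U}$ are illustrative symbols that do not appear in this summary.

$$
\mathcal{L}_\rho(x,u,\lambda) \;=\; c^\top x \;+\; \lambda^\top (u - x) \;+\; \frac{\rho}{2}\,\lVert u - x\rVert_2^2,
\qquad x \in \mathcal{X},\; u \in \mathcal{U}.
$$

One iteration then alternates the two block minimisations and takes a dual ascent step on the coupling residual $u - x$:

$$
\begin{aligned}
x^{(k+1)} &\in \arg\min_{x \in \mathcal{X}} \; \mathcal{L}_\rho\bigl(x,\,u^{(k)},\,\lambda^{(k)}\bigr), \\
u^{(k+1)} &\in \arg\min_{u \in \mathcal{U}} \; \mathcal{L}_\rho\bigl(x^{(k+1)},\,u,\,\lambda^{(k)}\bigr), \\
\lambda^{(k+1)} &= \lambda^{(k)} + \rho\,\bigl(u^{(k+1)} - x^{(k+1)}\bigr).
\end{aligned}
$$

In this form the integer variables live only in the $x$-update, which is consistent with the claim that their number does not grow with network depth.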

Abstract

Embedding deep neural networks (NNs) into mixed-integer programs (MIPs) is attractive for decision making with learned constraints, yet state-of-the-art monolithic linearisations blow up in size and quickly become intractable. In this paper, we introduce a novel dual-decomposition framework that relaxes the single coupling equality $u=x$ with an augmented Lagrange multiplier and splits the problem into a vanilla MIP and a constrained NN block. Each part is tackled by the solver that suits it best (branch and cut for the MIP subproblem, first-order optimisation for the NN subproblem), so the model remains modular, the number of integer variables never grows with network depth, and the per-iteration cost scales only linearly with the NN size. On the public SurrogateLIB benchmark, our method proves scalable, modular, and adaptable: it runs $120\times$ faster than an exact Big-M formulation on the largest test case; the NN sub-solver can be swapped from a log-barrier interior step to a projected-gradient routine with no code changes and identical objective value; and swapping the MLP for an LSTM backbone still completes the full optimisation in 47s without any bespoke adaptation.
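
The alternating structure described in the abstract can be illustrated with a small toy sketch. Everything in it is an assumption made for illustration and none of it is the authors' implementation: the MIP block is solved by brute-force enumeration in place of branch and cut, the NN constraint is taken to be $f(u) \ge \tau$ for a tiny random ReLU network and is handled by a quadratic penalty inside a projected-gradient loop, and the names `solve_mip_block`, `solve_nn_block`, `dual_decomposition`, `tau`, `budget`, and `mu` are hypothetical.

```python
import itertools
import numpy as np

def relu_net(u, W1, b1, w2, b2):
    """Tiny one-hidden-layer ReLU net; returns the output and the pre-activations."""
    z = W1 @ u + b1
    return float(w2 @ np.maximum(z, 0.0) + b2), z

def solve_mip_block(c, u, lam, rho, budget):
    """Stand-in for branch and cut: enumerate binary x with sum(x) <= budget."""
    best_x, best_val = None, np.inf
    for bits in itertools.product([0.0, 1.0], repeat=len(c)):
        x = np.array(bits)
        if x.sum() > budget:
            continue
        # Augmented Lagrangian restricted to the x-block (u and lam are fixed).
        val = c @ x + lam @ (u - x) + 0.5 * rho * np.sum((u - x) ** 2)
        if val < best_val:
            best_x, best_val = x, val
    return best_x

def solve_nn_block(x, u0, lam, rho, net, tau, mu=5.0, lr=0.005, steps=300):
    """Projected gradient on u in [0, 1]^n. The assumed constraint f(u) >= tau
    enters through the quadratic penalty mu * max(0, tau - f(u))**2."""
    W1, b1, w2, b2 = net
    u = u0.copy()
    for _ in range(steps):
        f, z = relu_net(u, W1, b1, w2, b2)
        grad = lam + rho * (u - x)              # gradient of the coupling terms
        viol = tau - f
        if viol > 0.0:
            grad_f = W1.T @ (w2 * (z > 0.0))    # gradient of f through active ReLUs
            grad = grad - 2.0 * mu * viol * grad_f
        u = np.clip(u - lr * grad, 0.0, 1.0)    # projection onto the box
    return u

def dual_decomposition(c, net, tau, budget, rho=1.0, iters=30):
    """Alternate the two block solves, then take a dual ascent step on u - x."""
    n = len(c)
    u, lam = np.full(n, 0.5), np.zeros(n)
    history = []
    for _ in range(iters):
        x = solve_mip_block(c, u, lam, rho, budget)
        u = solve_nn_block(x, u, lam, rho, net, tau)
        lam = lam + rho * (u - x)
        history.append(float(np.linalg.norm(u - x)))  # coupling residual
    return x, u, lam, history

rng = np.random.default_rng(0)
n, h = 4, 8
net = (rng.normal(size=(h, n)), rng.normal(size=h), rng.normal(size=h), 0.0)
c = np.array([-1.0, -2.0, 0.5, -0.3])
x, u, lam, history = dual_decomposition(c, net, tau=0.0, budget=2)
print("x =", x, "final residual =", history[-1])
```

Because the NN block touches the network only through forward and gradient evaluations, replacing `solve_nn_block` with a different routine leaves the outer loop unchanged, which is the kind of modularity the abstract refers to. The sketch makes no claim about convergence speed or solution quality on real instances.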

Paper Structure

This paper contains 56 sections, 4 theorems, 24 equations, 7 figures, 2 algorithms.

Key Result

Theorem 5.1

Under the paper's regularity assumption, the sequence $\{(x^{(k)},u^{(k)},\lambda^{(k)})\}$ generated by the dual-decomposition algorithm is bounded. Every accumulation point $(x^{\star},u^{\star},\lambda^{\star})$ satisfies: Moreover,

Figures (7)

  • Figure 1: Workflow of the NN‑Embedded MIP: historical samples are adjusted under budget constraints, evaluated by a neural classifier, and optimised via a mixed‑integer solver.
  • Figure 2: Illustration of the proposed NN-embedded MIP solution framework using augmented Lagrangian decomposition, emphasizing iterative primal-dual coordination.
  • Figure 3: (E1) Comparison with Linearisation-based Methods. Solution quality (left) and computation time (right) across network architectures and problem sizes. Colours denote potability rate and $\log_{10}$ wall-clock time, respectively.
  • Figure 4: (E2) Scalability stress test with fixed $n=100$. Left: convergence rate; centre: total computation time (log scale); right: average time per iteration (log scale).
  • Figure 5: (E3) Ablation results. SSG (no dual coordination) versus full DD. Left: solution quality (normalised to SCIP). Centre: log average iterations to feasibility. Right: speed-ups over SCIP (higher is better).
  • ...and 2 more figures

Theorems & Definitions (8)

  • Theorem 5.1: Convergence to a KKT point
  • Proof of Theorem 5.1
  • Theorem 5.2: Linear Scalability in Network Size
  • Proof of Theorem 5.2
  • Lemma 1: Strong convexity of the $u$-block
  • Proof of Lemma 1
  • Lemma 2: Per-iteration decrease
  • Proof of Lemma 2