Scalable Mixed-Integer Optimization with Neural Constraints via Dual Decomposition
Shuli Zeng, Sijia Zhang, Feng Wu, Shaojie Tang, Xiang-Yang Li
TL;DR
Embedding neural networks as constraints in mixed-integer programs traditionally relies on Big-M linearizations whose size explodes with the network, rendering large nets intractable. We introduce a dual-decomposition framework that duplicates the variables $x$ as a continuous copy $u$ and coordinates the MIP and NN blocks through an augmented Lagrangian with penalty parameter $\rho$: the two blocks are solved alternately, and the dual variables are updated to bring $u$ and $x$ into agreement. Theoretical results establish convergence to a KKT point and a per-iteration cost that scales linearly with NN size. Empirically, the method achieves up to $120\times$ speedups over exact Big-M baselines, works across CNN/LSTM backbones without reformulation, and demonstrates modularity by swapping NN solvers with minimal code changes. These results support its practical use for scalable, verifiable optimization with learned constraints.
Abstract
Embedding deep neural networks (NNs) into mixed-integer programs (MIPs) is attractive for decision making with learned constraints, yet state-of-the-art monolithic linearisations blow up in size and quickly become intractable. In this paper, we introduce a novel dual-decomposition framework that relaxes the single coupling equality $u=x$ through an augmented Lagrangian and splits the problem into a vanilla MIP and a constrained NN block. Each part is tackled by the solver that suits it best (branch and cut for the MIP subproblem, first-order optimisation for the NN subproblem), so the model remains modular, the number of integer variables never grows with network depth, and the per-iteration cost scales only linearly with the NN size. On the public \textsc{SurrogateLIB} benchmark, our method proves \textbf{scalable}, \textbf{modular}, and \textbf{adaptable}: it runs \(120\times\) faster than an exact Big-M formulation on the largest test case; the NN sub-solver can be swapped from a log-barrier interior step to a projected-gradient routine with no code changes and identical objective value; and swapping the MLP for an LSTM backbone still completes the full optimisation in 47 s without any bespoke adaptation.
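
As a rough illustration of the alternating scheme described in the abstract, the updates can be sketched in standard augmented-Lagrangian notation. Only $x$, $u$, and $\rho$ come from the text; the symbols $f$ (the MIP objective), $\mathcal{X}$ (the mixed-integer feasible set of the vanilla MIP), $\mathcal{U}$ (the set of inputs that satisfy the NN constraint), the multiplier $\lambda$, and the iteration index $k$ are assumed here for exposition and may differ from the paper's own formulation.

$$
\begin{aligned}
\mathcal{L}_\rho(x,u,\lambda) &= f(x) + \lambda^\top (x-u) + \tfrac{\rho}{2}\,\lVert x-u\rVert_2^2, \\
x^{k+1} &\in \arg\min_{x\in\mathcal{X}} \ \mathcal{L}_\rho\bigl(x,\,u^{k},\,\lambda^{k}\bigr) && \text{(MIP block, branch and cut)} \\
u^{k+1} &\in \arg\min_{u\in\mathcal{U}} \ \mathcal{L}_\rho\bigl(x^{k+1},\,u,\,\lambda^{k}\bigr) && \text{(NN block, first-order method)} \\
\lambda^{k+1} &= \lambda^{k} + \rho\,\bigl(x^{k+1}-u^{k+1}\bigr) && \text{(dual update)}
\end{aligned}
$$

Under these assumptions, the network appears only in the $u$-step, so the $x$-step never needs binary variables to encode activations. This is consistent with the claim that the number of integer variables does not grow with network depth.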
