Learning to optimize: A tutorial for continuous and mixed-integer optimization

Xiaohan Chen; Jialin Liu; Wotao Yin

TL;DR

This tutorial surveys Learning to Optimize (L2O), a framework that combines traditional optimization with data-driven learning to exploit recurring problem structures. It systematically covers three paradigms: accelerating existing optimization algorithms, generating solutions directly, and adapting the optimization problem itself (e.g., Plug-and-Play, differentiable layers). Key methods include algorithm unrolling (e.g., LISTA), DNN-assisted Plug-and-Play, end-to-end differentiable optimization, and ML-guided mixed-integer optimization (e.g., learning to branch, learning to search, and learning to configure). The text discusses practical workflows, mathematical foundations (differentiation through optimization via the KKT conditions, fixed-point formulations), and applications in imaging, communications, and MILP, along with numerical results illustrating acceleration and improved decision-making. It also outlines training paradigms, data requirements, generalization considerations, and open theoretical questions, highlighting both the potential and the limitations of ML-based optimization in real-world solvers.

Abstract

Learning to Optimize (L2O) stands at the intersection of traditional optimization and machine learning, utilizing the capabilities of machine learning to enhance conventional optimization techniques. As real-world optimization problems frequently share common structures, L2O provides a tool to exploit these structures for better or faster solutions. This tutorial dives deep into L2O techniques, introducing how to accelerate optimization algorithms, promptly estimate the solutions, or even reshape the optimization problem itself, making it more adaptive to real-world applications. By considering the prerequisites for successful applications of L2O and the structure of the optimization problems at hand, this tutorial provides a comprehensive guide for practitioners and researchers alike.

Paper Structure (152 sections, 118 equations, 11 figures, 3 tables, 3 algorithms)

Introduction
What is learning to optimize (L2O)?
Why bringing learning to optimization?
When to consider L2O?
Train offline and then deploy.
Organization.
Remarks on notation.
Introduction and Deep Neural Networks
Preliminaries of Machine Learning
Deep Neural Networks
Training a Neural Network
Datasets.
Loss functions.
Backpropagation.
Training algorithms.
...and 137 more sections

Figures (11)

Figure 1: Impressive acceleration can be achieved by algorithm unrolling methods (LISTA, LISTA-CPSS, TiLISTA and ALISTA) compared to the classic iterative algorithm ISTA and its variant FISTA accelerated with momentum. Algorithm unrolling uses orders of magnitude fewer iterations than ISTA/FISTA to achieve the same precision. Left figure: noiseless case. Right figure: noisy case (SNR = 20). X-axis is the number of iterations; Y-axis is the normalized mean squared error (lower is better). Plot source: Figure 1 of alista.
Figure 2: An illustration of a generic unrolling process that turns an iterative algorithm into a truncated neural network. The observation ${\bm{d}}$ can be seen as the constant input to all iterations. The parameter ${\bm{\theta}}^{(i)}$ can be either constant (shared across iterations) or iteration-dependent, corresponding to a recurrent or feedforward system, respectively.
Figure 3: Comparison of LISTA and FISTA for solving sparse coding problems on the testing set. FISTA needs 18 iterations to reach the error that LISTA attains in a single iteration for $n=100$, and 35 iterations for $n=400$. Plot source: Figure 3 of lista.
Figure 4: Variable selection can influence the size of the BnB tree. Consider the MILP: $\min 11x_1 + 12x_2 +13x_3 + 14x_4$ subject to $x_1+x_2+x_3+x_4\geq2.5$, $x_2+2x_3 \geq 2.1$ and all the variables must be binary. The root node represents an optimal solution $\underline{{\bm{x}}} = (0.95,1,0.55,0)$ to the LP relaxation and its corresponding objective ${\bm{c}}^\top\underline{{\bm{x}}} = 29.6$. In $\underline{{\bm{x}}}$, two elements are fractional: $x_1$ and $x_3$. We have to decide which one to branch on. If we adopt the first approach—branching based on the fractional element with the smallest index—we will choose $x_1$ to branch on, leading to the tree shown in the left figure. On the other hand, if we employ a different strategy—branching based on the element with the largest fractionality—the choice is determined by $\mathop{\mathrm{arg\,max}}\limits_{j} \min(\underline{x}_j - \lfloor \underline{x}_j \rfloor, \lceil \underline{x}_j \rceil - \underline{x}_j)$. In this case, $x_3$ is selected, resulting in the tree in the right figure. Continuing this process in either case will yield different BnB trees. Note that, in this case, the node selection strategy will not influence the BnB tree, and you can verify this with any node selection method.
Figure 6: An MILP instance represented by a bipartite graph.
...and 6 more figures
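
The two branching rules described in the Figure 4 caption can be checked numerically. The following is a minimal sketch, not the authors' code: it takes the LP-relaxation solution stated in the caption as given (it does not solve the LP), and the helper names `fractionality`, `branch_smallest_index`, and `branch_most_fractional` are hypothetical, introduced only for illustration.

```python
import math

# Data copied from the Figure 4 caption.
c = [11, 12, 13, 14]            # objective coefficients
x_lp = [0.95, 1.0, 0.55, 0.0]   # LP-relaxation solution at the root node


def fractionality(v, tol=1e-9):
    """Distance from v to the nearest integer (0.0 if v is integral)."""
    d = min(v - math.floor(v), math.ceil(v) - v)
    return 0.0 if d < tol else d


def fractional_indices(x):
    """0-based indices of the variables with a fractional value."""
    return [j for j, v in enumerate(x) if fractionality(v) > 0.0]


def branch_smallest_index(x):
    """Rule 1: branch on the fractional variable with the smallest index."""
    return fractional_indices(x)[0]


def branch_most_fractional(x):
    """Rule 2: branch on the variable with the largest fractionality."""
    return max(fractional_indices(x), key=lambda j: fractionality(x[j]))


# Objective value and the two constraints from the caption.
objective = sum(cj * xj for cj, xj in zip(c, x_lp))
cover_lhs = sum(x_lp)                # x1 + x2 + x3 + x4 >= 2.5
second_lhs = x_lp[1] + 2 * x_lp[2]   # x2 + 2*x3 >= 2.1

# Report 1-based indices to match the caption's x_1, ..., x_4.
print(f"objective = {objective:.1f}")
print(f"rule 1 branches on x_{branch_smallest_index(x_lp) + 1}")
print(f"rule 2 branches on x_{branch_most_fractional(x_lp) + 1}")
```

Under these assumptions, the first rule selects $x_1$ and the second selects $x_3$, in agreement with the caption: the fractionality of $x_1$ is $0.05$, while that of $x_3$ is $0.45$.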
