Sample Complexity of Algorithm Selection Using Neural Networks and Its Applications to Branch-and-Cut

Hongyu Cheng; Sammy Khalife; Barbara Fiedorowicz; Amitabh Basu

TL;DR

This paper advances data-driven algorithm design by showing how neural networks can learn instance-dependent mappings from problem instances to cutting-plane decisions in branch-and-cut. By casting this mapping as a parameterized neural network and bounding the pseudo-dimension of the resulting hypothesis class, it provides rigorous sample complexity guarantees for neural-network-based algorithm selection. The authors apply this framework to Chvátal-Gomory and Gomory mixed-integer cuts at the root node of MILP branch-and-cut trees, deriving both single-cut and multi-cut learnability bounds under sigmoid and ReLU activations. Empirically, they demonstrate that neural-network-guided cut selection yields significant reductions in tree size and faster cut selection compared with traditional tableau-based methods, using reinforcement learning to train the model on knapsack instances.
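For readers unfamiliar with Chvátal-Gomory (CG) cuts, the following minimal sketch may help. It assumes the standard textbook form: for integer $x \ge 0$ satisfying $Ax \le b$ and nonnegative multipliers $u$, the inequality $\lfloor u^\top A \rfloor x \le \lfloor u^\top b \rfloor$ is valid. The function name `chvatal_gomory_cut` and the toy data are hypothetical and are not taken from the paper; in the paper's setting the multipliers would be produced by a neural network rather than supplied by hand.

```python
from fractions import Fraction
from math import floor


def chvatal_gomory_cut(A, b, u):
    """Return (coeffs, rhs) of the CG cut floor(u^T A) x <= floor(u^T b).

    Assumes integer x >= 0 with A x <= b, and multipliers u >= 0.
    Exact rational arithmetic avoids floating-point errors in floor().
    """
    if any(ui < 0 for ui in u):
        raise ValueError("multipliers must be nonnegative")
    m, n = len(A), len(A[0])
    # Aggregate the constraint rows with weights u.
    agg = [sum(Fraction(u[i]) * A[i][j] for i in range(m)) for j in range(n)]
    agg_rhs = sum(Fraction(u[i]) * b[i] for i in range(m))
    # Round down the coefficients, then the right-hand side.
    return [floor(c) for c in agg], floor(agg_rhs)


# Toy instance: 2*x1 + 2*x2 <= 3 with multiplier u = 1/2.
coeffs, rhs = chvatal_gomory_cut([[2, 2]], [3], [Fraction(1, 2)])
print(coeffs, rhs)  # [1, 1] 1, i.e. the cut x1 + x2 <= 1
```

The cut $x_1 + x_2 \le 1$ removes the fractional point $(1.5, 0)$ while keeping every integer feasible point, which is the effect a good multiplier choice aims for.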

Abstract

Data-driven algorithm design is a paradigm that uses statistical and machine learning techniques to select from a class of algorithms for a computational problem an algorithm that has the best expected performance with respect to some (unknown) distribution on the instances of the problem. We build upon recent work in this line of research by considering the setup where, instead of selecting a single algorithm that has the best performance, we allow the possibility of selecting an algorithm based on the instance to be solved, using neural networks. In particular, given a representative sample of instances, we learn a neural network that maps an instance of the problem to the most appropriate algorithm for that instance. We formalize this idea and derive rigorous sample complexity bounds for this learning problem, in the spirit of recent work in data-driven algorithm design. We then apply this approach to the problem of making good decisions in the branch-and-cut framework for mixed-integer optimization (e.g., which cut to add?). In other words, the neural network will take as input a mixed-integer optimization instance and output a decision that will result in a small branch-and-cut tree for that instance. Our computational results provide evidence that our particular way of using neural networks for cut selection can make a significant impact in reducing branch-and-cut tree sizes, compared to previous data-driven approaches.
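The instance-dependent selection described in the abstract can be illustrated with a small sketch. It assumes a finite set of candidate algorithms, a fixed-length feature encoding of each instance, and a one-hidden-layer ReLU network whose output scores one algorithm per coordinate. All names (`select_algorithm`, `average_cost`), the weights, and the cost table are hypothetical placeholders, not the authors' architecture or data.

```python
def affine(weights, bias, v):
    """Compute weights @ v + bias using plain Python lists."""
    return [sum(w * x for w, x in zip(row, v)) + b for row, b in zip(weights, bias)]


def relu(v):
    return [max(0.0, x) for x in v]


def select_algorithm(features, params):
    """Map an instance encoding to the index of the highest-scoring algorithm."""
    w1, b1, w2, b2 = params
    scores = affine(w2, b2, relu(affine(w1, b1, features)))
    return max(range(len(scores)), key=scores.__getitem__)


def average_cost(sample, costs, params):
    """Empirical average cost of the selector on a sample of instances.

    costs[i][k] is the (assumed, precomputed) cost of running algorithm k
    on instance i, for example a branch-and-cut tree size.
    """
    total = 0.0
    for features, cost_row in zip(sample, costs):
        total += cost_row[select_algorithm(features, params)]
    return total / len(sample)


# Hand-set identity weights so the behaviour is easy to follow.
identity = [[1.0, 0.0], [0.0, 1.0]]
params = (identity, [0.0, 0.0], identity, [0.0, 0.0])
sample = [[0.2, 0.9], [3.0, 1.0]]       # two toy instance encodings
costs = [[40.0, 12.0], [7.0, 30.0]]     # toy per-algorithm costs

print(select_algorithm(sample[0], params))   # 1
print(average_cost(sample, costs, params))   # 9.5
```

Learning then amounts to choosing network parameters that make this empirical average small; the sample complexity bounds concern how many instances are needed for the empirical average to be close to the expected cost under the instance distribution.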

Paper Structure (24 sections, 15 theorems, 44 equations, 1 figure, 1 table)

Background and motivation
Applications in branch-and-cut methods for mixed-integer linear optimization
Our approach and results
Comparison with prior work on cut selection using learning techniques
Formal statement of results
Preliminaries
Background from learning theory
Neural networks
Our results
Application to branch-and-cut
Preliminaries
Learnability of parameterized CG cut(s)
Learnability of cutting plane(s) from a finite set
Learnability of cut selection policy
Numerical experiments
...and 9 more sections

Key Result

Theorem 2.3

There exists a universal constant $C$ such that the following holds. Let $\mathcal{H}$ be a hypothesis class defined by some $h : \mathcal{I} \times \mathcal{P} \to \mathbb{R}$ such that the range of $h$ is in $[0,B]$ for some $B>0$. For any distribution $\mathcal{D}$ on $\mathcal{I}$, $\epsilon > 0$, and $\delta \in (0,1)$, we have with probability $1-\delta$ over i.i.d. samples $I_1, \ldots, I_t \in \mathcal{I}$ of size

Figures (1)

Figure 1: Comparison of branch-and-bound tree sizes using different cut selection strategies.

Theorems & Definitions (26)

Definition 2.1: Parameterized function classes
Definition 2.2: Pseudo-dimension
Theorem 2.3
Definition 2.4: Neural networks
Theorem 2.5
Theorem 2.6
Corollary 2.7
Definition 3.1: Integer linear programming (ILP)
Lemma 3.2: Lemma 3.2 in Balcan et al. (2021)
Proposition 3.3
...and 16 more
