Machine Learning for Combinatorial Optimization: a Methodological Tour d'Horizon

Yoshua Bengio, Andrea Lodi, Antoine Prouvost

TL;DR

This survey addresses how machine learning can aid combinatorial optimization by learning decisions within CO algorithms or by producing end-to-end solutions. It distinguishes demonstration (imitation) and experience (reinforcement learning) as two learning paradigms and discusses policy learning and algorithm configuration. It introduces a unifying learning objective framework across multi-instance distributions, surrogate rewards, and generalization concerns. It argues that hybrid approaches—combining data-driven components with exact optimization guarantees—are the most practical path forward for CO.

Abstract

This paper surveys the recent attempts, both from the machine learning and operations research communities, at leveraging machine learning to solve combinatorial optimization problems. Given the hard nature of these problems, state-of-the-art algorithms rely on handcrafted heuristics for making decisions that are otherwise too expensive to compute or mathematically not well defined. Thus, machine learning looks like a natural candidate to make such decisions in a more principled and optimized way. We advocate for pushing further the integration of machine learning and combinatorial optimization and detail a methodology to do so. A main point of the paper is seeing generic optimization problems as data points and inquiring what is the relevant distribution of problems to use for learning on a given task.

Paper Structure

This paper contains 36 sections, 10 equations, 9 figures.

Figures (9)

  • Figure 1: A branch-and-bound tree. The relaxation is computed at every node (only partially shown in the figure). Nodes still open for exploration are represented as blank.
  • Figure 2: The Markov decision process associated with reinforcement learning, modified from Sutton and Barto (2018). The agent's behavior is defined by its policy $\pi$, while the environment's evolution is defined by the dynamics $p$. Note that the reward is not necessary to define the evolution and is provided only as a learning mechanism for the agent. Actions, states, and rewards are random variables in the general framework.
  • Figure 3: A vanilla recurrent neural network, modified from Goodfellow et al. (2016). On the left, the black square indicates a one-step delay. On the right, the same network is shown unfolded. Three sets of parameters, $U$, $V$, and $W$, are represented and re-used at every time step.
  • Figure 4: A vanilla attention mechanism where a query $q$ is computed against a set of values $(v_i)_i$. An affinity function $f$, such as a dot product, is used on query and value pairs. If it includes some parameters, the mechanism can be learned.
  • Figure 5: In the demonstration setting, the policy is trained to reproduce the action of an expert policy by minimizing some discrepancy in the action space.
  • ...and 4 more figures
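
The attention mechanism described in the Figure 4 caption can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the caption only states that an affinity function $f$, such as a dot product, is applied to query and value pairs. The softmax normalization and the weighted sum of values are assumptions taken from the common formulation of attention, and the names `dot` and `attention` are hypothetical.

```python
import math


def dot(a, b):
    """Dot-product affinity f(q, v) between two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def attention(query, values, affinity=dot):
    """Score each value against the query, normalize, and average the values.

    Assumption: scores are normalized with a softmax and the output is the
    weighted sum of the values. The caption itself only specifies the
    affinity step.
    """
    scores = [affinity(query, v) for v in values]

    # Softmax, shifted by the maximum score for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Weighted sum of the values, computed coordinate by coordinate.
    dim = len(values[0])
    output = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]
    return weights, output


if __name__ == "__main__":
    q = [1.0, 0.0]
    vs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    weights, output = attention(q, vs)
    print(weights, output)
```

Because `affinity` is passed in as an argument, a parameterized function could be substituted for `dot`, which is the sense in which the caption says the mechanism can be learned.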