Planning in Branch-and-Bound: Model-Based Reinforcement Learning for Exact Combinatorial Optimization

Paul Strang, Zacharie Alès, Côme Bissuel, Olivier Juan, Safia Kedad-Sidhoum, Emmanuel Rachelson

TL;DR

The paper presents PlanB&B, a model-based reinforcement learning approach that leverages a MuZero-inspired internal model of Branch-and-Bound dynamics to learn improved branching decisions for MILP problems. It uses a graph-based MILP representation, a learned dynamics model to imagine subtree trajectories, and a planning loop (via Gumbel search) to produce improved branching targets without solving LPs during evaluation. Training relies on K-step subtree trajectories with a tree-consistency loss, enabling the policy and value heads to improve through internal planning. Empirically, PlanB&B outperforms prior RL and IL baselines across four MILP benchmarks, with planning over the learned model offering additional gains. The paper also identifies DFS-based node selection as a limitation for scaling to higher-dimensional problems.

Abstract

Mixed-Integer Linear Programming (MILP) lies at the core of many real-world combinatorial optimization (CO) problems, traditionally solved by branch-and-bound (B&B). A key driver of B&B solver efficiency is the variable selection heuristic that guides branching decisions. Looking to move beyond static, hand-crafted heuristics, recent work has explored adapting traditional reinforcement learning (RL) algorithms to the B&B setting, aiming to learn branching strategies tailored to specific MILP distributions. In parallel, RL agents have achieved remarkable success in board games, a very specific type of combinatorial problem, by leveraging environment simulators to plan via Monte Carlo Tree Search (MCTS). Building on these developments, we introduce Plan-and-Branch-and-Bound (PlanB&B), a model-based reinforcement learning (MBRL) agent that leverages a learned internal model of the B&B dynamics to discover improved branching strategies. Computational experiments empirically validate our approach, with our MBRL branching agent outperforming previous state-of-the-art RL methods across four standard MILP benchmarks.

Paper Structure

This paper contains 38 sections, 15 equations, 7 figures, 8 tables.

Figures (7)

  • Figure 1: Aggregate normalized solving time performance obtained on test instances by SCIP [bestuzheva_scip_2021], IL, RL, and random baselines across the Ecole benchmark [prouvost_ecole_2020], in log scale. These baselines are formally introduced in Section \ref{sec:exp_setup}.
  • Figure 2: Solving a MILP by B&B using variable selection policy $\pi$ and node selection policy $\rho$. Each node $v_i$ represents a MILP derived from the original problem, and each edge represents the bound adjustment applied to derive a child node from its parent. At each step, nodes $o_i \in \mathcal{O}$ are re-indexed according to $\rho$.
  • Figure 3: Planning in B&B over a learned model. The combined use of $h$, $f$, and $g$ allows simulating subtree rollouts starting from the current B&B node. Here, $\hat{\mathrm{T}}^3 =(\hat{\mathcal{O}}^3, \hat{\mathcal{C}}^3)$ with $\hat{\mathcal{O}}^3 = \{ \hat{o}^3_l, \hat{o}^3_r, \hat{o}^1_r\}$ and $\hat{\mathcal{C}}^3=\{ \hat{o}^2_r\}$. To simplify notation, we write $\mathrm{z}^i_j$ in place of $\mathrm{z}_{\hat{o}^i_j}$ for $\mathrm{z} \in \{\mathrm{p}, \bar{\mathrm{v}}, \mathrm{b}\}$.
  • Figure 4: Policy improvement associated with increased simulation budget over the MIS benchmark.
  • Figure 5: PlanB&B tree-consistency loss modeled after the SimSiam architecture from [chen_exploring_2020].
  • ...and 2 more figures
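
To make the setting of Figure 2 concrete, here is a minimal sketch of B&B with a pluggable variable selection policy $\pi$ and a depth-first node selection policy $\rho$. It is an illustration only, not the paper's implementation. It assumes a 0/1 knapsack problem, chosen because its LP relaxation can be solved greedily without an LP solver. All names (`relax`, `branch_and_bound`, `most_fractional`, `first_free`) are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

Fixed = Dict[int, int]  # bound adjustments along a path: variable index -> 0 or 1
Policy = Callable[[List[float], List[int]], int]  # (LP solution, free vars) -> var


def relax(values: List[int], weights: List[int], capacity: int,
          fixed: Fixed) -> Optional[Tuple[float, List[float]]]:
    """Greedy LP relaxation of a knapsack node; None if the node is infeasible."""
    cap = capacity - sum(weights[j] for j, v in fixed.items() if v == 1)
    if cap < 0:
        return None
    n = len(values)
    x = [float(fixed.get(j, 0)) for j in range(n)]
    bound = float(sum(values[j] for j, v in fixed.items() if v == 1))
    # Free variables sorted by value density, best first.
    free = sorted((j for j in range(n) if j not in fixed),
                  key=lambda j: values[j] / weights[j], reverse=True)
    for j in free:
        if weights[j] <= cap:
            x[j], bound, cap = 1.0, bound + values[j], cap - weights[j]
        else:
            # First item that does not fit is taken fractionally, then stop.
            x[j] = cap / weights[j]
            bound += x[j] * values[j]
            break
    return bound, x


def most_fractional(x: List[float], free: List[int]) -> int:
    """Assumed policy: branch on the free variable closest to 0.5."""
    return min(free, key=lambda j: abs(x[j] - 0.5))


def first_free(x: List[float], free: List[int]) -> int:
    """Assumed naive policy: branch on the lowest-index free variable."""
    return free[0]


def branch_and_bound(values: List[int], weights: List[int], capacity: int,
                     pi: Policy) -> Tuple[float, List[int], int]:
    n = len(values)
    best_value, best_x = 0.0, [0] * n  # the empty knapsack is always feasible
    open_nodes: List[Fixed] = [{}]     # root node: no variable fixed
    visited = 0
    while open_nodes:
        fixed = open_nodes.pop()       # rho: depth-first (last in, first out)
        visited += 1
        result = relax(values, weights, capacity, fixed)
        if result is None:
            continue                   # pruned: infeasible
        bound, x = result
        if bound <= best_value:
            continue                   # pruned: cannot beat the incumbent
        if all(xj in (0.0, 1.0) for xj in x):
            best_value, best_x = bound, [int(xj) for xj in x]
            continue                   # integral relaxation: new incumbent
        free = [j for j in range(n) if j not in fixed]
        j = pi(x, free)                # pi: choose the branching variable
        open_nodes.append({**fixed, j: 0})
        open_nodes.append({**fixed, j: 1})
    return best_value, best_x, visited


if __name__ == "__main__":
    vals, wts, cap = [10, 13, 7, 8, 4], [3, 4, 2, 3, 1], 7
    for policy in (most_fractional, first_free):
        value, solution, nodes = branch_and_bound(vals, wts, cap, policy)
        print(policy.__name__, value, solution, nodes)
```

Both policies return the same optimal value, since B&B is exact regardless of the branching rule. What changes is the number of visited nodes, which is the quantity a learned branching policy aims to reduce.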