Scaling Combinatorial Optimization Neural Improvement Heuristics with Online Search and Adaptation

Federico Julian Camerota Verdù, Lorenzo Castelli, Luca Bortolussi

TL;DR

Limited Rollout Beam Search (LRBS) is a beam search strategy for deep reinforcement learning (DRL) based combinatorial optimization improvement heuristics. It achieves optimality gaps that outperform existing improvement heuristics and narrows the gap with state-of-the-art constructive methods.

Abstract

We introduce Limited Rollout Beam Search (LRBS), a beam search strategy for deep reinforcement learning (DRL) based combinatorial optimization improvement heuristics. Utilizing pre-trained models on the Euclidean Traveling Salesperson Problem, LRBS significantly enhances both in-distribution performance and generalization to larger problem instances, achieving optimality gaps that outperform existing improvement heuristics and narrowing the gap with state-of-the-art constructive methods. We also extend our analysis to two pickup and delivery TSP variants to validate our results. Finally, we employ our search strategy for offline and online adaptation of the pre-trained improvement policy, leading to improved search performance and surpassing recent adaptive methods for constructive heuristics.

Paper Structure

This paper contains 26 sections, 2 equations, 2 figures, 9 tables, 1 algorithm.

Figures (2)

  • Figure 1: Example of a $2$-opt move with indices $(i=3,\ j=6)$, assuming zero-based numbering, where the order of all nodes between the two indices is reversed.
  • Figure 2: Comparison of BS, SGBS, and LRBS across three algorithmic phases. On the left, we illustrate the "Expansion" step, which is similar in all three algorithms. The $\beta$ paths of the active nodes within the beam are highlighted in blue; among their children, the $\alpha$ yellow nodes are the best ones selected for expansion. Starting from the selected children, SGBS and LRBS apply the DRL policy in the "Rollout" step. Finally, at the "Selection" step, the beam is updated and grown down the search tree. Illustration inspired by Choo et al. (2022).
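
The $2$-opt move described in Figure 1 can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `two_opt_move` is hypothetical, the tour is assumed to be a plain list of node identifiers, and the reversed segment is assumed to include both endpoints $i$ and $j$, which the caption does not state explicitly.

```python
def two_opt_move(tour, i, j):
    """Return a copy of `tour` with the segment tour[i..j] reversed.

    Indices are zero-based. Treating both endpoints as part of the
    reversed segment is an assumption made for this sketch.
    """
    if not 0 <= i < j < len(tour):
        raise ValueError("expected 0 <= i < j < len(tour)")
    # Keep the prefix and suffix, reverse only the middle segment.
    return tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]


# Example with the indices from Figure 1 on an 8-node tour.
tour = list(range(8))
new_tour = two_opt_move(tour, 3, 6)
print(new_tour)  # [0, 1, 2, 6, 5, 4, 3, 7]
```

Applying the same move twice restores the original tour, so an improvement policy that selects index pairs $(i, j)$ can always undo its previous step.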