Memory-Enhanced Neural Solvers for Routing Problems

Felix Chalumeau, Refiloe Shabe, Noah De Nicola, Arnu Pretorius, Thomas D. Barrett, Nathan Grinsztajn

TL;DR

This work tackles the challenge of solving NP-hard routing problems efficiently by enhancing neural solvers with memory-based online adaptation. The proposed MEMENTO framework dynamically updates action logits during search using data gathered across budgeted attempts, and is architecture-agnostic, enabling seamless integration with existing solvers. Empirical results on TSP and CVRP show MEMENTO outperforming policy-gradient fine-tuning and tree-search baselines, with strong data efficiency and robustness in in- and out-of-distribution settings; it also achieves zero-shot state-of-the-art when combined with unseen solvers like Compass and scales to instances of size 500. The approach provides practical benefits for industrial routing applications by improving solution quality within limited compute budgets and by offering interpretable, learnable update rules for adaptation at inference time.

Abstract

Routing Problems are central to many real-world applications, yet remain challenging due to their (NP-)hard nature. Amongst existing approaches, heuristics often offer the best trade-off between quality and scalability, making them suitable for industrial use. While Reinforcement Learning (RL) offers a flexible framework for designing heuristics, its adoption over handcrafted heuristics remains incomplete. Existing learned methods still lack the ability to adapt to specific instances and fully leverage the available computational budget. Current best methods either rely on a collection of pre-trained policies, or on RL fine-tuning; hence failing to fully utilize newly available information within the constraints of the budget. In response, we present MEMENTO, an approach that leverages memory to improve the search of neural solvers at inference. MEMENTO leverages online data collected across repeated attempts to dynamically adjust the action distribution based on the outcome of previous decisions. We validate its effectiveness on the Traveling Salesman and Capacitated Vehicle Routing problems, demonstrating its superiority over tree-search and policy-gradient fine-tuning; and showing that it can be zero-shot combined with diversity-based solvers. We successfully train all RL auto-regressive solvers on large instances, and verify MEMENTO's scalability and data-efficiency: pushing the state-of-the-art on 11 out of 12 evaluated tasks.

Paper Structure

This paper contains 54 sections, 10 figures, 13 tables, 1 algorithm.

Figures (10)

  • Figure 2: MEMENTO uses a memory to adapt neural solvers at inference time. When taking a decision, data from similar states is retrieved and prepared (1,2), then processed by an MLP to derive correction logits for each action (3). Summing the original and correction logits yields the updated action distribution. The resulting policy is then rolled out (4), and transition data is stored in the memory (5,6), including the node visited, action taken, log probability, and return obtained.
  • Figure 3: Building a new action distribution using the memory. Relevant data is retrieved and processed by an MLP to derive logits for each possible action.
  • Figure 4: Akin to REINFORCE (left), MEMENTO (right) encourages actions with high returns, particularly when they have low probability. MEMENTO learns an asymmetric rule: it requires the normalised return to be strictly positive to reinforce an action, but then encourages that action even more.
  • Figure 5: Combining MEMENTO and COMPASS during search on CVRP200, no re-training needed.
  • Figure 6: MEMENTO outperforms EAS on instances of size 500 across batch sizes and sequential attempts. Green areas indicate settings where MEMENTO adapts more efficiently. It consistently outperforms EAS on TSP and in most CVRP settings, with strong gains under low budgets.
  • ...and 5 more figures
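
The update loop described in the Figure 2 and Figure 3 captions can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the names `Transition`, `retrieve`, `placeholder_score`, `correction_logits` and `adapted_distribution` are hypothetical, "similar states" is approximated by matching the current node, the learned MLP is replaced by a hand-written REINFORCE-like score, and summing scores per action is an assumed aggregation rule.

```python
import math
from dataclasses import dataclass
from statistics import fmean, pstdev


@dataclass
class Transition:
    """One memory entry, holding the fields listed in the Figure 2 caption."""
    node: int        # node at which the decision was taken
    action: int      # action that was chosen
    log_prob: float  # log-probability of that action when it was taken
    ret: float       # return obtained by the attempt


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def retrieve(memory, node):
    """Steps 1-2: fetch stored transitions recorded at the same node
    (an assumed stand-in for 'similar states')."""
    return [t for t in memory if t.node == node]


def placeholder_score(norm_ret, log_prob):
    """Hand-written substitute for the learned MLP (assumption): favour
    high-return actions, more so when their probability was low."""
    return norm_ret * (1.0 - math.exp(log_prob))


def correction_logits(entries, num_actions, score_fn=placeholder_score):
    """Step 3: turn retrieved entries into one correction logit per action."""
    corrections = [0.0] * num_actions
    if not entries:
        return corrections
    returns = [t.ret for t in entries]
    mean = fmean(returns)
    std = pstdev(returns) or 1.0  # avoid dividing by zero
    for t in entries:
        corrections[t.action] += score_fn((t.ret - mean) / std, t.log_prob)
    return corrections


def adapted_distribution(base_logits, memory, node):
    """Sum the solver's logits and the memory-derived corrections."""
    corrections = correction_logits(retrieve(memory, node), len(base_logits))
    return softmax([b + c for b, c in zip(base_logits, corrections)])
```

Under these assumptions, calling `adapted_distribution(base_logits, memory, node)` before each decision and appending a `Transition` for every step of the finished attempt (steps 4 to 6) makes later attempts shift probability towards actions that previously led to above-average returns. With an empty memory the corrections are all zero, so the base solver's distribution is returned unchanged.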