Memory-Enhanced Neural Solvers for Routing Problems
Felix Chalumeau, Refiloe Shabe, Noah De Nicola, Arnu Pretorius, Thomas D. Barrett, Nathan Grinsztajn
TL;DR
This work tackles the challenge of solving NP-hard routing problems efficiently by enhancing neural solvers with memory-based online adaptation. The proposed MEMENTO framework dynamically updates action logits during search using data gathered across the attempts allowed by the search budget. It is architecture-agnostic, so it can be integrated with existing solvers. Empirical results on TSP and CVRP show MEMENTO outperforming policy-gradient fine-tuning and tree-search baselines, with strong data efficiency and robustness both in and out of distribution. It also achieves state-of-the-art results when combined zero-shot with unseen solvers such as Compass, and scales to instances of size 500. The approach offers practical benefits for industrial routing applications: it improves solution quality within limited compute budgets and provides interpretable, learnable update rules for adaptation at inference time.
Abstract
Routing problems are central to many real-world applications, yet remain challenging due to their (NP-)hard nature. Amongst existing approaches, heuristics often offer the best trade-off between quality and scalability, making them suitable for industrial use. While Reinforcement Learning (RL) offers a flexible framework for designing heuristics, its adoption over handcrafted heuristics remains incomplete. Existing learned methods still lack the ability to adapt to specific instances and to fully leverage the available computational budget. The current best methods rely either on a collection of pre-trained policies or on RL fine-tuning, and therefore fail to fully utilize newly available information within the constraints of the budget. In response, we present MEMENTO, an approach that leverages memory to improve the search of neural solvers at inference. MEMENTO uses online data collected across repeated attempts to dynamically adjust the action distribution based on the outcome of previous decisions. We validate its effectiveness on the Traveling Salesman and Capacitated Vehicle Routing problems, demonstrating its superiority over tree-search and policy-gradient fine-tuning, and showing that it can be zero-shot combined with diversity-based solvers. We successfully train all RL auto-regressive solvers on large instances, and verify MEMENTO's scalability and data-efficiency, pushing the state-of-the-art on 11 out of 12 evaluated tasks.
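To make the idea of adjusting the action distribution from remembered outcomes concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes a single decision point with three candidate actions, a memory of (action, return) pairs from earlier attempts, and a hand-written update rule (mean advantage per action). The function names, the `baseline` and `step_size` parameters, and the rule itself are hypothetical; the abstract does not specify how the update is computed.

```python
import math


def softmax(logits):
    """Convert logits to probabilities in a numerically stable way."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def memory_adjusted_logits(base_logits, memory, baseline, step_size=1.0):
    """Shift each action's logit using outcomes stored in memory.

    memory: list of (action_index, return) pairs from earlier attempts.
    Assumed rule (hypothetical): add the mean advantage, return - baseline,
    of the remembered attempts that took that action. Actions with no
    remembered attempts keep their base logit.
    """
    sums = [0.0] * len(base_logits)
    counts = [0] * len(base_logits)
    for action, ret in memory:
        sums[action] += ret - baseline
        counts[action] += 1
    return [
        logit + step_size * (sums[a] / counts[a] if counts[a] else 0.0)
        for a, logit in enumerate(base_logits)
    ]


# Base policy is indifferent between the three actions.
base = [0.0, 0.0, 0.0]
# Returns are negative tour lengths, so higher is better.
memory = [(0, -10.0), (1, -8.0), (1, -9.0)]
baseline = -9.0

adjusted = memory_adjusted_logits(base, memory, baseline)
probs = softmax(adjusted)
print(adjusted)
print(probs)
```

In this toy setting, the action that led to better-than-baseline tours gains probability mass, the action that led to a worse tour loses it, and the untried action is left unchanged. A learned update rule would take the place of the fixed mean-advantage formula.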
