MARCO: A Memory-Augmented Reinforcement Framework for Combinatorial Optimization
Andoni I. Garmendia, Quentin Cappart, Josu Ceberio, Alexander Mendiburu
TL;DR
MARCO addresses inefficient exploration in Neural Combinatorial Optimization (NCO) by introducing a memory-augmented reinforcement framework that stores and retrieves past solutions to guide current decisions. It supports both neural improvement and neural constructive methods, and shares a single memory across parallel search threads to enable collaborative exploration. Empirical results on the maximum cut (MC), maximum independent set (MIS), and travelling salesman (TSP) problems show that MARCO improves solution quality and exploration efficiency, often surpassing learning-based baselines while maintaining a low computational cost. The work highlights memory as a powerful mechanism for enhancing NCO and points to memory management and retrieval as directions for future research.
Abstract
Neural Combinatorial Optimization (NCO) is an emerging domain in which deep learning models are employed as standalone solvers for combinatorial optimization problems. Despite their potential, existing NCO methods often explore the search space inefficiently, frequently becoming trapped in local optima or redundantly revisiting previously explored states. This paper introduces a versatile framework, referred to as Memory-Augmented Reinforcement for Combinatorial Optimization (MARCO), that can be used to enhance both constructive and improvement methods in NCO through an innovative memory module. MARCO stores data collected throughout the optimization trajectory and retrieves contextually relevant information at each state. In this way, the search is guided by two competing criteria: making the best decision in terms of solution quality and avoiding revisiting already explored solutions. This approach promotes a more efficient use of the available optimization budget. Moreover, thanks to the parallel nature of NCO models, several search threads can run simultaneously, all sharing the same memory module, which enables efficient collaborative exploration. Empirical evaluations, carried out on the maximum cut, maximum independent set and travelling salesman problems, reveal that the memory module effectively increases exploration, enabling the model to discover diverse, higher-quality solutions. MARCO achieves good performance at a low computational cost, establishing a promising new direction in the field of NCO.
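To make the two competing criteria concrete, the following is a minimal illustrative sketch for maximum cut, not the authors' implementation. It assumes a hand-written scoring rule in place of MARCO's learned policy, Hamming distance as the retrieval similarity, and bit-flip moves as the improvement operator; the names `SharedMemory`, `choose_move`, and `penalty` are hypothetical.

```python
from typing import List, Optional, Tuple

Solution = Tuple[int, ...]   # binary node assignment, e.g. a MaxCut partition
Edge = Tuple[int, int]


def hamming(a: Solution, b: Solution) -> int:
    """Number of positions in which two solutions differ."""
    return sum(x != y for x, y in zip(a, b))


class SharedMemory:
    """Hypothetical memory: stores visited solutions and retrieves the k nearest."""

    def __init__(self) -> None:
        self.visited: List[Solution] = []

    def store(self, solution: Solution) -> None:
        self.visited.append(solution)

    def retrieve(self, query: Solution, k: int = 3) -> List[Solution]:
        # Assumption: relevance is measured by Hamming distance to the query.
        return sorted(self.visited, key=lambda s: hamming(s, query))[:k]


def cut_value(edges: List[Edge], solution: Solution) -> int:
    """Count edges whose endpoints lie on opposite sides of the partition."""
    return sum(1 for u, v in edges if solution[u] != solution[v])


def flip(solution: Solution, i: int) -> Solution:
    """Move node i to the other side of the partition."""
    return solution[:i] + (1 - solution[i],) + solution[i + 1:]


def choose_move(edges: List[Edge], solution: Solution,
                memory: SharedMemory, penalty: float = 1.0) -> Optional[int]:
    """Pick the flip that trades off solution quality against revisiting.

    Assumption: a fixed linear penalty stands in for a learned policy.
    """
    best_i, best_score = None, float("-inf")
    for i in range(len(solution)):
        candidate = flip(solution, i)
        # Count retrieved entries identical to the candidate (already explored).
        revisits = sum(1 for s in memory.retrieve(candidate) if s == candidate)
        score = cut_value(edges, candidate) - penalty * revisits
        if score > best_score:
            best_i, best_score = i, score
    return best_i
```

Because every search thread would call `store` and `retrieve` on the same `SharedMemory` instance, a solution visited by one thread is penalized for all of them, which is the collaborative effect the abstract describes.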
