MARCO: A Memory-Augmented Reinforcement Framework for Combinatorial Optimization

Andoni I. Garmendia, Quentin Cappart, Josu Ceberio, Alexander Mendiburu

TL;DR

MARCO addresses inefficient exploration in Neural Combinatorial Optimization (NCO) by introducing a memory-augmented reinforcement framework that stores and retrieves past solutions to guide current decisions. It supports both neural improvement and neural constructive methods, and shares a single memory across parallel search threads to enable collaborative exploration. Empirical results on the Maximum Cut (MC), Maximum Independent Set (MIS), and Travelling Salesman (TSP) problems show that MARCO improves solution quality and exploration efficiency, often surpassing learning-based baselines while maintaining low computational cost. The work highlights memory as a powerful mechanism for enhancing NCO, and identifies memory management and retrieval as directions for future research.

Abstract

Neural Combinatorial Optimization (NCO) is an emerging domain where deep learning techniques are employed as standalone solvers for combinatorial optimization problems. Despite their potential, existing NCO methods often suffer from inefficient search space exploration, frequently leading to local optima entrapment or redundant exploration of previously visited states. This paper introduces a versatile framework, referred to as Memory-Augmented Reinforcement for Combinatorial Optimization (MARCO), that can be used to enhance both constructive and improvement methods in NCO through an innovative memory module. MARCO stores data collected throughout the optimization trajectory and retrieves contextually relevant information at each state. This way, the search is guided by two competing criteria: making the best decision in terms of the quality of the solution and avoiding revisiting already explored solutions. This approach promotes a more efficient use of the available optimization budget. Moreover, thanks to the parallel nature of NCO models, several search threads can run simultaneously, all sharing the same memory module, enabling efficient collaborative exploration. Empirical evaluations, carried out on the maximum cut, maximum independent set and travelling salesman problems, reveal that the memory module effectively increases exploration, enabling the model to discover diverse, higher-quality solutions. MARCO achieves good performance at a low computational cost, establishing a promising new direction in the field of NCO.

Paper Structure (38 sections, 4 equations, 7 figures, 4 tables, 2 algorithms)

This paper contains 38 sections, 4 equations, 7 figures, 4 tables, 2 algorithms.

Figures (7)

  • Figure 1: Schematic of the MARCO framework, illustrating its memory integration process. Each visited solution, $\sigma_t$, is stored in the memory module. During each iteration of the search process, MARCO performs a similarity-based retrieval to access relevant context from past visited solutions, retrieving the $k$ nearest solutions. Then, the retrieved solutions are aggregated using a weighted average, with the similarity being the weight. The retrieval process can be seen as an attention mechanism where the current state serves as the query (Q) and stored past solutions function as keys (K). NC methods use the same solutions as values (V), while NI methods use the corresponding actions.
  • Figure 2: MARCO for Neural Improvement methods, illustrated on the Maximum Cut problem. Initially, multiple solutions are randomly generated for a given problem instance. Each solution is iteratively improved, forming a thread. Throughout this process, the visited solutions and corresponding actions are stored in a shared memory. This collective memory then updates the graph features fed to the model.
  • Figure 3: MARCO for Neural Constructive methods, illustrated on the Travelling Salesman problem. Each solution in a batch begins with a distinct initial node. Subsequently, every thread iteratively constructs a solution, considering data gathered from the memory module. Once construction is complete, the obtained solution is stored in the memory, serving as a reference for subsequent solution constructions.
  • Figure 4: Ablation Study
  • Figure 5: Performance of MARCO in MC with different $k$ nearest neighbors retrieved from memory for graphs with 200 and 500 nodes.
  • ...and 2 more figures
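
The retrieval step described in the Figure 1 caption (store each visited solution, fetch the $k$ most similar ones, and aggregate them with a similarity-weighted average) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `Memory` class, its method names, the 0/1 solution encoding, and the choice of "fraction of matching entries" as the similarity measure are all assumptions made for the example.

```python
from typing import List, Sequence


def similarity(a: Sequence[int], b: Sequence[int]) -> float:
    """Fraction of positions where two 0/1 solution vectors agree.

    Assumed similarity measure for this sketch; it always lies in [0, 1].
    """
    return sum(1 for x, y in zip(a, b) if x == y) / len(a)


class Memory:
    """Hypothetical shared memory of (key, value) pairs.

    Keys are visited solutions. Values are the same solutions (constructive
    setting) or the actions taken from them (improvement setting).
    """

    def __init__(self, value_dim: int) -> None:
        self.value_dim = value_dim
        self.keys: List[List[int]] = []
        self.values: List[List[float]] = []

    def store(self, key: Sequence[int], value: Sequence[float]) -> None:
        self.keys.append(list(key))
        self.values.append(list(value))

    def retrieve(self, query: Sequence[int], k: int) -> List[float]:
        """Similarity-weighted average of the values of the k nearest keys."""
        scored = sorted(
            ((similarity(query, key), value)
             for key, value in zip(self.keys, self.values)),
            key=lambda pair: pair[0],
            reverse=True,
        )[:k]
        total = sum(weight for weight, _ in scored)
        if total == 0.0:
            # Empty memory or no overlap at all: return a neutral context.
            return [0.0] * self.value_dim
        context = [0.0] * self.value_dim
        for weight, value in scored:
            for i, x in enumerate(value):
                context[i] += weight * x
        return [c / total for c in context]


# Usage: three stored solutions, query with the current one, keep k = 2.
memory = Memory(value_dim=4)
for solution in ([1, 1, 0, 0], [1, 1, 1, 0], [0, 0, 1, 1]):
    memory.store(solution, solution)  # constructive-style: value == key
context = memory.retrieve([1, 1, 0, 0], k=2)
```

In this toy run the two nearest stored solutions have similarities 1.0 and 0.75, so the third entry of `context` is 0.75 / 1.75, reflecting how often that position was set among similar past solutions. In the framework described above, such a context vector would be supplied to the model as additional input features; how exactly it is injected is not specified in this excerpt.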