Diversity Optimization for Travelling Salesman Problem via Deep Reinforcement Learning

Qi Li; Zhiguang Cao; Yining Ma; Yaoxin Wu; Yue-Jiao Gong

TL;DR

The paper tackles Multi-Solution TSP (MSTSP) by proposing RF-MA3S, a neural solver whose encoder uses a Relativization Filter (RF) to achieve affine invariance and whose five parallel decoders (MA3S) produce diverse, high-quality solutions. Training uses reinforcement learning with a best-decoder baseline and a temperature softmax. It is followed by an Adaptive Active Search phase that switches baselines adaptively to balance optimality and diversity, as quantified by the Multi-Solution Quality Index (MSQI), which is built from $Opt$ and $Diff$. Empirical results on MSTSPLIB, TSPLIB, and CVRPLIB show that RF-MA3S outperforms neural baselines in MSQI and/or DI and remains competitive with traditional heuristics while running $1.3\times$ to $15\times$ faster; the method also extends to CVRP and is shown to resist affine transformations. Limitations include imperfect mirror handling and room to improve the search strategy (e.g., with SGBS or RRC); future work targets broader invariances and further efficiency gains from advanced training and inference techniques.
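
The temperature softmax mentioned above can be illustrated generically. The following is a minimal sketch of the standard technique, not the authors' implementation; the function name `temperature_softmax` and its default temperature are assumptions made for illustration.

```python
import math


def temperature_softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities, flattened or sharpened by a temperature.

    Hypothetical helper for illustration only; not taken from the paper's code.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    scores = [2.0, 1.0, 0.1]  # made-up decoder scores for three candidate nodes
    print(temperature_softmax(scores, temperature=0.5))  # sharper distribution
    print(temperature_softmax(scores, temperature=2.0))  # flatter distribution
```

A higher temperature spreads probability mass across more candidate nodes, which encourages exploration; a lower one concentrates it on the top-scoring node.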

Abstract

Existing neural methods for the Travelling Salesman Problem (TSP) mostly aim at finding a single optimal solution. To discover diverse yet high-quality solutions for Multi-Solution TSP (MSTSP), we propose a novel neural solver based on deep reinforcement learning, built around an encoder-decoder policy. On the one hand, a Relativization Filter (RF) is designed to make the encoder more robust to affine transformations of the instances, which can potentially improve the quality of the solutions found. On the other hand, a Multi-Attentive Adaptive Active Search (MA3S) is tailored to let the decoders strike a balance between optimality and diversity. Experimental evaluations on benchmark instances demonstrate the superiority of our method over recent neural baselines across different metrics, and its competitive performance against state-of-the-art traditional heuristics with significantly reduced computational time, ranging from $1.3\times$ to $15\times$ faster. Furthermore, we demonstrate that our method can also be applied to the Capacitated Vehicle Routing Problem (CVRP).
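
To make the idea of transformation robustness concrete, the sketch below maps an instance to a canonical pose so that translated, rotated, and uniformly scaled copies yield the same coordinates. It is an illustrative assumption, not the paper's Relativization Filter: it covers only similarity transformations rather than general affine maps, does not handle mirroring, and the function name `canonicalize` is hypothetical.

```python
import math


def canonicalize(points):
    """Map 2D points to a pose invariant to translation, rotation, and uniform scale.

    Illustrative stand-in only; NOT the paper's Relativization Filter.
    Assumes the point farthest from the centroid is unique.
    """
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    # Remove translation by centering on the centroid.
    centered = [(x - cx, y - cy) for x, y in points]
    # Use the farthest point from the centroid as the reference direction.
    fx, fy = max(centered, key=lambda p: p[0] ** 2 + p[1] ** 2)
    r = math.hypot(fx, fy)
    if r == 0:
        return centered  # all points coincide; nothing to normalize
    c, s = fx / r, fy / r
    # Rotate the reference onto the positive x-axis and scale it to unit length.
    return [((x * c + y * s) / r, (-x * s + y * c) / r) for x, y in centered]


def transform(points, angle, scale, dx, dy):
    """Apply a rotation, uniform scaling, and translation (helper for the demo)."""
    ca, sa = math.cos(angle), math.sin(angle)
    return [(scale * (x * ca - y * sa) + dx, scale * (x * sa + y * ca) + dy)
            for x, y in points]


if __name__ == "__main__":
    cities = [(0.0, 0.0), (1.0, 0.0), (0.2, 0.7), (0.9, 0.4)]  # made-up instance
    moved = transform(cities, angle=0.7, scale=3.0, dx=5.0, dy=-2.0)
    for a, b in zip(canonicalize(cities), canonicalize(moved)):
        print(a, b)  # the two columns agree up to floating-point error
```

Because tour lengths under such transformations only change by a constant factor, the optimal tours of the original and transformed instances coincide, which is why feeding the encoder a canonical pose is a plausible way to stabilize its output.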

Paper Structure (22 sections, 16 equations, 8 figures, 12 tables, 1 algorithm)

Introduction
Related work
MSTSP notations and measures
MSTSP Definition
Performance Measures
Methodology
Relativization Filter (RF) Assisted Encoder
Multi-attentive Decoders
Training Phase
Adaptive Active Search Phase
Experiments
Experiment Settings
Hyperparameter
Performance Comparisons
In-depth Analysis of RF-MA3S
...and 7 more sections

Figures (8)

Figure 1: Illustration of a TSP-10 instance with multiple optima of equal length (retrieved from huang2018seeking).
Figure 2: The architecture of our neural heuristic, RF-MA3S. It is mainly featured by a relativization filter (RF) assisted encoder, multi-attentive decoders, and the adaptive active search during inference.
Figure 3: The multiple affine transformations of an instance become consistent after RF processing.
Figure 4: Investigating adaptive design in MA3S.
Figure 5: Ablation studies. Note: each experiment is incrementally adding components based on the previous one.
...and 3 more figures
