Reinforcement Learning-based Non-Autoregressive Solver for Traveling Salesman Problems

Yubin Xiao, Di Wang, Boyang Li, Huanhuan Chen, Wei Pang, Xuan Wu, Hao Li, Dong Xu, Yanchun Liang, You Zhou

TL;DR

NAR4TSP is the first TSP solver to successfully combine reinforcement learning (RL) with non-autoregressive (NAR) networks. It outperforms five state-of-the-art models in solution quality, inference speed, and generalization to unseen scenarios.

Abstract

The Traveling Salesman Problem (TSP) is a well-known combinatorial optimization problem with broad real-world applications. Recently, neural networks have gained popularity in this research area because, as shown in the literature, they provide strong heuristic solutions to TSPs. Compared to autoregressive neural approaches, non-autoregressive (NAR) networks exploit inference parallelism to increase inference speed but suffer from comparatively low solution quality. In this paper, we propose a novel NAR model named NAR4TSP, which incorporates a specially designed architecture and an enhanced reinforcement learning (RL) strategy. To the best of our knowledge, NAR4TSP is the first TSP solver that successfully combines RL and NAR networks. The key lies in incorporating the decoding of the NAR network's output into the training process. NAR4TSP efficiently represents TSP encoded information as rewards and integrates it into the RL strategy, while maintaining consistent TSP sequence constraints during both training and testing. Experimental results on both synthetic and real-world TSPs demonstrate that NAR4TSP outperforms five state-of-the-art models in terms of solution quality, inference speed, and generalization to unseen scenarios.

Paper Structure (28 sections, 37 equations, 5 figures, 9 tables, 1 algorithm)

Figures (5)

  • Figure 1: The pipeline of NAR4TSP. Taking as inputs a TSP instance $o$ with node coordinates $x^v_i$ and Euclidean distances between nodes $x^{e}_{i,j}$, the model first passes this information through a linear layer, transforming it into node features $v_i$ and edge features $e_{i,j}$. These features then interact with a randomly initialized, learnable starting symbol $v_h$ via $N_g$ GNN modules, and the model finally outputs a starting-node pointer $\bm{\beta}$ and a matrix $A$ of edge scores. The outputs $\bm{\beta}$ and $A$ are subsequently decoded into a feasible TSP tour through sampling or greedy search. The solution is obtained in a one-shot, NAR manner, and its quality is treated as a reward and optimized by an enhanced RL strategy.
  • Figure 2: An illustration of the decoding process of NAR4TSP using greedy search, assuming there are four nodes in the graph.
  • Figure 3: Comparison of inference time between our model and SOTA models, with paired-samples t-tests.
  • Figure 4: Visualization of a TSP5 instance's greedy decoding process.
  • Figure 5: Visualization of solutions produced by NAR4TSP using greedy search and Concorde, respectively.
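
The greedy decoding step described in the captions of Figures 1 and 2 can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes that feasibility is enforced by masking already-visited nodes, and that the reward is the negative tour length. The function names `greedy_decode` and `tour_length`, and the toy values of `beta` and `A`, are hypothetical.

```python
import math

def greedy_decode(beta, A):
    """Decode a tour from a starting-node pointer `beta` and edge scores `A`.

    Assumption: visited nodes are masked so the result is a valid permutation.
    """
    n = len(beta)
    # Start at the node with the highest pointer score.
    current = max(range(n), key=lambda i: beta[i])
    tour = [current]
    visited = {current}
    while len(tour) < n:
        # Among unvisited nodes, follow the highest-scoring outgoing edge.
        candidates = [j for j in range(n) if j not in visited]
        current = max(candidates, key=lambda j: A[current][j])
        tour.append(current)
        visited.add(current)
    return tour

def tour_length(coords, tour):
    """Euclidean length of the closed tour (returns to the starting node)."""
    n = len(tour)
    return sum(math.dist(coords[tour[k]], coords[tour[(k + 1) % n]])
               for k in range(n))

# Toy instance with four nodes on the unit square (illustrative values only).
coords = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
beta = [0.1, 0.6, 0.2, 0.1]
A = [
    [0.0, 0.5, 0.1, 0.4],
    [0.3, 0.0, 0.6, 0.1],
    [0.2, 0.3, 0.0, 0.5],
    [0.7, 0.1, 0.2, 0.0],
]

tour = greedy_decode(beta, A)
reward = -tour_length(coords, tour)  # assumed reward: negative tour length
print(tour, reward)
```

Because the same decoding routine can be run during training and at test time, the sequence constraint (each node visited exactly once) is applied identically in both phases, which is the consistency the abstract refers to. Replacing the two `max` calls with sampling proportional to the scores would give the sampling variant mentioned in Figure 1.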