
Test-Time Augmentation for Traveling Salesperson Problem

Ryo Ishiyama, Takahiro Shirakawa, Seiichi Uchida, Shinnosuke Matsuo

TL;DR

This paper introduces Test-Time Augmentation (TTA) to a Transformer-based solver for the Traveling Salesperson Problem (TSP). Each instance is represented as an $N\times N$ distance matrix, and $M$ variant inputs are generated by randomly permuting the city order; a sequence-to-sequence model solves each variant, and the shortest resulting tour is selected. The approach yields shorter tours than recent ML solvers on TSP50 and TSP100, with the optimality gap decreasing monotonically as $M$ grows, and it also shows strong instance-level improvements. Ablation studies confirm that both the distance-matrix representation and TTA are necessary. The paper notes limitations such as the fixed city count $N$ and points to future work on trainable/TTA strategies and variable-$N$ handling.

Abstract

We propose Test-Time Augmentation (TTA) as an effective technique for addressing combinatorial optimization problems, including the Traveling Salesperson Problem. In general, deep learning models possessing the property of invariance, where the output is uniquely determined regardless of the node indices, have been proposed to learn graph structures efficiently. In contrast, we interpret the permutation of node indices, which exchanges the elements of the distance matrix, as a TTA scheme. The results demonstrate that our method is capable of obtaining shorter solutions than the latest models. Furthermore, we show that the probability of finding a solution closer to an exact solution increases depending on the augmentation size.
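
The permutation-based TTA loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: `greedy_solver` is a hypothetical stand-in for the trained Transformer (a nearest-neighbour heuristic that always starts from index 0, so its output depends on the city order), and the names `permute_matrix`, `solve_with_tta`, and the `seed` parameter are invented for this example.

```python
import math
import random


def distance_matrix(coords):
    """Build the N x N Euclidean distance matrix for a list of (x, y) cities."""
    return [[math.dist(a, b) for b in coords] for a in coords]


def permute_matrix(dist, perm):
    """Reorder rows and columns so that position i holds original city perm[i]."""
    return [[dist[pi][pj] for pj in perm] for pi in perm]


def tour_length(dist, tour):
    """Length of the closed tour that visits cities in the given order."""
    n = len(tour)
    return sum(dist[tour[k]][tour[(k + 1) % n]] for k in range(n))


def greedy_solver(dist):
    """Hypothetical stand-in for the learned solver (assumption, not the paper's model).

    Nearest-neighbour construction starting at index 0, so the result
    changes when the city indices are permuted.
    """
    tour = [0]
    unvisited = set(range(1, len(dist)))
    while unvisited:
        last = tour[-1]
        nxt = min(unvisited, key=lambda j: dist[last][j])
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour


def solve_with_tta(dist, solver, m, seed=0):
    """Solve m randomly permuted copies of the instance and keep the shortest tour."""
    rng = random.Random(seed)
    n = len(dist)
    best_tour, best_len = None, math.inf
    for _ in range(m):
        perm = list(range(n))
        rng.shuffle(perm)
        tour_in_perm = solver(permute_matrix(dist, perm))
        # Map positions in the permuted instance back to original city indices.
        tour = [perm[i] for i in tour_in_perm]
        length = tour_length(dist, tour)
        if length < best_len:
            best_tour, best_len = tour, length
    return best_tour, best_len
```

Because the sketch keeps the minimum over the sampled variants, the returned tour length cannot get worse when `m` is increased with the same seed. How much it improves depends entirely on the solver, which here is only a placeholder.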

Paper Structure (20 sections, 2 equations, 5 figures, 1 table)

Figures (5)

  • Figure 1: An instance and its solution of the (Euclidean) traveling salesperson problem (TSP). Note that its distance matrix depends on the order of cities.
  • Figure 2: Transformer encoder and decoder model. (a) The standard sequence-to-sequence model with positional encoding. (b) A set-to-sequence model for solving TSP [kool2018attention, bresson2021transformer], where positional encoding is removed from (a). (c) TTA with the $M$ sequence-to-sequence model of (a). (d) The proposed model, where $M$ distance matrices are used for TTA.
  • Figure 3: Solution examples in TSP50 and TSP100. The number indicates the tour length. Our model assumes $M=3$.
  • Figure 4: Left: Effect of the augmentation size $M$ on the performance of our model on TSP50. The performance of the latest models [bresson2021transformer, jung2023lightweight] under different beam widths $B$ is also plotted. Right: Performance variations across all instances in TSP50 or TSP100. Shaded regions indicate the standard deviation intervals. Note that the vertical axis of the left plot is logarithmic, whereas that of the right plot is not.
  • Figure 5: Instance-level comparison with Bresson et al. [bresson2021transformer]. Here, the tour length is used as the performance metric. In the upper plots, the vertical axes correspond to the performance of our model without TTA, whereas the horizontal axes correspond to Bresson et al. In the lower plots, the vertical axes show the performance of our model (with TTA). By comparing the upper and lower plots, we can also observe the effect of TTA on our model.
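
The note in the Figure 1 caption, that the distance matrix depends on the order of cities, can be checked on a toy instance. The following is a small sketch with made-up coordinates (a 3 by 4 rectangle) and an arbitrary permutation, both chosen only for illustration: reordering the cities changes the matrix entries, while the length of the same geometric tour stays the same.

```python
import math


def distance_matrix(coords):
    """Build the N x N Euclidean distance matrix for a list of (x, y) cities."""
    return [[math.dist(a, b) for b in coords] for a in coords]


def tour_length(dist, tour):
    """Length of the closed tour that visits cities in the given order."""
    n = len(tour)
    return sum(dist[tour[k]][tour[(k + 1) % n]] for k in range(n))


# Illustrative instance (assumption): four corners of a 3 x 4 rectangle.
coords = [(0.0, 0.0), (3.0, 0.0), (3.0, 4.0), (0.0, 4.0)]

# Arbitrary relabelling: new position i holds original city perm[i].
perm = [2, 0, 3, 1]
reordered = [coords[p] for p in perm]

d_original = distance_matrix(coords)
d_reordered = distance_matrix(reordered)

# The same geometric tour, expressed in each labelling.
tour_original = [0, 1, 2, 3]
position_of = {city: pos for pos, city in enumerate(perm)}
tour_reordered = [position_of[c] for c in tour_original]

print(d_original[0])   # first row under the original order
print(d_reordered[0])  # first row under the permuted order
print(tour_length(d_original, tour_original),
      tour_length(d_reordered, tour_reordered))
```

A model that reads the matrix row by row therefore sees a different input for each ordering, even though the underlying instance and its optimal tour are unchanged.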