A Lightweight CNN-Transformer Model for Learning Traveling Salesman Problems

Minseop Jung; Jaeseung Lee; Jibum Kim

TL;DR

This work proposes the first CNN-Transformer model for TSP, built on a CNN embedding layer and partial self-attention. It exhibits the best performance on real-world datasets and outperforms other existing state-of-the-art Transformer-based models in various aspects.

Abstract

Several studies have attempted to solve traveling salesman problems (TSPs) using deep learning techniques. Among them, Transformer-based models show state-of-the-art (SOTA) performance even for large-scale TSPs. However, they rely on fully-connected attention and therefore suffer from high computational complexity and GPU memory usage. Our work is the first CNN-Transformer model for TSP that combines a CNN embedding layer with partial self-attention. Compared with standard Transformer-based models, the CNN embedding layer allows our model to better learn spatial features from the input data, and the proposed partial self-attention removes considerable redundancy in fully-connected attention. Experimental results show that the CNN embedding layer and partial self-attention are very effective in improving performance and reducing computational complexity. The proposed model exhibits the best performance on real-world datasets and outperforms other existing SOTA Transformer-based models in various aspects. Our code is publicly available at https://github.com/cm8908/CNN_Transformer3.
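The redundancy argument in the abstract can be illustrated with a minimal sketch. It assumes that partial self-attention means a decoding query attends only to the m most recent reference vectors rather than to every previously decoded step. That reading, the single-head formulation without learned projections, and the names `attend` and `partial_attend` are illustrative assumptions, not the authors' implementation (see the linked repository for that).

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D score vector.
    x = x - np.max(x)
    e = np.exp(x)
    return e / e.sum()

def attend(query, keys, values):
    """Single-head scaled dot-product attention for one query vector."""
    d = query.shape[-1]
    scores = keys @ query / np.sqrt(d)   # one score per key
    weights = softmax(scores)
    return weights @ values, weights

def partial_attend(query, history, m):
    """Attend only to the m most recent rows of `history`.

    Assumption: this is one plausible reading of partial self-attention;
    the paper's exact formulation may differ.
    """
    assert m >= 1, "need at least one reference vector"
    refs = history[-m:]                  # the last m decoded steps
    return attend(query, refs, refs)

rng = np.random.default_rng(0)
d = 8
history = rng.normal(size=(10, d))       # 10 decoded steps (t = 10)
query = history[-1]

full_out, full_w = attend(query, history, history)       # scores all 10 steps
part_out, part_w = partial_attend(query, history, m=3)   # scores only 3 steps
print(full_w.shape, part_w.shape)
```

Under this assumption, each decoding step scores m vectors instead of t, so the per-step attention cost stops growing with the length of the partial tour.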

Paper Structure (19 sections, 4 equations, 4 figures, 6 tables)

Introduction
Proposed Model
Encoder
CNN embedding layer
Encoder layer
Decoder
Model training based on reinforcement learning
Experiment
Datasets
Hyperparameters and decoding strategy
List of Experiments
Metrics
Results
Experiment 1
Experiment 2
...and 4 more sections

Figures (4)

Figure 1: Overview of the proposed CNN-Transformer model for TSP
Figure 2: Proposed encoder architecture with the CNN embedding layer and $L$ identical encoder layers
Figure 3: Proposed decoder architecture. This figure shows a decoding process when the current time step $t=10$ and the number of reference vectors $m=3$
Figure 4: Output tours of Concorde [concorde], TSP Transformer [bresson], Tspformer [memory-eff], H-TSP [h-tsp], and our model for (a) kroC100 and (b) berlin52 using the TSPLIB dataset
