
iMTSP: Solving Min-Max Multiple Traveling Salesman Problem with Imperative Learning

Yifan Guo, Zhongqiang Ren, Chen Wang

TL;DR

This paper reformulates the Min-Max Multiple Traveling Salesman Problem (MTSP) as a bilevel optimization problem using the concept of imperative learning (IL), and introduces a control variate-based gradient estimation algorithm to address the high variance of gradient estimates during optimization.

Abstract

This paper considers a Min-Max Multiple Traveling Salesman Problem (MTSP), where the goal is to find a set of tours, one for each agent, that collectively visit all the cities while minimizing the length of the longest tour. Though MTSP has been widely studied, obtaining near-optimal solutions for large-scale problems is still challenging due to its NP-hardness. Recent data-driven methods face two challenges: the need for hard-to-obtain supervision and high variance in gradient estimates, which lead to slow convergence and highly suboptimal solutions. We address these issues by reformulating MTSP as a bilevel optimization problem, using the concept of imperative learning (IL). This involves introducing an allocation network that decomposes the MTSP into multiple single-agent traveling salesman problems (TSPs). The longest tour from these TSP solutions is then used to self-supervise the allocation network, resulting in a new self-supervised, bilevel, end-to-end learning framework, which we refer to as imperative MTSP (iMTSP). Additionally, to tackle the high-variance gradient issue during optimization, we introduce a control variate-based gradient estimation algorithm. Our experiments show that these designs enable our gradient estimator to converge 20% faster than the advanced reinforcement learning baseline and to find tours up to 80% shorter than those of the Google OR-Tools MTSP solver, especially in large-scale problems (e.g., 1,000 cities and 15 agents).
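The decomposition described in the abstract (assign cities to agents, solve one single-agent TSP per agent, and score the assignment by its longest tour) can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the allocation is given as a fixed list instead of being produced by the allocation network, every agent starts and ends at a shared depot, and a greedy nearest-neighbour heuristic (the hypothetical `greedy_tsp`) stands in for the lower-level TSP solver. The helper names `tour_length` and `min_max_cost` are also illustrative.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def tour_length(tour: List[Point]) -> float:
    """Length of a closed tour that returns from the last point to the first."""
    if len(tour) < 2:
        return 0.0
    return sum(math.dist(tour[i], tour[(i + 1) % len(tour)]) for i in range(len(tour)))


def greedy_tsp(depot: Point, cities: List[Point]) -> List[Point]:
    """Hypothetical lower-level solver: nearest-neighbour tour starting at the depot.
    A stand-in only; any single-agent TSP solver could be plugged in here."""
    tour, remaining = [depot], list(cities)
    while remaining:
        last = tour[-1]
        nxt = min(remaining, key=lambda c: math.dist(last, c))
        tour.append(nxt)
        remaining.remove(nxt)
    return tour


def min_max_cost(depot: Point, cities: List[Point], allocation: List[int], n_agents: int):
    """Upper-level objective: split cities by allocation[k] (agent of cities[k]),
    solve one TSP per agent, and return the longest tour plus all per-agent lengths."""
    lengths = []
    for agent in range(n_agents):
        assigned = [c for c, a in zip(cities, allocation) if a == agent]
        lengths.append(tour_length(greedy_tsp(depot, assigned)))
    return max(lengths), lengths


depot = (0.0, 0.0)
cities = [(1.0, 0.0), (2.0, 0.0), (0.0, 1.0), (0.0, 2.0)]
good, per_agent = min_max_cost(depot, cities, [0, 0, 1, 1], n_agents=2)
worse, _ = min_max_cost(depot, cities, [0, 1, 0, 1], n_agents=2)
print(good, per_agent, worse)
```

Grouping the cities along each axis gives two tours of length 4, while interleaving them gives a longest tour of $4 + 2\sqrt{2}$. Because the allocation is discrete, this min-max cost cannot be differentiated directly with respect to it, which is why a gradient estimator is needed to train the allocation network.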

Paper Structure

This paper contains 24 sections, 1 theorem, 16 equations, 3 figures, 3 tables, and 1 algorithm.

Key Result

Lemma 1

Let $h$ denote a random variable (RV) with an unknown expected value $\mathbb{E}(h)$, and let $c$ denote an RV with a known expected value $\mathbb{E}(c)$. The new RV $h_{new}=h+\zeta(c-\mathbb{E}(c))$ has the same expected value as $h$ but a smaller variance when the constant $\zeta$ is properly chosen and $c$ is correlated with $h$.
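Lemma 1 is the classic control-variate argument. Since $\mathbb{E}[c-\mathbb{E}(c)]=0$, $h_{new}$ is unbiased, and $\mathrm{Var}(h_{new}) = \mathrm{Var}(h) + \zeta^2\,\mathrm{Var}(c) + 2\zeta\,\mathrm{Cov}(h,c)$, which is minimized at $\zeta^* = -\mathrm{Cov}(h,c)/\mathrm{Var}(c)$, giving $\mathrm{Var}(h_{new}) = \mathrm{Var}(h)\,(1-\rho_{hc}^2)$, where $\rho_{hc}$ is the correlation between $h$ and $c$. The sketch below is a generic numerical illustration of the lemma, not the paper's estimator: it estimates $\mathbb{E}[e^U]$ for $U\sim\mathrm{Uniform}(0,1)$, uses $c=U$ (known mean $0.5$) as the control variate, and estimates $\zeta$ from the samples. The function name `control_variate_demo` and all constants are illustrative.

```python
import math
import random
import statistics


def control_variate_demo(n: int = 10_000, seed: int = 0):
    """Estimate E[h] for h = exp(U), U ~ Uniform(0, 1), with and without a control variate.

    The control variate is c = U, whose expectation E(c) = 0.5 is known exactly.
    Returns (mean_h, var_h, mean_new, var_new, zeta).
    """
    rng = random.Random(seed)
    c = [rng.random() for _ in range(n)]
    h = [math.exp(u) for u in c]
    e_c = 0.5

    # Variance-minimising coefficient zeta* = -Cov(h, c) / Var(c), estimated from the samples.
    mean_h, mean_c = statistics.fmean(h), statistics.fmean(c)
    cov_hc = sum((hi - mean_h) * (ci - mean_c) for hi, ci in zip(h, c)) / (n - 1)
    zeta = -cov_hc / statistics.variance(c)

    # h_new = h + zeta * (c - E(c)); for a fixed zeta the added term has zero mean.
    h_new = [hi + zeta * (ci - e_c) for hi, ci in zip(h, c)]
    return mean_h, statistics.variance(h), statistics.fmean(h_new), statistics.variance(h_new), zeta


mean_h, var_h, mean_new, var_new, zeta = control_variate_demo()
print(f"plain estimator: mean={mean_h:.4f}, variance={var_h:.4f}")
print(f"control variate: mean={mean_new:.4f}, variance={var_new:.5f}, zeta={zeta:.3f}")
print(f"exact E[exp(U)]: {math.e - 1:.4f}")
```

Because $e^U$ and $U$ are strongly correlated, the per-sample variance drops by more than an order of magnitude while the mean stays at $e-1$. Estimating $\zeta$ from the same samples introduces a small bias, which is usually negligible at this sample size.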

Figures (3)

  • Figure 1: The framework of our self-supervised MTSP network. The allocation network is supervised by the TSP solver, and the surrogate network is supervised by the single-sample variance estimator, which reduces the gradient variance.
  • Figure 2: Performance of the baselines and the proposed model on two example MTSPs with $5$ agents and $500$ cities; the first instance has a central depot and the second an off-center depot. The numbers denote route lengths, and different colors represent different agents. The longest tour length in each panel is underlined. In both instances, iMTSP produces the best solution and shows fewer sub-optimal patterns, such as circular partial routes or long straight partial routes.
  • Figure 3: Gradient variance history of our method and the RL baseline during training with 50 and 100 cities. The $y$-axis is the sum of the natural logarithms of the gradient variances of all trainable parameters in the allocation network, and the $x$-axis is the number of iterations. Our gradient estimator converges about $20\times$ faster than the RL baseline.
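The role of the control variate in Figure 1 and the variance metric in Figure 3 can be illustrated with a toy score-function (REINFORCE-style) gradient estimator. This is a hedged sketch under simplifying assumptions: a softmax distribution over four hypothetical candidate allocations with illustrative costs replaces the allocation network, and a constant baseline equal to the exact expected cost serves as the control variate. iMTSP instead trains a surrogate network for this role, so the code only demonstrates the variance-reduction principle and a Figure 3 style metric (sum over parameters of the log gradient variance). The names `score_function_grads` and `sum_log_variance` are hypothetical.

```python
import math
import random
import statistics
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def score_function_grads(logits: List[float], costs: List[float], n_samples: int,
                         baseline: float, seed: int = 0) -> List[List[float]]:
    """Single-sample estimates of d E_{a~p}[cost(a)] / d logits.

    Each estimate is (cost(a) - baseline) * grad log p(a); for a softmax,
    d log p(a) / d logit_k = 1[k == a] - p_k. A constant baseline keeps the estimate
    unbiased because E[grad log p(a)] = 0, so it acts as a simple control variate.
    """
    rng = random.Random(seed)
    p = softmax(logits)
    grads = []
    for _ in range(n_samples):
        a = rng.choices(range(len(p)), weights=p)[0]
        scale = costs[a] - baseline
        grads.append([scale * ((1.0 if k == a else 0.0) - p[k]) for k in range(len(p))])
    return grads


def sum_log_variance(grads: List[List[float]]) -> float:
    """Sum over parameters of ln(sample variance of that parameter's gradient)."""
    return sum(math.log(statistics.variance([g[k] for g in grads])) for k in range(len(grads[0])))


# Toy setup: four hypothetical candidate allocations with illustrative max-tour costs.
logits = [0.0, 0.5, -0.5, 1.0]
costs = [10.0, 12.0, 9.0, 11.0]
p = softmax(logits)
expected_cost = sum(pi * ci for pi, ci in zip(p, costs))  # exact here; unknown in practice

plain = score_function_grads(logits, costs, 5000, baseline=0.0)
with_cv = score_function_grads(logits, costs, 5000, baseline=expected_cost)
print("sum of log-variances, no control variate:", sum_log_variance(plain))
print("sum of log-variances, constant baseline: ", sum_log_variance(with_cv))
```

Both runs share the same seed, so they see the same sampled allocations and differ only in the control variate. Because the costs sit far from zero, subtracting their expectation shrinks every gradient component's variance substantially without changing its mean, $p_k\,(f_k - \mathbb{E}_p[f])$.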

Theorems & Definitions (2)

  • Lemma 1
  • Proof