A Unified Deep Reinforcement Learning Approach for Close Enough Traveling Salesman Problem

Mingfeng Fan, Jiaqi Cheng, Yaoxin Wu, Yifeng Zhang, Yibin Yang, Guohua Wu, Guillaume Sartoretti

TL;DR

This paper tackles the close-enough Traveling Salesman Problem (CETSP), in which a target counts as visited once the tour passes through its neighborhood, so the agent does not have to reach the target itself. It introduces UD3RL, a unified dual-decoder DRL framework that decouples node selection from waypoint determination, uses a Perimetral Discretization Scheme (PDS) to convert each neighborhood into a finite set of candidate points, and leverages a $k$-NN subgraph to enhance spatial reasoning during waypoint decoding. A customized REINFORCE training regime lets a single model generalize across problem sizes and neighborhood radius types (constant or random), and the model performs well in both static and dynamic CETSP settings. Empirical results show that UD3RL outperforms traditional heuristics and a hybrid DRL–SOCP approach in both solution quality and runtime, and that it generalizes robustly across problem scales, spatial distributions, and dynamic environments, which suggests practical potential for real-time routing tasks.
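To make the discretization step concrete, the sketch below places $\gamma$ equally spaced candidate points on the boundary of one circular neighborhood, matching the $\gamma = 3$, $\alpha = 120^\circ$ case shown in Figure 2. It is an illustration under assumptions, not the authors' implementation: the function name `perimeter_candidates`, the `phase` parameter (angle of the first point), and the example center and radius are all hypothetical.

```python
import math

def perimeter_candidates(center, radius, gamma, phase=0.0):
    """Return gamma points equally spaced on the boundary of a circle.

    Assumption: the first point sits at angle `phase` (radians); the
    summary above does not say how the paper orients the points.
    """
    cx, cy = center
    alpha = 2.0 * math.pi / gamma  # angular step between consecutive points
    return [
        (cx + radius * math.cos(phase + j * alpha),
         cy + radius * math.sin(phase + j * alpha))
        for j in range(gamma)
    ]

# Hypothetical neighborhood: center (0.5, 0.5), radius 0.1, gamma = 3.
candidates = perimeter_candidates((0.5, 0.5), 0.1, gamma=3)
alpha_degrees = math.degrees(2.0 * math.pi / 3)  # angular step for gamma = 3
```

With a finite candidate set per neighborhood, choosing a waypoint becomes a discrete selection, which is what lets a decoder output a probability distribution over candidates.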

Abstract

In recent years, deep reinforcement learning (DRL) has gained traction for solving the NP-hard traveling salesman problem (TSP). However, limited attention has been given to the close-enough TSP (CETSP), primarily due to the challenge introduced by its neighborhood-based visitation criterion, wherein a node is considered visited if the agent enters a compact neighborhood around it. In this work, we formulate a Markov decision process (MDP) for CETSP using a discretization scheme and propose a novel unified dual-decoder DRL (UD3RL) framework that separates decision-making into node selection and waypoint determination. Specifically, an adapted encoder is employed for effective feature extraction, followed by a node-decoder and a loc-decoder to handle the two sub-tasks, respectively. A k-nearest neighbors subgraph interaction strategy is further introduced to enhance spatial reasoning during location decoding. Furthermore, we customize the REINFORCE algorithm to train UD3RL as a unified model capable of generalizing across different problem sizes and varying neighborhood radius types (i.e., constant and random radii). Experimental results show that UD3RL outperforms conventional methods in both solution quality and runtime, while exhibiting strong generalization across problem scales, spatial distributions, and radius ranges, as well as robustness to dynamic environments.
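As a rough illustration of the k-nearest neighbors idea mentioned above, the sketch below picks the $k$ nodes closest to a selected node by Euclidean distance between node centers. The abstract does not specify how the subgraph is built (which distance is used, or whether visited nodes are excluded), so those choices, the function name `knn_indices`, and the sample coordinates are assumptions made only for this example.

```python
import math

def knn_indices(coords, query_index, k):
    """Return indices of the k nodes nearest to coords[query_index].

    Assumption: Euclidean distance between node centers, with the query
    node itself excluded from its own neighbor list.
    """
    query = coords[query_index]
    others = [i for i in range(len(coords)) if i != query_index]
    others.sort(key=lambda i: math.dist(query, coords[i]))
    return others[:k]

# Hypothetical node centers.
nodes = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 3.0)]
neighbors = knn_indices(nodes, 0, k=2)
```

Restricting attention to a small local subgraph like this is one way to give a waypoint decision information about the nodes most likely to be visited next.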

Paper Structure

This paper contains 21 sections, 22 equations, 6 figures, 5 tables, 2 algorithms.

Figures (6)

  • Figure 1: An example path in the CETSP, where an agent is required to visit a set of target neighborhoods, each represented by a circular region.
  • Figure 2: PDS for $\gamma = 3$ with $\hat{N}(i) = \{c1, c2, c3\}$ and $\alpha = 120^\circ$.
  • Figure 3: Architecture of the policy network used in our method. It consists of an adapted Transformer encoder, a node-decoder, and a loc-decoder. Given a batch of instances, the encoder generates node embeddings and a graph embedding. Based on these, the node-decoder outputs selection probabilities for the next node to visit, while the loc-decoder computes probabilities over candidate waypoints within the selected neighborhood. The loc-decoder also incorporates $k$-NN subgraph information to enhance spatial decision-making.
  • Figure 4: Visualization of CETSP solutions obtained by UD3RL-Aug across different problem scales. The top row shows results under the constant radius configuration, while the bottom row corresponds to the random radius configuration.
  • Figure 5: Visualization of instances with various data distributions and radius ranges.
  • ...and 1 more figures