Learn to Tour: Operator Design For Solution Feasibility Mapping in Pickup-and-delivery Traveling Salesman Problem

Bowen Fang; Xu Chen; Xuan Di

TL;DR

The paper tackles PDTSP, where each pickup node $i$ must precede its corresponding delivery node $n+i$ (the precedence constraint $p_i<d_{n+i}$), and where traditional solvers struggle to scale. It introduces L2T, a reinforcement-learning framework that uses a unified operator set (N1, N2, N3, B1, B2) designed to map feasible tours to other feasible tours, thereby confining the search to the feasible solution space. A key idea is to represent tours as sequences of pickup and delivery blocks and to construct initial feasible tours via a simple feasibility-based rule. The policy network, a feature-rich architecture trained with PPO, learns to select operators that iteratively reduce tour cost. Empirical results on Grubhub PDTSP instances and a Capacitated-PDTSP scenario show that L2T achieves shorter tours and better scalability than strong baselines such as OR-tools, Gurobi, Ptr-Net, Transformer, and LKH3, which makes it relevant to large-scale pickup-and-delivery routing tasks.

Abstract

This paper aims to develop a learning method for a special class of traveling salesman problems (TSP), namely the pickup-and-delivery TSP (PDTSP), which seeks the shortest tour through a set of one-to-one pickup-and-delivery nodes. One-to-one here means that the transported people or goods are associated with designated pairs of pickup and delivery nodes, in contrast to settings where indistinguishable goods can be delivered to any node. In PDTSP, precedence constraints require that each pickup node be visited before its corresponding delivery node. Classic operations research (OR) algorithms for PDTSP are difficult to scale to large problems. Recently, reinforcement learning (RL) has been applied to TSPs. The basic idea is to explore and evaluate visiting sequences in a solution space. However, this approach can be computationally inefficient, as it may have to evaluate many infeasible solutions that violate precedence constraints. To restrict the search to the feasible space, we use operators that always map one feasible solution to another, so no time is spent exploring the infeasible solution space. Such operators are evaluated and selected as policies to solve PDTSPs in an RL framework. We compare our method with baselines, including classic OR algorithms and existing learning methods. Results show that our approach finds shorter tours than the baselines.
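The idea of an operator that maps one feasible solution to another can be illustrated with a minimal Python sketch. It assumes a tour is stored as a list of node labels in which pickups are `1..n` and the delivery paired with pickup `i` is `n + i`, with the depot omitted. The helper names `is_feasible` and `exchange_pairs` are hypothetical, and the pair-exchange move is only a simple example of a feasibility-preserving move; it is not one of the paper's N1, N2, N3, B1, or B2 operators.

```python
def is_feasible(tour, n):
    """Return True if every pickup i appears before its delivery n + i."""
    pos = {node: k for k, node in enumerate(tour)}
    return all(pos[i] < pos[n + i] for i in range(1, n + 1))


def exchange_pairs(tour, n, i, j):
    """Illustrative move (an assumption, not the paper's operator):
    pair j takes over the two slots of pair i and vice versa.

    Pickup slots are swapped with pickup slots and delivery slots with
    delivery slots, so each pickup still precedes its own delivery.
    """
    pos = {node: k for k, node in enumerate(tour)}
    new_tour = list(tour)
    new_tour[pos[i]], new_tour[pos[j]] = j, i
    new_tour[pos[n + i]], new_tour[pos[n + j]] = n + j, n + i
    return new_tour


n = 3
tour = [1, 2, 4, 3, 5, 6]             # feasible: 1<4, 2<5, 3<6 in position
moved = exchange_pairs(tour, n, 1, 3)  # stays feasible by construction

# A naive move that swaps two arbitrary positions can break precedence.
naive = list(tour)
naive[0], naive[2] = naive[2], naive[0]  # puts delivery 4 before pickup 1
```

Here `is_feasible(moved, n)` holds while `is_feasible(naive, n)` does not, which is the gap between a feasibility-preserving move and an unconstrained one that the abstract points to.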

Paper Structure

This paper contains 16 sections, 11 theorems, 8 equations, 16 figures, 3 tables, and 1 algorithm.

Introduction
Related Work
Problem Statement
A primer on pickup and delivery traveling salesman problem (PDTSP)
Why is feasible solution mapping important?
Blocks in a tour
Methodology
RL framework
Initial tour construction
Learning operator design
Solution Approach
Experiment
Conclusion
Appendices
IP Formulation
...and 1 more section

Key Result

Proposition 3.1

The total number of Hamiltonian cycles in PDTSP is $(2n)!$. The total number of feasible Hamiltonian cycles (i.e., tours) is $\frac{(2n)!}{2^n}$.
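The two counts can be checked by brute force for small $n$. The sketch below assumes each cycle is encoded as an ordering of the $2n$ non-depot nodes starting from a fixed depot, with pickups labeled `1..n` and the delivery of pickup `i` labeled `n + i`. The function name `count_sequences` is hypothetical. By symmetry, each pair is in the correct order in exactly half of all orderings, which is where the factor $2^n$ comes from.

```python
from itertools import permutations
from math import factorial


def count_sequences(n):
    """Count all visiting orders of 2n nodes and those that satisfy
    the precedence constraints (pickup i before delivery n + i)."""
    total = 0
    feasible = 0
    for perm in permutations(range(1, 2 * n + 1)):
        total += 1
        pos = {node: k for k, node in enumerate(perm)}
        if all(pos[i] < pos[n + i] for i in range(1, n + 1)):
            feasible += 1
    return total, feasible


for n in (1, 2, 3):
    total, feasible = count_sequences(n)
    # Compare the enumeration with the closed forms in Proposition 3.1.
    assert total == factorial(2 * n)
    assert feasible == factorial(2 * n) // 2 ** n
```

Enumeration is practical only for very small $n$, since the number of orderings grows as $(2n)!$.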

Figures (16)

Figure 1: Motivating examples
Figure 2: PDTSP vs TSP. In the left bar chart, we list the computational time for PDTSP and TSP. The x-axis denotes the number of nodes $|\mathcal{N}|$ and the y-axis is the log of the computational time (s). As the instance size grows, the solving time for PDTSP is on the order of a thousand times longer than for TSP. We also list the number of variables, constraints, and feasible tours given the same number of nodes $|\mathcal{N}|$ to be visited.
Figure 3: Tour in a toy example ($n=5$)
Figure 4: RL framework
Figure 5: Naive operator
...and 11 more figures

Theorems & Definitions (16)

Definition 3.1
Definition 3.2
Definition 3.3
Proposition 3.1
Definition 3.4
Proposition 3.2
Proposition 3.3
Proposition 3.4
Proposition 4.1
Proposition 4.2
...and 6 more
