Distilling Privileged Information for Dubins Traveling Salesman Problems with Neighborhoods

Min Kyu Shin; Su-Jeong Park; Seung-Keol Ryu; Heeyeon Kim; Han-Lim Choi

TL;DR

This work tackles the Dubins Traveling Salesman Problem with Neighborhoods (DTSPN) for non-holonomic vehicles by introducing DiPDTSP, a two-phase learning framework that distills privileged information (PI) from expert trajectories into a PI-free adaptation network. Phase 1 performs RL fine-tuning with privileged information to train a high-quality policy, while Phase 2 trains an adaptation network to replicate the encoder’s latent representation without any privileged data. The approach is roughly 50× faster than the LKH-based heuristic and outperforms standard imitation-learning baselines, while reliably sensing all tasks in simulation. By using privileged information during training but not at deployment, DiPDTSP provides fast, sensor-aware DTSPN planning suited to real-time autonomous navigation.

Abstract

This paper presents a novel learning approach for the Dubins Traveling Salesman Problem (DTSP) with Neighborhoods (DTSPN) that quickly produces a tour for a non-holonomic vehicle passing through the neighborhoods of given task points. The method involves two learning phases. First, a model-free reinforcement learning approach leverages privileged information to distill knowledge from expert trajectories generated by the Lin-Kernighan heuristic (LKH) algorithm. Second, a supervised learning phase trains an adaptation network to solve problems without privileged information. Before the first learning phase, a parameter initialization technique using the demonstration data is applied to improve training efficiency. The proposed method produces a solution about 50 times faster than LKH and substantially outperforms other imitation learning and RL-with-demonstration schemes, most of which fail to sense all the task points.
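The problem setting in the abstract (a non-holonomic vehicle that must pass through the neighborhoods of task points) can be illustrated with a minimal sketch of the standard Dubins car model, integrated with simple Euler steps. The function names, unit speed, time step, and sensor radius below are illustrative assumptions, not the authors' environment.

```python
import math

def dubins_step(x, y, theta, turn_rate, speed=1.0, min_turn_radius=1.0, dt=0.1):
    """Advance a Dubins vehicle by one Euler step (hypothetical helper).

    The vehicle moves forward at constant speed; only the heading rate is
    controlled, and it is clipped so the turning radius never drops below
    min_turn_radius.
    """
    max_rate = speed / min_turn_radius
    u = max(-max_rate, min(max_rate, turn_rate))
    x_next = x + speed * math.cos(theta) * dt
    y_next = y + speed * math.sin(theta) * dt
    theta_next = (theta + u * dt) % (2.0 * math.pi)
    return x_next, y_next, theta_next

def sensed(x, y, task, sensor_radius):
    """True if the task point lies within sensor_radius of the vehicle."""
    return math.hypot(task[0] - x, task[1] - y) <= sensor_radius

def rollout(state, controls, tasks, sensor_radius=0.5):
    """Apply a sequence of turn rates and record which tasks were sensed."""
    x, y, theta = state
    visited = set()
    for u in controls:
        x, y, theta = dubins_step(x, y, theta, u)
        for i, task in enumerate(tasks):
            if sensed(x, y, task, sensor_radius):
                visited.add(i)
    return (x, y, theta), visited

# Drive straight for ten steps past one nearby task and one distant task.
final_state, visited = rollout((0.0, 0.0, 0.0), [0.0] * 10, [(1.0, 0.3), (5.0, 5.0)])
```

A tour is complete in this toy setting once `visited` contains every task index; a learned policy would supply the `controls` sequence.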

Paper Structure (25 sections, 5 equations, 4 figures, 1 table)

Related Works
Dubins TSP with Neighborhoods
Combining demonstrations with Reinforcement Learning
Learning using Privileged Information
Approach
Dubins Kinematics with Deep RL
Pretraining with Behavioral Cloning
Phase 1: RL Fine-tuning with Privileged Information
Phase 2: PI-free Policy Adaptation
Experiments
Experiment Setup
Baselines
BC [ross2010efficient]
GAIL [ho2016generative]
PPO [schulman2017proximal] with dense rewards
...and 10 more sections

Figures (4)

Figure 1: The proposed DiPDTSP controls an agent with a sensor to solve DTSPN20
Figure 2: DiPDTSP has two training phases. In the first phase (top), the encoder receives the common state $s$ and the privileged information $p_e$, which consists of 4 relative positions and heading angles from expert trajectories. The encoder and policy network are trained with model-free RL. In the second phase (bottom), the adaptation network distills the encoder: it is trained by supervised learning to generate the same latent variable as the encoder. The final adaptation network and policy network generate DTSPN trajectories from only the given positions of the agent and tasks.
Figure 3: Average reward over 3M training steps for our method and the baselines. DiPDTSP (olive) shows only a small reward gap from the expert. Because the algorithms converge early, the x-axis shows time steps on a log scale.
Figure 4: Demonstrations of DiPDTSP and the baseline methods. The top and bottom figures show two demonstrations with different initial positions of tasks and agents. Expert trajectories are red dashed lines, and the resulting agent trajectories are green lines. When the agent senses a task, a light green circle shows the sensor coverage. The baselines stray far from the expert path and show low coverage rates.
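
The second training phase described in the Figure 2 caption, where an adaptation network learns to reproduce the encoder's latent variable from the common state alone, can be sketched as a small regression problem. Everything below is a toy assumption for illustration: both networks are plain linear maps, the dimensions are arbitrary, and the privileged input is made a fixed linear function of the state so that an exact PI-free fit exists. It is not the authors' architecture or training code.

```python
import random

random.seed(0)
STATE_DIM, PRIV_DIM, LATENT_DIM = 3, 2, 2  # arbitrary toy sizes

def matvec(W, v):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def rand_matrix(rows, cols):
    return [[random.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]

# Stand-in for the frozen Phase 1 encoder: a fixed linear map over [s, p].
W_ENC = rand_matrix(LATENT_DIM, STATE_DIM + PRIV_DIM)

def encoder(s, p):
    return matvec(W_ENC, s + p)

# Toy assumption: privileged info is a fixed linear function of the state.
M = rand_matrix(PRIV_DIM, STATE_DIM)
states = [[random.gauss(0.0, 1.0) for _ in range(STATE_DIM)] for _ in range(200)]
targets = [encoder(s, matvec(M, s)) for s in states]  # latents to imitate

def train_adaptation(states, targets, lr=0.02, epochs=50):
    """Fit z_hat = W s to the encoder latents by per-sample gradient descent."""
    W = [[0.0] * STATE_DIM for _ in range(LATENT_DIM)]
    for _ in range(epochs):
        for s, z in zip(states, targets):
            z_hat = matvec(W, s)
            for i in range(LATENT_DIM):
                err = z_hat[i] - z[i]
                for j in range(STATE_DIM):
                    W[i][j] -= lr * 2.0 * err * s[j]  # gradient of err**2
    return W

def mse(W, states, targets):
    """Mean squared error between adaptation outputs and encoder latents."""
    total = sum((a - b) ** 2
                for s, z in zip(states, targets)
                for a, b in zip(matvec(W, s), z))
    return total / (len(states) * LATENT_DIM)

W_ZERO = [[0.0] * STATE_DIM for _ in range(LATENT_DIM)]
W_adapt = train_adaptation(states, targets)
```

At deployment, the policy would consume `matvec(W_adapt, s)` in place of `encoder(s, p)`, so no privileged input is needed. Comparing `mse(W_adapt, states, targets)` with `mse(W_ZERO, states, targets)` shows how closely the adaptation map matches the encoder after training.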
