Distilling Privileged Information for Dubins Traveling Salesman Problems with Neighborhoods
Min Kyu Shin, Su-Jeong Park, Seung-Keol Ryu, Heeyeon Kim, Han-Lim Choi
TL;DR
This work tackles the Dubins Traveling Salesman Problem with Neighborhoods (DTSPN) for non-holonomic vehicles by introducing DiPDTSP, a two-phase learning framework that distills privileged expert information into an adaptation network that needs no privileged information (PI). Phase 1 performs RL fine-tuning with privileged information to train a high-quality policy, while Phase 2 trains an adaptation network to replicate the encoder’s latent representation without any privileged data. The approach is roughly 50× faster than LKH-based heuristics and outperforms standard imitation-learning baselines, while reliably sensing all task points in simulations. By leveraging privileged information during training but operating without it at deployment, DiPDTSP provides fast, robust, sensor-aware DTSPN planning suitable for real-time autonomous navigation.
Abstract
This paper presents a novel learning approach for Dubins Traveling Salesman Problems (DTSP) with Neighborhoods (DTSPN) that quickly produces a tour for a non-holonomic vehicle passing through the neighborhoods of given task points. The method involves two learning phases. First, a model-free reinforcement learning approach leverages privileged information to distill knowledge from expert trajectories generated by the Lin-Kernighan heuristic (LKH) algorithm. Second, a supervised learning phase trains an adaptation network to solve problems independently of privileged information. Before the first learning phase, a parameter initialization technique using the demonstration data is also applied to enhance training efficiency. The proposed learning method produces a solution about 50 times faster than LKH and substantially outperforms other imitation learning and RL-with-demonstration schemes, most of which fail to sense all the task points.
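The second phase can be illustrated with a toy example. This is a minimal sketch of the general idea only, not the authors' architecture. It assumes linear maps in place of neural networks, a randomly drawn frozen "encoder" standing in for the phase-1 result, and privileged information that is fully recoverable from the ordinary observations. All names (`train_adaptation`, `obs`, `priv`, `E`, `W`) are hypothetical. The adaptation map sees only the non-privileged input and is regressed onto the latent vector that the frozen encoder computes from both inputs.

```python
import numpy as np

def train_adaptation(obs, target_latent, lr=0.1, epochs=500):
    """Fit a linear map W so that obs @ W approximates target_latent.

    Plain gradient descent on the squared error averaged over samples.
    """
    n, d_in = obs.shape
    d_z = target_latent.shape[1]
    W = np.zeros((d_in, d_z))
    for _ in range(epochs):
        pred = obs @ W
        grad = 2.0 * obs.T @ (pred - target_latent) / n
        W -= lr * grad
    return W

def mse(a, b):
    """Mean squared error over all entries."""
    return float(np.mean((a - b) ** 2))

rng = np.random.default_rng(0)
n, d_obs, d_priv, d_z = 256, 6, 3, 4

# Non-privileged observations available at deployment (synthetic data).
obs = rng.normal(size=(n, d_obs))
# Assumption: privileged information is a linear function of the observations.
priv = obs @ rng.normal(size=(d_obs, d_priv))
# Frozen stand-in for the phase-1 encoder, which sees both inputs.
E = rng.normal(size=(d_obs + d_priv, d_z))
z_target = np.concatenate([obs, priv], axis=1) @ E

# Phase-2 analogue: regress onto the encoder's latent using obs only.
W = train_adaptation(obs, z_target)
loss_before = mse(np.zeros_like(z_target), z_target)
loss_after = mse(obs @ W, z_target)
print(loss_after < loss_before)
```

In this toy setting the latent is exactly reproducible because the privileged input is a deterministic function of the observations. In a realistic setting the adaptation network can only approximate the latent, and the quality of that approximation bounds how well the deployed policy matches the privileged one.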
