Distilling Autoregressive Models to Obtain High-Performance Non-Autoregressive Solvers for Vehicle Routing Problems with Faster Inference Speed

Yubin Xiao, Di Wang, Boyang Li, Mingzhao Wang, Xuan Wu, Changliang Zhou, You Zhou

TL;DR

The paper targets real-time Vehicle Routing Problem (VRP) solving by bridging Autoregressive (AR) and Non-Autoregressive (NAR) approaches. It introduces Guided NAR Knowledge Distillation (GNARKD), a generic knowledge distillation (KD) framework that converts a Transformer-based AR model into a parallelizable NAR student by modifying only the decoder while preserving the encoder, and trains the student with guidance from the AR teacher’s actions. The key contributions are (i) a detailed architectural redesign of the NAR student, (ii) a guided decoding scheme that preserves order information via a proxy distribution learned through KL divergence, and (iii) extensive experiments showing that GNARKD achieves $4$–$5\times$ faster inference with only a $2$–$3\%$ drop in solution quality, with competitive results on CVRPs and strong gains in real-world-like scenarios. The work demonstrates a practical path to deploying near-optimal VRP solvers in time-constrained environments and opens avenues for multi-teacher distillation and constraint-aware NAR decoding in future research.
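As a rough illustration of the distillation signal mentioned in contribution (ii), the sketch below computes a KL divergence between a teacher's action distribution and a student's distribution at a single decoding step. It is a minimal sketch, not the paper's implementation: the helper name `kl_divergence`, the example probabilities, and the KL(teacher || student) direction are all assumptions made for illustration.

```python
import math


def kl_divergence(teacher_probs, student_probs, eps=1e-12):
    """KL(teacher || student) for two discrete distributions over next nodes.

    Hypothetical helper for illustration only; eps guards against log(0)
    when the student assigns zero probability to a node.
    """
    if len(teacher_probs) != len(student_probs):
        raise ValueError("distributions must have the same length")
    total = 0.0
    for p, q in zip(teacher_probs, student_probs):
        if p > 0.0:  # terms with p == 0 contribute nothing to the sum
            total += p * math.log(p / max(q, eps))
    return total


# Made-up probabilities over four candidate next nodes (not from the paper).
teacher = [0.70, 0.20, 0.05, 0.05]
student_close = [0.65, 0.25, 0.05, 0.05]
student_uniform = [0.25, 0.25, 0.25, 0.25]

loss_close = kl_divergence(teacher, student_close)
loss_uniform = kl_divergence(teacher, student_uniform)
print(f"close student:   {loss_close:.4f}")
print(f"uniform student: {loss_uniform:.4f}")
```

A student whose distribution tracks the teacher's incurs a smaller loss than an uninformed uniform student, which is the behavior a distillation objective of this kind rewards.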

Abstract

Neural construction models have shown promising performance for Vehicle Routing Problems (VRPs) by adopting either the Autoregressive (AR) or Non-Autoregressive (NAR) learning approach. While AR models produce high-quality solutions, they generally have a high inference latency due to their sequential generation nature. Conversely, NAR models generate solutions in parallel with a low inference latency but generally exhibit inferior performance. In this paper, we propose a generic Guided Non-Autoregressive Knowledge Distillation (GNARKD) method to obtain high-performance NAR models having a low inference latency. GNARKD removes the constraint of sequential generation in AR models while preserving the learned pivotal components in the network architecture to obtain the corresponding NAR models through knowledge distillation. We evaluate GNARKD by applying it to three widely adopted AR models to obtain NAR VRP solvers for both synthesized and real-world instances. The experimental results demonstrate that GNARKD significantly reduces the inference time (4-5 times faster) with an acceptable performance drop (2-3%). To the best of our knowledge, this study is the first of its kind to obtain NAR VRP solvers from AR ones through knowledge distillation.

Paper Structure (32 sections, 11 equations, 5 figures, 4 tables)

Figures (5)

  • Figure 1: Action probability distribution of TM, GCN and GNARKD-TM when solving a randomly generated TSP-50 instance.
  • Figure 2: The architecture of GNARKD, which transforms the teacher AR model into a student NAR model.
  • Figure 3: Comparison of execution time between GNARKD students and their respective teachers. Because POMO uses multiple greedy rollouts instead of beam search, we only report the time taken by GNARKD-POMO using the same inference method. Moreover, constrained by GPU memory, POMO is unable to solve TSP instances larger than 850 and CVRP instances larger than 1,000.
  • Figure 4: Performance of GNARKD students under different distillation temperatures.
  • Figure 5: Action probability distribution of GNARKD-TM trained with different temperatures $T_1\in \{0.1, 0.5, 1, 5, 10\}$ when solving a randomly generated TSP-50 instance.
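
To give a feel for what a distillation temperature does to an action probability distribution, as examined in Figures 4 and 5, the sketch below applies the standard softmax-with-temperature form to a set of made-up logits at the temperatures listed for Figure 5. How GNARKD applies $T_1$ internally is not stated here, so the function names, the logits, and the use of entropy as a sharpness measure are assumptions for illustration only.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Standard temperature-scaled softmax (hypothetical helper)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    z = sum(exps)
    return [e / z for e in exps]


def entropy(probs):
    """Shannon entropy in nats; higher means a flatter distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


# Made-up logits over four candidate next nodes (not from the paper).
logits = [2.0, 1.0, 0.5, 0.0]
temperatures = [0.1, 0.5, 1.0, 5.0, 10.0]

entropies = []
for t in temperatures:
    probs = softmax_with_temperature(logits, t)
    entropies.append(entropy(probs))
    print(f"T={t:>4}: max prob={max(probs):.3f}, entropy={entropies[-1]:.3f}")
```

Low temperatures concentrate nearly all probability on the top-scoring node, while high temperatures flatten the distribution toward uniform.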