MTL-KD: Multi-Task Learning Via Knowledge Distillation for Generalizable Neural Vehicle Routing Solver

Yuepeng Zheng, Fu Luo, Zhenkun Wang, Yaoxin Wu, Yu Zhou

TL;DR

This work tackles cross-variant and large-scale vehicle routing by combining a heavy decoder with multi-task learning via knowledge distillation (MTL-KD). It transfers policy knowledge from multiple RL-based single-task teachers into a single generalizable student, enabling label-free training across seen tasks and strong zero-shot generalization to unseen VRP variants. A novel inference strategy, Random Reordering Re-Construction (R3C), further enhances solution diversity and performance. Experiments on 6 seen and 10 unseen tasks with up to 1000 nodes, on both uniform and real-world benchmarks, show robust scale generalization and better performance than existing multi-task VRP models and baselines.

Abstract

Multi-Task Learning (MTL) in Neural Combinatorial Optimization (NCO) is a promising approach to train a unified model capable of solving multiple Vehicle Routing Problem (VRP) variants. However, existing Reinforcement Learning (RL)-based multi-task methods can only train light decoder models on small-scale problems, exhibiting limited generalization ability when solving large-scale problems. To overcome this limitation, this work introduces a novel multi-task learning method driven by knowledge distillation (MTL-KD), which enables the efficient training of heavy decoder models with strong generalization ability. The proposed MTL-KD method transfers policy knowledge from multiple distinct RL-based single-task models to a single heavy decoder model, facilitating label-free training and effectively improving the model's generalization ability across diverse tasks. In addition, we introduce a flexible inference strategy termed Random Reordering Re-Construction (R3C), which is specifically adapted for diverse VRP tasks and further boosts the performance of the multi-task model. Experimental results on 6 seen and 10 unseen VRP variants with up to 1000 nodes indicate that our proposed method consistently achieves superior performance on both uniform and real-world benchmarks, demonstrating robust generalization abilities.
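
As a rough illustration of the distillation idea described above, the sketch below computes a per-step loss that pulls a student's node-selection distribution toward a teacher's. The abstract does not spell out the loss, so the choice of KL(teacher || student), the feasibility mask, and the names `softmax` and `kd_loss` are assumptions made here for illustration, not the authors' implementation.

```python
import math

def softmax(logits, mask):
    """Softmax over feasible nodes only; infeasible nodes get probability 0."""
    m = max(l for l, ok in zip(logits, mask) if ok)
    exps = [math.exp(l - m) if ok else 0.0 for l, ok in zip(logits, mask)]
    total = sum(exps)
    return [e / total for e in exps]

def kd_loss(teacher_logits, student_logits, mask):
    """KL(teacher || student) for one decoding step (assumed loss form)."""
    p = softmax(teacher_logits, mask)  # teacher distribution, treated as the target
    q = softmax(student_logits, mask)  # student distribution being trained
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)

# Hypothetical logits for three candidate nodes; the third node is infeasible.
print(kd_loss([1.0, 2.0, 3.0], [3.0, 2.0, 1.0], [True, True, False]))
```

In a multi-task setting, one would presumably select the teacher that matches the task of the current instance and average this loss over decoding steps and instances; no ground-truth routes are needed, which is consistent with the label-free training the abstract mentions.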

Paper Structure

This paper contains 35 sections, 8 equations, 5 figures, 9 tables.

Figures (5)

  • Figure 1: Framework for Multi-Task Training via Knowledge Distillation. Top-left: Seen and unseen tasks. Bottom-left: Independent training of teacher models. Right: Student model distillation via teacher output distributions.
  • Figure 2: Architecture of the proposed multi-task neural solver. Left to right: Encoder, Decoder, Node Padding & Masking.
  • Figure 3: Comparison of RRC and R3C methods. RRC increases the diversity of sampled subproblems by randomly reversing subtours, while R3C enhances this diversity by randomly reordering the external sequence of subtours.
  • Figure 4: Performance Comparison between Teacher and Student Models, both trained on instances of scale 100.
  • Figure 5: Impact of Different Components in R3C.
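
The reordering step named in the Figure 3 caption can be sketched as follows. This is a minimal illustration under stated assumptions: a solution is a list of subtours (each a list of customer indices, with depot 0 implicit at both ends), reordering means shuffling the order of subtours, and a contiguous segment of the flattened sequence is then sampled as a subproblem. The functions `reorder_subtours`, `flatten`, and `sample_segment` are hypothetical helpers; the re-construction of the sampled segment by the model is not shown.

```python
import random

def reorder_subtours(subtours, rng):
    """Return the same subtours in a random external order (inner order kept)."""
    shuffled = [list(t) for t in subtours]
    rng.shuffle(shuffled)
    return shuffled

def flatten(subtours, depot=0):
    """Concatenate subtours into one node sequence with depot separators."""
    seq = [depot]
    for t in subtours:
        seq.extend(t)
        seq.append(depot)
    return seq

def sample_segment(seq, length, rng):
    """Pick a random contiguous segment to hand to the solver for re-construction."""
    start = rng.randrange(0, len(seq) - length + 1)
    return start, seq[start:start + length]

rng = random.Random(0)
tours = [[1, 2, 3], [4, 5], [6]]
seq = flatten(reorder_subtours(tours, rng))
start, segment = sample_segment(seq, 4, rng)
print(seq, start, segment)
```

Because each subtour starts and ends at the depot, shuffling the order of subtours leaves the total route length unchanged while changing which nodes are adjacent in the flattened sequence, so different segments become available for sampling. Unlike reversing a subtour, it also keeps the visiting order inside each route, which may matter for variants with direction-dependent constraints; that reading is an interpretation of the caption rather than a statement from the paper.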