Efficient Training of Multi-task Neural Solver for Combinatorial Optimization
Chenguang Wang, Zhang-Hua Fu, Pinyan Lu, Tianshu Yu
TL;DR
This work presents a general, efficient training paradigm for a unified neural solver across multiple combinatorial optimization problems (COPs), combining loss decomposition with a bandit-based task sampler. It introduces an intra-task influence matrix to quantify how training on one task affects the others, derives theoretically grounded rewards for the multi-armed bandit, and demonstrates better efficiency and performance on TSP, CVRP, OP, and KP across synthetic and real datasets (TSPLib, CVRPLib). The approach reveals principled task relationships, indicates that different COP types occupy mostly orthogonal subspaces, and achieves strong generalization with reduced training resources. The method is model-agnostic and open-source, and it offers practical guidance for training large-scale solvers under resource constraints, with potential applicability to other multi-task learning settings.
Abstract
Efficiently training a multi-task neural solver for various combinatorial optimization problems (COPs) has received little attention so far. Naively applying conventional multi-task learning approaches often fails to deliver a high-quality, unified neural solver, primarily because of the significant computational demands and insufficient consideration of the complexities inherent in COPs. In this paper, we propose a general and efficient training paradigm to deliver a unified combinatorial multi-task neural solver. To this end, we use a theoretical loss decomposition for multiple tasks under an encoder-decoder framework, which enables more efficient training via proper bandit task-sampling algorithms through an intra-task influence matrix. By employing theoretically grounded approximations, our method significantly improves overall performance over conventional training schedules, whether measured under constrained training budgets, at equal numbers of training epochs, or in terms of generalization. On the real-world datasets TSPLib and CVRPLib, our method also achieves the best results compared with single-task learning and multi-task learning approaches. Additionally, the influence matrix provides empirical evidence supporting common practices in the field of learning to optimize, further substantiating the effectiveness of our approach. Our code is open-sourced and available at https://github.com/LOGO-CUHKSZ/MTL-COP.
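To make the idea of bandit-based task sampling concrete, the following is a minimal sketch, not the authors' implementation. It assumes Exp3, a standard adversarial bandit algorithm, as the sampler; the abstract does not state which bandit algorithm is used. The class name `Exp3TaskSampler`, the parameter `gamma`, and the function `placeholder_reward` are hypothetical. In the paper's setting the reward would be derived from the influence matrix; here fixed numbers stand in for it.

```python
import math
import random


class Exp3TaskSampler:
    """Illustrative Exp3 sampler over training tasks (assumed, not from the paper)."""

    def __init__(self, num_tasks, gamma=0.1, seed=0):
        self.num_tasks = num_tasks
        self.gamma = gamma  # exploration rate in (0, 1]
        self.weights = [1.0] * num_tasks
        self.rng = random.Random(seed)

    def probabilities(self):
        # Mix the weight-proportional distribution with uniform exploration.
        total = sum(self.weights)
        k = self.num_tasks
        return [(1 - self.gamma) * w / total + self.gamma / k for w in self.weights]

    def sample(self):
        probs = self.probabilities()
        return self.rng.choices(range(self.num_tasks), weights=probs, k=1)[0]

    def update(self, task, reward):
        # reward is expected to lie in [0, 1].
        probs = self.probabilities()
        estimate = reward / probs[task]  # importance-weighted reward estimate
        self.weights[task] *= math.exp(self.gamma * estimate / self.num_tasks)
        # Rescale so the largest weight is 1; this leaves probabilities unchanged
        # and avoids floating-point overflow over long runs.
        largest = max(self.weights)
        self.weights = [w / largest for w in self.weights]


TASKS = ["TSP", "CVRP", "OP", "KP"]


def placeholder_reward(task_index):
    # Hypothetical fixed rewards standing in for influence-matrix-based rewards.
    return [0.2, 0.8, 0.4, 0.1][task_index]


sampler = Exp3TaskSampler(num_tasks=len(TASKS), gamma=0.1, seed=0)
for _ in range(2000):
    task = sampler.sample()  # choose which task to train on at this step
    sampler.update(task, placeholder_reward(task))

probs = sampler.probabilities()
```

In this toy run the sampler shifts probability toward the task with the highest placeholder reward while still exploring every task at a rate of at least `gamma / num_tasks`. In an actual training loop, each sampled task would trigger one or more gradient steps on that task before the reward is computed.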
