Efficient Training of Multi-task Neural Solver for Combinatorial Optimization

Chenguang Wang, Zhang-Hua Fu, Pinyan Lu, Tianshu Yu

TL;DR

This work presents a general, computationally efficient training paradigm for a unified neural solver across multiple combinatorial optimization problems (COPs) by combining loss decomposition with a bandit-based task sampler. It introduces an intra-task influence matrix to quantify cross-task effects, derives theoretically grounded rewards for the multi-armed bandit, and demonstrates superior efficiency and performance on TSP, CVRP, OP, and KP on synthetic and real-world datasets (TSPLib, CVRPLib). The approach reveals principled task relationships, highlights mostly orthogonal subspaces across COP types, and achieves strong generalization with reduced training resources. The method is model-agnostic, open-source, and offers practical guidance for training large-scale solvers under resource constraints, with potential broader applicability to multi-task learning contexts.

Abstract

Efficiently training a multi-task neural solver for various combinatorial optimization problems (COPs) has received little attention so far. Naive application of conventional multi-task learning approaches often falls short in delivering a high-quality, unified neural solver. This deficiency primarily stems from the significant computational demands and a lack of adequate consideration for the complexities inherent in COPs. In this paper, we propose a general and efficient training paradigm to deliver a unified combinatorial multi-task neural solver. To this end, we resort to the theoretical loss decomposition for multiple tasks under an encoder-decoder framework, which enables more efficient training via proper bandit task-sampling algorithms through an intra-task influence matrix. By employing theoretically grounded approximations, our method significantly enhances overall performance compared to conventional training schedules, whether under constrained training budgets, across equivalent training epochs, or in terms of generalization capabilities. On the real-world datasets TSPLib and CVRPLib, our method also achieves the best results compared to single-task learning and multi-task learning approaches. Additionally, the influence matrix provides empirical evidence supporting common practices in the field of learning to optimize, further substantiating the effectiveness of our approach. Our code is open-sourced and available at https://github.com/LOGO-CUHKSZ/MTL-COP.

Paper Structure

This paper contains 33 sections, 2 theorems, 18 equations, 12 figures, 12 tables.

Key Result

Proposition 1

Consider an encoder-decoder framework with parameters $\Theta=\bigcup_{i=1}^{K}\Theta^i=\{\theta^{\text{share}}\}\bigcup\{\theta_i,i=1,2,...,K\}$, updated with standard gradient descent: $\Theta(t+1)=\Theta(t)-\eta_t \nabla {L}(\Theta(t)),$ where $\eta_t$ is the step size. Then the differen... where $\nabla {L}(\Theta)$ means taking the gradient w.r.t. $\Theta$ and $\nabla {L}_\theta(\Theta)$ me...
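
For orientation, the gradient-descent update in this setting gives the standard first-order (Taylor) estimate of the one-step change in the loss, which is a natural starting point for a loss decomposition. This is a generic sketch and not the proposition's own statement, which is truncated here. It assumes higher-order terms in $\eta_t$ are dropped, and the split into blocks simply follows the parameter partition into $\theta^{\text{share}}$ and $\theta_i$; the notation $\nabla_{\theta}$ for a block gradient is introduced only for this illustration.

$$
L(\Theta(t+1)) - L(\Theta(t)) \approx -\eta_t \left\|\nabla L(\Theta(t))\right\|^2 = -\eta_t \left( \left\|\nabla_{\theta^{\text{share}}} L(\Theta(t))\right\|^2 + \sum_{i=1}^{K} \left\|\nabla_{\theta_i} L(\Theta(t))\right\|^2 \right)
$$

The second equality holds because the blocks $\theta^{\text{share}}, \theta_1, \ldots, \theta_K$ are disjoint, so the squared norm of the full gradient is the sum of the squared norms of the block gradients.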

Figures (12)

  • Figure 1: Pipeline of MAB for Solving COPs in view of MTL. We consider four types of COPs: TSP, CVRP, OP and KP, each with a corresponding header and decoder. The encoder, which is common to all COPs, is also included. For each time step, we utilize the MAB algorithm to select a specific task for training, such as CVRP-100 depicted in the figure. We then obtain the loss for the selected task, perform loss decomposition as detailed in Section \ref{sec: loss decompos}, and construct a reward using the methodology outlined in Section \ref{sec: reward design and influ. graph}. Finally, we utilize the reward to update the MAB algorithm.
  • Figure 2: Comparative analysis of MTL methods during training: The left graph shows the mean objective function for TSP and CVRP (with a lower-is-better criterion), and the right graph shows the same for OP and KP (with a higher-is-better criterion), demonstrating the superior performance of our proposed method under varying computational budgets.
  • Figure 3: This figure compares the generalization performance across different scales for TSP, CVRP, OP, and KP. The y-axis represents the average test optimality gap (%) on small, medium and large scales for models trained on small (blue), medium (orange), and large (green) scale single-task datasets. The red line denotes the results of our method after training for 1000 epochs.
  • Figure 4: The comparison results are obtained by training our model for 1000 epochs and STL models for 100 epochs each, amounting to a total of 1200 epochs.
  • Figure 4: This figure provides a visual representation of the mutual influence between tasks. The left-hand side displays the average influence matrix, as defined in equation \ref{eq: avg influ mat}, while the right-hand side illustrates the influence value throughout the training process.
  • ...and 7 more figures
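
The select, train, reward, update loop described in the Figure 1 caption can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses the generic Exp3 bandit, the reward is a simple relative loss improvement rather than the paper's influence-based reward, and `HYPOTHETICAL_DECAY`, `toy_train_step`, and `run_demo` are made-up stand-ins for real training on each task.

```python
import math
import random

# Hypothetical per-task loss decay per training step (stand-in for real training).
HYPOTHETICAL_DECAY = {"TSP-100": 0.99, "CVRP-100": 0.90, "OP-100": 0.97, "KP-100": 0.995}


class Exp3TaskSampler:
    """Generic Exp3 bandit over tasks (an assumed choice, not the paper's sampler)."""

    def __init__(self, tasks, gamma=0.1, seed=0):
        self.tasks = list(tasks)
        self.gamma = gamma                      # exploration rate
        self.weights = [1.0] * len(self.tasks)
        self.rng = random.Random(seed)

    def probabilities(self):
        total = sum(self.weights)
        k = len(self.tasks)
        # Mix the weight-proportional distribution with uniform exploration.
        return [(1 - self.gamma) * w / total + self.gamma / k for w in self.weights]

    def select(self):
        probs = self.probabilities()
        idx = self.rng.choices(range(len(self.tasks)), weights=probs, k=1)[0]
        return idx, probs[idx]

    def update(self, idx, prob, reward):
        # Importance-weighted reward estimate for the selected arm, reward in [0, 1].
        k = len(self.tasks)
        self.weights[idx] *= math.exp(self.gamma * (reward / prob) / k)
        # Rescale by the largest weight; probabilities are unchanged by this.
        top = max(self.weights)
        self.weights = [w / top for w in self.weights]


def toy_train_step(task, losses):
    """Pretend to train on `task`: shrink its loss and return (old, new)."""
    old = losses[task]
    losses[task] = old * HYPOTHETICAL_DECAY[task]
    return old, losses[task]


def run_demo(steps=500, seed=0):
    tasks = list(HYPOTHETICAL_DECAY)
    losses = {t: 1.0 for t in tasks}
    sampler = Exp3TaskSampler(tasks, gamma=0.1, seed=seed)
    for _ in range(steps):
        idx, prob = sampler.select()                  # 1. bandit picks a task
        old, new = toy_train_step(tasks[idx], losses) # 2. train on it, observe loss
        reward = min(max((old - new) / old, 0.0), 1.0)  # 3. assumed reward
        sampler.update(idx, prob, reward)             # 4. feed reward back
    return dict(zip(tasks, sampler.probabilities()))


if __name__ == "__main__":
    for task, p in run_demo().items():
        print(f"{task}: {p:.3f}")
```

In this toy setup the sampler shifts probability toward the task whose loss drops fastest per step, while the `gamma` term keeps every task sampled with probability at least `gamma / K`. Swapping in a reward built from the loss decomposition and influence matrix would only change step 3.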

Theorems & Definitions (4)

  • Proposition 1: Loss decomposition for GD
  • Proof 1: Proof of Proposition 1
  • Theorem 1
  • Proof 2: Proof of Theorem 1