
Self-Improved Learning for Scalable Neural Combinatorial Optimization

Fu Luo, Xi Lin, Zhenkun Wang, Xialiang Tong, Mingxuan Yuan, Qingfu Zhang

TL;DR

Self-Improved Learning (SIL) tackles the scalability bottleneck of end-to-end neural combinatorial optimization (NCO) by enabling direct training on large-scale instances (up to 100K nodes) without labeled data. SIL alternates between two steps. A local reconstruction step produces improved solutions that serve as pseudo-labels, and a model training step learns from those pseudo-labels. SIL also relies on a linear-complexity attention mechanism that substantially reduces computation and memory usage during decoding. The authors validate SIL on TSP and CVRP with both uniform and real-world distributions. The results show strong scalability and competitive performance, especially on large CVRP instances, where SIL can outperform some classical solvers. SIL does not consistently beat every classical baseline on every TSP task. Even so, it shows substantial practical potential for scalable, data-driven combinatorial optimization (CO), and it points to further improvements in local reconstruction and training efficiency.
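The local reconstruction loop can be illustrated on TSP with a minimal sketch. The sketch assumes that local reconstruction works as follows:

- Pick a random contiguous segment of the current tour.
- Rebuild that segment with both endpoints fixed.
- Keep the result only if the tour gets shorter.

The improved tour would then serve as the pseudo-label for the next training step. In SIL the learned model rebuilds the segment. Here a brute-force reordering takes its place: `reconstruct_segment` is a hypothetical stand-in, not the paper's method. All function names and parameters (`seg_len`, `rounds`, `seed`) are illustrative.

```python
import itertools
import math
import random

Point = tuple[float, float]


def tour_length(tour: list[int], coords: list[Point]) -> float:
    """Length of a closed tour, including the edge back to the start."""
    return sum(
        math.dist(coords[tour[i]], coords[tour[(i + 1) % len(tour)]])
        for i in range(len(tour))
    )


def path_length(path: list[int], coords: list[Point]) -> float:
    """Length of an open path (no return edge)."""
    return sum(math.dist(coords[a], coords[b]) for a, b in zip(path, path[1:]))


def reconstruct_segment(path: list[int], coords: list[Point]) -> list[int]:
    """Hypothetical stand-in for the learned model: try every order of the
    interior nodes with both endpoints fixed and return the shortest path.
    Only feasible for short segments; SIL would query its model instead."""
    start, end, inner = path[0], path[-1], path[1:-1]
    best, best_len = path, path_length(path, coords)
    for perm in itertools.permutations(inner):
        candidate = [start, *perm, end]
        cand_len = path_length(candidate, coords)
        if cand_len < best_len - 1e-12:  # accept strict improvements only
            best, best_len = candidate, cand_len
    return best


def local_reconstruction(tour: list[int], coords: list[Point],
                         seg_len: int = 6, rounds: int = 200,
                         seed: int = 0) -> list[int]:
    """Repeatedly rebuild a random contiguous segment of the cyclic tour,
    keeping the rebuilt segment only when it is shorter."""
    if not 2 <= seg_len <= len(tour):
        raise ValueError("seg_len must be between 2 and the tour size")
    rng = random.Random(seed)
    tour = list(tour)
    for _ in range(rounds):
        s = rng.randrange(len(tour))
        rotated = tour[s:] + tour[:s]  # rotation does not change tour length
        segment = rotated[:seg_len]
        tour = reconstruct_segment(segment, coords) + rotated[seg_len:]
    return tour


# Usage: improve an arbitrary tour on 30 random points.
rng = random.Random(42)
coords = [(rng.random(), rng.random()) for _ in range(30)]
initial = list(range(30))
pseudo_label = local_reconstruction(initial, coords)
print(tour_length(initial, coords), tour_length(pseudo_label, coords))
```

Every accepted segment is strictly shorter and keeps the same endpoints, so the tour length never increases. That property lets the improved tour act as a better training target than the one it replaced.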

Abstract

The end-to-end neural combinatorial optimization (NCO) method shows promising performance in solving complex combinatorial optimization problems without the need for expert design. However, existing methods struggle with large-scale problems, hindering their practical applicability. To overcome this limitation, this work proposes a novel Self-Improved Learning (SIL) method for better scalability of neural combinatorial optimization. Specifically, we develop an efficient self-improved mechanism that enables direct model training on large-scale problem instances without any labeled data. Powered by an innovative local reconstruction approach, this method can iteratively generate better solutions by itself to serve as pseudo-labels that guide efficient model training. In addition, we design a linear-complexity attention mechanism for the model to efficiently handle large-scale combinatorial problem instances with low computational overhead. Comprehensive experiments on the Travelling Salesman Problem (TSP) and the Capacitated Vehicle Routing Problem (CVRP) with up to 100K nodes in both uniform and real-world distributions demonstrate the superior scalability of our method.
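The aggregate-then-broadcast idea behind the linear-complexity attention can be sketched in plain Python. This sketch makes two simplifying assumptions:

- There are no learned query/key/value projections.
- The representative points are passed in as given vectors (`reps`). How SIL actually obtains them is not specified in this summary.

`attend` and `representative_attention` are hypothetical names, not the authors' code.

```python
import math
import random

Matrix = list[list[float]]


def softmax(scores: list[float]) -> list[float]:
    """Numerically stable softmax."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries: Matrix, keys: Matrix, values: Matrix) -> Matrix:
    """Scaled dot-product attention; scores len(queries) * len(keys) pairs."""
    scale = math.sqrt(len(keys[0]))
    out = []
    for q in queries:
        weights = softmax([sum(a * b for a, b in zip(q, k)) / scale for k in keys])
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out


def representative_attention(nodes: Matrix, reps: Matrix) -> Matrix:
    """Aggregate-then-broadcast attention through M representative points.

    1) Aggregate: each representative point attends to all N nodes (M*N scores).
    2) Broadcast: each node attends to the M aggregated summaries (N*M scores).
    Cost is O(M*N*d), linear in N for fixed M, versus O(N^2*d) for
    full self-attention among the nodes.
    """
    summaries = attend(reps, nodes, nodes)      # M x d
    return attend(nodes, summaries, summaries)  # N x d


# Usage: 50 nodes, 3 representative points, embedding size 4.
rng = random.Random(0)
d, n, m = 4, 50, 3
nodes = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n)]
reps = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(m)]
out = representative_attention(nodes, reps)
print(len(out), len(out[0]))  # prints: 50 4
```

Each stage scores M×N query-key pairs, so the total is 2MN rather than the N² of full self-attention. Take N = 100,000 nodes and, say, M = 100 representative points (a hypothetical value). Full attention then needs 10^10 scores, while this scheme needs 2 × 10^7, a 500× reduction.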

Paper Structure (51 sections, 22 equations, 8 figures, 8 tables)

Figures (8)

  • Figure 1: Development of constructive NCO methods.
  • Figure 2: Different learning methods for NCO. (a) Supervised Learning (SL): SL-based methods rely on high-quality solutions for model training. Their applicability to large-scale problems is therefore restricted by the difficulty of obtaining such solutions. (b) Reinforcement Learning (RL): RL-based methods must generate complete solutions to compute rewards during training. For large-scale problems, they are hindered by sparse rewards and high computational cost. (c) Self-Improved Learning (SIL, this work): SIL uses a novel iterative cycle with two steps. 1) A local reconstruction step produces enhanced solutions for model training. 2) A model training step further strengthens local reconstruction performance. In this way, SIL can tackle large-scale problems with up to 100K nodes.
  • Figure 3: The self-improved learning (SIL) process, a cycle of iterative self-improvement. In each iteration, the model performs local reconstruction to improve solution quality. The enhanced solutions then serve as pseudo-labels for model training, which improves the model's performance.
  • Figure 4: Linear attention model design. The proposed linear attention mechanism uses a fixed number of representative points to aggregate key information from the graph and broadcast it to all nodes. This removes the need for explicit pairwise computations between all nodes during attention, so the mechanism achieves linear complexity.
  • Figure 5: SIL training progress on CVRP with 100,000 nodes.
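The cycle in Figure 3 can be written compactly for a TSP instance with N nodes. This is a hedged formalization under assumed notation, not the paper's own equations:

- s is the instance.
- f is the tour cost.
- LR_θ is model-driven local reconstruction.
- π̂^(k) is the pseudo-label after k iterations.

It also assumes a per-step cross-entropy (imitation) loss on the pseudo-label, the usual choice for supervised constructive NCO. The paper's exact objective may differ.

```latex
% Assumed notation: s = instance, f = tour cost, LR_theta = local reconstruction
% driven by the model with parameters theta, \hat{\pi}^{(k)} = current pseudo-label.
\hat{\pi}^{(k+1)} =
\begin{cases}
  \mathrm{LR}_{\theta}\big(\hat{\pi}^{(k)}\big) & \text{if } f\big(\mathrm{LR}_{\theta}(\hat{\pi}^{(k)})\big) < f\big(\hat{\pi}^{(k)}\big),\\
  \hat{\pi}^{(k)} & \text{otherwise,}
\end{cases}
\qquad
\mathcal{L}(\theta) = -\frac{1}{N}\sum_{t=1}^{N} \log p_{\theta}\big(\hat{\pi}^{(k+1)}_{t} \,\big|\, s,\ \hat{\pi}^{(k+1)}_{<t}\big)
```

Unlike the RL setting in Figure 2(b), this loss needs no reward from a complete rollout. It only needs the improved solution as a target sequence.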