Self-Improved Learning for Scalable Neural Combinatorial Optimization
Fu Luo, Xi Lin, Zhenkun Wang, Xialiang Tong, Mingxuan Yuan, Qingfu Zhang
TL;DR
Self-Improved Learning (SIL) tackles the scalability bottleneck of end-to-end neural combinatorial optimization by training directly on large-scale instances (up to 100K nodes) without labeled data. SIL alternates between a local reconstruction step, in which the model improves its own solutions to produce pseudo-labels, and model updates trained on those pseudo-labels. A linear-complexity attention mechanism reduces computation and memory usage during decoding. The authors validate SIL on TSP and CVRP across uniform and real-world distributions, showing strong scalability and competitive performance, especially on large CVRP instances, where SIL can outperform some classical solvers. SIL does not consistently beat all classical baselines on every TSP task, but it shows practical potential for scalable, data-driven CO solvers and points to further improvements in local reconstruction and training efficiency.
Abstract
End-to-end neural combinatorial optimization (NCO) methods show promising performance in solving complex combinatorial optimization problems without the need for expert design. However, existing methods struggle with large-scale problems, hindering their practical applicability. To overcome this limitation, this work proposes a novel Self-Improved Learning (SIL) method for better scalability of neural combinatorial optimization. Specifically, we develop an efficient self-improved mechanism that enables direct model training on large-scale problem instances without any labeled data. Powered by an innovative local reconstruction approach, the method iteratively generates better solutions on its own and uses them as pseudo-labels to guide efficient model training. In addition, we design a linear-complexity attention mechanism that lets the model handle large-scale combinatorial problem instances with low computation overhead. Comprehensive experiments on the Travelling Salesman Problem (TSP) and the Capacitated Vehicle Routing Problem (CVRP) with up to 100K nodes in both uniform and real-world distributions demonstrate the superior scalability of our method.
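The local reconstruction idea described in the abstract can be illustrated with a minimal sketch for TSP. This is not the authors' implementation: the learned model is replaced here by a nearest-neighbour heuristic (`greedy_rebuild`), all function names and parameters are hypothetical, and the model update on the resulting pseudo-labels is omitted. The sketch only shows the accept-if-better loop: sample a segment of the current tour, rebuild it with its endpoints fixed, and keep the rebuilt segment when it is shorter.

```python
import math
import random


def path_length(coords, path):
    """Length of an open path visiting the nodes in order."""
    return sum(math.dist(coords[a], coords[b]) for a, b in zip(path, path[1:]))


def tour_length(coords, tour):
    """Length of a closed tour (returns to the first node)."""
    return path_length(coords, tour) + math.dist(coords[tour[-1]], coords[tour[0]])


def greedy_rebuild(coords, segment):
    """Stand-in for the neural model (an assumption, not the paper's method):
    keep both endpoints fixed and reorder the interior nodes greedily."""
    first, last = segment[0], segment[-1]
    remaining = set(segment[1:-1])
    rebuilt = [first]
    while remaining:
        cur = rebuilt[-1]
        nxt = min(remaining, key=lambda j: math.dist(coords[cur], coords[j]))
        remaining.remove(nxt)
        rebuilt.append(nxt)
    rebuilt.append(last)
    return rebuilt


def local_reconstruction(coords, tour, segment_len, rounds, rng):
    """Repeatedly rebuild random segments, accepting only improvements.

    Requires 2 <= segment_len <= len(tour). The returned tour is never
    longer than the input tour and could serve as a pseudo-label.
    """
    tour = list(tour)
    n = len(tour)
    for _ in range(rounds):
        start = rng.randrange(n)
        # Rotate the cyclic tour so the sampled segment sits at the front.
        tour = tour[start:] + tour[:start]
        segment = tour[:segment_len]
        candidate = greedy_rebuild(coords, segment)
        if path_length(coords, candidate) < path_length(coords, segment):
            tour[:segment_len] = candidate
    return tour
```

Because the segment endpoints stay fixed and a candidate is accepted only when it is strictly shorter, each round leaves the tour a valid permutation and never increases its length. In a self-improved training loop, the improved tours would then be used as targets for the next model update.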
