Neural Combinatorial Optimization with Heavy Decoder: Toward Large Scale Generalization

Fu Luo; Xi Lin; Fei Liu; Qingfu Zhang; Zhenkun Wang

TL;DR

This paper tackles the challenge of generalizing neural combinatorial optimization to large-scale problems by introducing LEHD, a Light Encoder and Heavy Decoder architecture that updates node relationships at every construction step. To enable practical training, it adopts a data-efficient supervised scheme that learns to reconstruct partial solutions, supplemented by a flexible Random Re-Construct (RRC) mechanism for online improvement. Experiments on TSP and CVRP show LEHD achieves strong generalization up to 1000 nodes and competitive results on TSPLib/CVRPLib, often surpassing other purely learning-based methods and approaching classic solvers with targeted inference budgets. The results suggest that dynamic, scale-aware decoding and partial-solution training can significantly boost the practicality of learning-based NCO for real-world large-scale problems.

Abstract

Neural combinatorial optimization (NCO) is a promising learning-based approach for solving challenging combinatorial optimization problems without specialized algorithm design by experts. However, most constructive NCO methods cannot solve problems with large-scale instance sizes, which significantly diminishes their usefulness for real-world applications. In this work, we propose a novel Light Encoder and Heavy Decoder (LEHD) model with a strong generalization ability to address this critical issue. The LEHD model can learn to dynamically capture the relationships between all available nodes of varying sizes, which is beneficial for model generalization to problems of various scales. Moreover, we develop a data-efficient training scheme and a flexible solution construction mechanism for the proposed LEHD model. By training on small-scale problem instances, the LEHD model can generate nearly optimal solutions for the Travelling Salesman Problem (TSP) and the Capacitated Vehicle Routing Problem (CVRP) with up to 1000 nodes, and also generalizes well to solve real-world TSPLib and CVRPLib problems. These results confirm our proposed LEHD model can significantly improve the state-of-the-art performance for constructive NCO. The code is available at https://github.com/CIAM-Group/NCO_code/tree/main/single_objective/LEHD.
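The Random Re-Construct (RRC) mechanism is only named here, so the following is a minimal sketch of one plausible reading for TSP: repeatedly pick a random segment of the current tour, rebuild it between its fixed endpoints, and keep the rebuilt segment only if it is shorter. The function names (`rrc_step`, `greedy_reconstruct`) are hypothetical, and the nearest-neighbour rule is an assumed stand-in for the trained LEHD decoder, not the authors' implementation.

```python
import math
import random


def path_length(path, coords):
    """Length of an open path that visits `path` in order."""
    return sum(math.dist(coords[a], coords[b]) for a, b in zip(path, path[1:]))


def tour_length(tour, coords):
    """Length of a closed tour (returns to the first node)."""
    return path_length(tour + tour[:1], coords)


def greedy_reconstruct(start, end, inner, coords):
    """Assumed stand-in for the learned decoder: nearest-neighbour path
    from `start` to `end` that visits every node in `inner` once."""
    path, remaining = [start], set(inner)
    while remaining:
        nxt = min(remaining, key=lambda j: math.dist(coords[path[-1]], coords[j]))
        remaining.remove(nxt)
        path.append(nxt)
    return path + [end]


def rrc_step(tour, coords, rng, reconstruct=greedy_reconstruct):
    """One re-construct step: sample a segment, rebuild it, keep it if shorter.
    Requires at least 4 nodes so the segment has interior nodes to reorder."""
    n = len(tour)
    length = rng.randint(4, n)           # segment size (assumed sampling range)
    start = rng.randrange(n)
    rotated = tour[start:] + tour[:start]
    segment, rest = rotated[:length], rotated[length:]
    candidate = reconstruct(segment[0], segment[-1], segment[1:-1], coords)
    # Endpoints are fixed, so a shorter segment can only shorten the tour.
    if path_length(candidate, coords) < path_length(segment, coords):
        segment = candidate
    return segment + rest
```

Because the segment endpoints stay fixed and a candidate is accepted only when it is shorter, the tour length never increases across steps; the number of steps acts as the inference budget.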

Paper Structure (46 sections, 4 equations, 4 figures, 13 tables)

Introduction
Related Work
Constructive NCO with Balanced Encoder-Decoder
Constructive NCO with Heavy Encoder and Light Decoder
Non-Constructive NCO Methods
Model Architecture: Light Encoder and Heavy Decoder
Encoder
Attention layer
Decoder
Learn to Construct Partial Solution
Generate Partial Solutions during the Training Phase
Learn to Construct Partial Solutions via Supervised Learning
Generate the Complete Solution during the Inference Phase
Random Re-Construct for Further Improvement
Experiment
...and 31 more sections

Figures (4)

Figure 1: The structure of our proposed LEHD model, which has a single-layer light encoder and a heavy decoder with $L$ attention layers.
Figure 2: Examples of generating partial solutions for TSP and CVRP. For the TSP instance, the optimal solution is [1,2,3,4,5,6,7,8,9], and a partial solution [6,5,4,3,2] is randomly sampled. For the CVRP instance, the optimal solution is [0,1,2,3,0,4,5,6,0,7,8,9,0], and a partial solution [2,3,0,6,5,4,0] is randomly sampled. For CVRP, we impose the restriction that the partial solution must end at the depot.
Figure 3: Instance pr2392 with 2392 nodes.
Figure 4: Instance X-n1001-k43 with 1000 nodes.
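
The TSP example in Figure 2 can be sketched in code: a partial solution is a contiguous run of nodes taken from the cyclic optimal tour, read in either direction. The helper names `subpath` and `sample_partial_tsp` and the sampled length range are assumptions for illustration; the CVRP case, with its end-at-depot restriction, is not covered here.

```python
import random


def subpath(tour, start, length, step):
    """Take `length` consecutive nodes from the cyclic `tour`,
    beginning at index `start` and moving by `step` (+1 or -1)."""
    n = len(tour)
    return [tour[(start + step * i) % n] for i in range(length)]


def sample_partial_tsp(tour, rng=random):
    """Randomly sample a partial solution from a complete TSP tour."""
    n = len(tour)
    length = rng.randint(2, n)      # assumed range for the partial-solution size
    start = rng.randrange(n)        # random starting position on the tour
    step = rng.choice((1, -1))      # random traversal direction
    return subpath(tour, start, length, step)


# The Figure 2 example: start at node 6 and walk backwards for 5 nodes.
# subpath(list(range(1, 10)), start=5, length=5, step=-1) returns [6, 5, 4, 3, 2]
```

Sampling both the direction and the length gives many distinct training targets from a single labeled tour, which is one way to read the "data-efficient" claim in the abstract.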