INViT: A Generalizable Routing Problem Solver with Invariant Nested View Transformer

Han Fang, Zhihao Song, Paul Weng, Yutong Ban

TL;DR

This work targets generalization gaps in neural routing solvers by diagnosing two causes: embedding aliasing and interference from irrelevant nodes. It introduces INViT, an invariant nested view Transformer that processes multiple localized views around the last visited node and uses a multi-view decoder to predict the next step. Training uses a REINFORCE-based regime augmented with rotations, reflections, and normalization. Empirical results on MSVDRP and public datasets show that INViT achieves superior cross-size and cross-distribution generalization for TSP and CVRP, with ablations validating the roles of graph sparsification, invariance, and nested views. The approach offers fast, scalable inference, which makes it relevant to large-scale routing tasks in logistics and related fields.

Abstract

Recently, deep reinforcement learning has shown promising results for learning fast heuristics to solve routing problems. However, most solvers struggle to generalize to unseen distributions or to distributions with different scales. To address this issue, we propose a novel architecture, called Invariant Nested View Transformer (INViT), which enforces a nested design together with invariant views inside the encoders to promote the generalizability of the learned solver. It applies a modified policy gradient algorithm enhanced with data augmentations. We demonstrate that the proposed INViT achieves dominant generalization performance on both TSP and CVRP across various distributions and problem scales.
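
The augmentations named above (rotations, reflections, normalization) rely on a simple geometric fact: rigid motions leave tour lengths unchanged, and uniform rescaling changes every tour length by the same factor, so the ranking of tours is preserved. The following is a minimal sketch of that fact in plain Python, not the authors' implementation; the helper names `tour_length`, `rotate`, `reflect`, and `normalize` are hypothetical and chosen only for illustration.

```python
import math
import random

def tour_length(coords, tour):
    """Euclidean length of the closed tour that visits `coords` in `tour` order."""
    n = len(tour)
    return sum(math.dist(coords[tour[i]], coords[tour[(i + 1) % n]]) for i in range(n))

def rotate(coords, angle):
    """Rotate every point about the origin by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y) for x, y in coords]

def reflect(coords):
    """Mirror every point across the x-axis."""
    return [(x, -y) for x, y in coords]

def normalize(coords):
    """Shift and uniformly rescale points into the unit square.

    Returns the new points and the scale factor that was divided out.
    (Assumed form of normalization; the paper may define it differently.)
    """
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    span = max(max(xs) - min(xs), max(ys) - min(ys)) or 1.0
    return [((x - min(xs)) / span, (y - min(ys)) / span) for x, y in coords], span

rng = random.Random(0)
coords = [(rng.random(), rng.random()) for _ in range(20)]
tour = list(range(20))  # any fixed visiting order works for this check
base = tour_length(coords, tour)

augmented = reflect(rotate(coords, math.pi / 3))
normalized, span = normalize(augmented)

# Rigid motions keep the tour length; normalization divides it by `span`.
print(math.isclose(tour_length(augmented, tour), base))
print(math.isclose(tour_length(normalized, tour) * span, base))
```

Because the optimal tour is the same before and after these transformations, an augmented instance can share its training signal with the original one.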

Paper Structure (49 sections, 9 equations, 7 figures, 4 tables)


Figures (7)

  • Figure 1: Our method INViT aggregates node information from multiple nested local views (marked as colored discs). Trained on small instances following uniform distributions, INViT can generalize to instances with larger sizes and/or different distributions.
  • Figure 2: Preliminary findings. (a) The histogram of attention scores for farther nodes in attention-based encoders (trained on TSP/CVRP 100). (b) The histogram of the optimal solution among the $k$-th nearest neighbors for different $k$. (c) The percentage overlap of optimal solutions between the original and augmented instances.
  • Figure 3: The overall architecture of INViT. The input state is extracted into multiple nested views, each consisting of neighborhood nodes around the last visited node. Nodes located in the smallest view are potential candidates, and the other nodes in a view are called normal nodes. Each nested view is processed by a single-view encoder to obtain the embeddings for each node. Embeddings are then concatenated channel-wise across different views. The decoder takes the embeddings of the last visited node and the first visited node (or depot) as the query, and the embeddings of the potential candidates as the key and the value. Lastly, the model samples a node to visit according to the output probabilities and updates the partial tour autoregressively until a complete tour is constructed.
  • Figure 4: Impact of nodes outside the neighbor groups on the encoder.
  • Figure 5: Statistics on the action choice of the optimal solution. It represents the distribution of the rank of the next node to visit from a node in a solution tour among the nearest neighbors of the latter. Best viewed in color.
  • ...and 2 more figures
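
The nested-view extraction described in the Figure 3 caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: it assumes each view is the set of k nearest unvisited neighbors of the last visited node, with one view per value of k, and that each view is rescaled into a local unit square before encoding. The function names `nested_views` and `localize` and the view sizes used here are hypothetical.

```python
import math

def nested_views(coords, visited, last, view_sizes):
    """Return one list of node indices per view, smallest view first.

    Each view holds the k nearest unvisited neighbors of node `last`,
    so every smaller view is contained in every larger one.
    """
    candidates = [i for i in range(len(coords)) if i not in visited]
    candidates.sort(key=lambda i: math.dist(coords[i], coords[last]))
    return [candidates[:k] for k in sorted(view_sizes)]

def localize(coords, view, last):
    """Map the view (plus the last visited node) into a local unit square.

    Assumed normalization: shift by the minimum corner and divide by the
    larger side of the bounding box, which keeps the aspect ratio.
    """
    nodes = [last] + list(view)
    xs = [coords[i][0] for i in nodes]
    ys = [coords[i][1] for i in nodes]
    span = max(max(xs) - min(xs), max(ys) - min(ys)) or 1.0
    return {i: ((coords[i][0] - min(xs)) / span, (coords[i][1] - min(ys)) / span)
            for i in nodes}

# Toy instance: six nodes on a line; nodes 0 and 1 are already visited.
coords = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0), (5, 0)]
views = nested_views(coords, visited={0, 1}, last=1, view_sizes=(2, 4))

candidates = views[0]  # nodes in the smallest view are the potential candidates
local = [localize(coords, v, last=1) for v in views]  # one local frame per view
```

In the full model each localized view would be passed to its own single-view encoder, and the decoder would score only the nodes in `candidates`.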