GAMA: A Neural Neighborhood Search Method with Graph-aware Multi-modal Attention for Vehicle Routing Problem

Xiangling Chen, Yi Mei, Mengjie Zhang

TL;DR

The paper tackles the Capacitated Vehicle Routing Problem (CVRP) within a Learning-to-Improve (L2I) framework by introducing GAMA, a graph-aware multi-modal attention encoder that jointly represents the problem instance and the evolving solution. The method encodes each modality with its own stream of a Dual-GCN, then applies stacked self- and cross-attention and a gated fusion mechanism to build the state used for PPO-based operator selection. Empirical results on synthetic and benchmark CVRP instances show that GAMA outperforms strong neural baselines and generalizes to large-scale, out-of-distribution instances without retraining. By giving the policy a richer structural view of the instance and the current solution, GAMA yields higher-quality solutions and more robust performance in complex routing scenarios.

Abstract

Recent advances in neural neighborhood search methods have shown potential in tackling Vehicle Routing Problems (VRPs). However, most existing approaches rely on simplistic state representations and fuse heterogeneous information via naive concatenation, limiting their ability to capture rich structural and semantic context. To address these limitations, we propose GAMA, a neural neighborhood search method for VRPs built on a Graph-aware Multi-modal Attention model. GAMA encodes the problem instance and its evolving solution as distinct modalities using graph neural networks, and models their intra- and inter-modal interactions through stacked self- and cross-attention layers. A gated fusion mechanism further integrates the multi-modal representations into a structured state, enabling the policy to make informed and generalizable operator selection decisions. Extensive experiments across various synthetic and benchmark instances demonstrate that GAMA significantly outperforms recent neural baselines. Ablation studies further confirm that both the multi-modal attention mechanism and the gated fusion design play a key role in the observed performance gains.
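The contrast the abstract draws between naive concatenation and gated fusion can be made concrete with a small sketch. The code below is not the paper's formulation, which this summary does not reproduce. It shows one common form of gated fusion, in which a sigmoid gate computed from both embeddings decides, per dimension, how much of each modality to keep. The names `gated_fusion`, `h_problem`, `h_solution`, `w_gate`, and `b_gate` are hypothetical, and the weights are random rather than learned.

```python
import math
import random


def sigmoid(x):
    """Logistic function mapping a real number to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(h_problem, h_solution, w_gate, b_gate):
    """Blend two equal-length embeddings with an element-wise sigmoid gate.

    Assumed (illustrative) form: g = sigmoid(W [h_p ; h_s] + b),
    fused = g * h_p + (1 - g) * h_s.
    """
    concat = h_problem + h_solution  # list concatenation, length 2d
    fused = []
    for i in range(len(h_problem)):
        # Gate pre-activation for dimension i depends on both modalities.
        z = sum(w * x for w, x in zip(w_gate[i], concat)) + b_gate[i]
        g = sigmoid(z)
        # Convex combination: g near 1 keeps the problem-instance feature,
        # g near 0 keeps the solution feature.
        fused.append(g * h_problem[i] + (1.0 - g) * h_solution[i])
    return fused


random.seed(0)
d = 4  # hypothetical embedding size
h_problem = [random.uniform(-1, 1) for _ in range(d)]
h_solution = [random.uniform(-1, 1) for _ in range(d)]
w_gate = [[random.uniform(-0.5, 0.5) for _ in range(2 * d)] for _ in range(d)]
b_gate = [0.0] * d

fused = gated_fusion(h_problem, h_solution, w_gate, b_gate)
```

Unlike concatenation, the fused vector keeps the original dimensionality, and every component lies between the corresponding components of the two inputs, so the gate acts as a learned, per-feature interpolation between modalities.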

Paper Structure

This paper contains 32 sections, 11 equations, 5 figures, 8 tables, and 2 algorithms.

Figures (5)

  • Figure 1: Illustration of an iteration step within the proposed GAMA method.
  • Figure 2: Solution quality distribution of GENIS, GAMA_NG, and GAMA under different inference budgets ($T=5$k, $10$k, $20$k) on CVRP50.
  • Figure 3: Illustrative examples of two operators with different local optimal neighbors.
  • Figure 4: Convergence curves of GAMA and different L2I methods.
  • Figure 5: Performance comparison between different methods on VRP instances. Left: Computation time. Right: Performance gap to baseline.