VAGPO: Vision-augmented Asymmetric Group Preference Optimization for Graph Routing Problems
Shiyan Liu, Bohan Tan, Zhiguang Cao, Yan Jin
TL;DR
This work targets efficient graph routing for the TSP and CVRP by introducing VAGPO, a vision-augmented Transformer framework that fuses image-based spatial representations with sequential decision modeling. A CNN-based vision encoder (ResNet-18) processes image-like problem instances, while a cross-modal Transformer encoder integrates the visual features with sequential cues and feeds an autoregressive decoder that constructs tours or routes. Training uses Asymmetric Group Preference Optimization (AGPO), which leverages grouped trajectory preferences and asymmetric weighting to improve stability and sample efficiency relative to policy-gradient baselines. On generated and real-world datasets, VAGPO achieves competitive or superior solution quality with significantly fewer training epochs and generalizes robustly to instances of up to $N=1000$ nodes, indicating strong potential for scalable, data-driven graph routing in web-related networks.
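To make the idea of an "image-like problem instance" concrete, the following is a minimal sketch of one possible way to rasterize node coordinates into a grid that a CNN encoder could consume. It is an illustration only: the function name `rasterize_instance`, the 64x64 resolution, the single occupancy channel, and the assumption that coordinates lie in the unit square are all hypothetical and are not taken from the paper.

```python
import numpy as np

def rasterize_instance(coords, size=64):
    """Render 2-D node coordinates in [0, 1]^2 as a single-channel occupancy image.

    Hypothetical helper for illustration; the paper's actual rendering may differ.
    """
    coords = np.asarray(coords, dtype=np.float64)
    # Map each coordinate to a pixel index, clamping the value 1.0 to the last pixel.
    idx = np.clip((coords * size).astype(int), 0, size - 1)
    image = np.zeros((size, size), dtype=np.float32)
    # Row index comes from y, column index from x; np.add.at accumulates
    # correctly when several nodes fall into the same pixel.
    np.add.at(image, (idx[:, 1], idx[:, 0]), 1.0)
    return image

# Example: three nodes rendered into a 64x64 grid.
image = rasterize_instance([[0.0, 0.0], [1.0, 1.0], [0.5, 0.5]])
```

A CVRP variant would presumably need extra channels (for example, demand and depot location), which this sketch does not attempt.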
Abstract
Graph routing problems play a vital role in web-related networks, where finding optimal paths across graphs is essential for efficient data transmission and content delivery. Classic routing formulations such as the Traveling Salesman Problem (TSP) and the Capacitated Vehicle Routing Problem (CVRP) represent fundamental graph optimization challenges. Recent data-driven optimization methods have made significant progress, yet they often face limitations in training efficiency and generalization to large-scale instances. In this paper, we propose a novel Vision-augmented Asymmetric Group Preference Optimization (VAGPO) approach. By leveraging ResNet-based visual encoding and Transformer-based sequential modeling, VAGPO captures both spatial structure and temporal dependencies. Furthermore, we introduce an asymmetric group preference optimization strategy that significantly accelerates convergence compared to commonly used policy gradient methods. Experimental results on generated TSP and CVRP instances, as well as real-world datasets, demonstrate that the proposed VAGPO approach achieves highly competitive solution quality. Additionally, VAGPO exhibits strong generalization to larger instances (up to 1000 nodes) without re-training, highlighting its effectiveness in both learning efficiency and scalability.
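The abstract does not state the AGPO objective, so the snippet below is only a hedged sketch of the general idea of group-relative, asymmetrically weighted learning signals. It assumes a group of trajectories sampled for the same instance, standardizes their costs within the group, and scales better-than-average and worse-than-average trajectories by different weights. The names `asymmetric_group_advantages` and `surrogate_loss`, the weights `w_better` and `w_worse`, and the standardization step are all assumptions, not the authors' formulation.

```python
import numpy as np

def asymmetric_group_advantages(costs, w_better=1.0, w_worse=0.5, eps=1e-8):
    """Group-relative advantages with asymmetric weighting (illustrative assumption)."""
    costs = np.asarray(costs, dtype=np.float64)
    # Lower cost is better, so the advantage is the negated standardized cost.
    adv = -(costs - costs.mean()) / (costs.std() + eps)
    # Trajectories better than the group mean get w_better, the rest get w_worse.
    weights = np.where(adv > 0, w_better, w_worse)
    return weights * adv

def surrogate_loss(log_probs, costs, **kwargs):
    """Policy-gradient-style surrogate: raise log-probability of preferred trajectories."""
    adv = asymmetric_group_advantages(costs, **kwargs)
    return -np.mean(adv * np.asarray(log_probs, dtype=np.float64))

# Example: three sampled tours of lengths 10, 12, and 14 for one instance.
adv = asymmetric_group_advantages([10.0, 12.0, 14.0])
```

With these example weights, the shortest tour receives a larger-magnitude signal than the longest one, which is one simple way to realize "asymmetric weighting"; the paper may define the asymmetry differently.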
