ViTSP: A Vision Language Models Guided Framework for Solving Large-Scale Traveling Salesman Problems

Zhuoli Yin; Yi Ding; Reem Khir; Hua Cai

TL;DR

ViTSP is a framework that uses pre-trained vision language models (VLMs) to visually guide the solution process for large-scale TSPs. It outperforms existing learning-based methods and offers a new perspective on hybridizing pre-trained generative models with operations research solvers for combinatorial optimization problems.

Abstract

The Traveling Salesman Problem (TSP) is NP-hard yet fundamental to a wide range of real-world applications. Classical exact methods scale poorly, and heuristic methods often require domain-specific parameter calibration. While learning-based approaches have shown promise, they suffer from poor generalization and limited scalability due to fixed training data. This work proposes ViTSP, a novel framework that leverages pre-trained vision language models (VLMs) to visually guide the solution process for large-scale TSPs. The VLMs identify promising small-scale subproblems from a visualized TSP instance, which are then efficiently optimized using an off-the-shelf solver to improve the global solution. ViTSP requires no dedicated model training at the user end while remaining effective across diverse instances. Experiments on real-world TSP instances ranging from 1k to 88k nodes show that ViTSP consistently achieves solutions with average optimality gaps of 0.24%, outperforming existing learning-based methods. Under the same runtime budget, it surpasses the best-performing heuristic solver, LKH-3, reducing LKH-3's gaps by 3.57% to 100%, particularly on very-large-scale instances with more than 10k nodes. Our framework offers a new perspective on hybridizing pre-trained generative models and operations research solvers for combinatorial optimization problems, and it holds potential for integration into more complex real-world logistics systems. The code is available at https://github.itap.purdue.edu/uSMART/ViTSP_ICLR2026.
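
The select-then-solve loop described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation. It rests on several simplifying assumptions: a subproblem is a contiguous window of the current tour rather than a region of an image, the hypothetical `select_segment` stands in for the VLM by picking the longest window, and the hypothetical `solve_segment` stands in for the off-the-shelf solver by exhaustively reordering the window's interior nodes. All function names and parameters are illustrative.

```python
import itertools
import math


def tour_length(coords, tour):
    """Length of the closed tour that visits `tour` in order."""
    n = len(tour)
    return sum(math.dist(coords[tour[i]], coords[tour[(i + 1) % n]]) for i in range(n))


def path_length(coords, path):
    """Length of an open path (no edge back to the start)."""
    return sum(math.dist(coords[a], coords[b]) for a, b in zip(path, path[1:]))


def select_segment(coords, tour, size):
    """Hypothetical stand-in for the VLM: start index of the longest window."""
    starts = range(len(tour) - size + 1)
    return max(starts, key=lambda s: path_length(coords, tour[s:s + size]))


def solve_segment(coords, segment):
    """Stand-in for the solver: try every order of the interior nodes.

    Both endpoints stay fixed so the result splices back into the tour.
    Exhaustive search is only feasible for very small windows.
    """
    first, last = segment[0], segment[-1]
    best = min(
        itertools.permutations(segment[1:-1]),
        key=lambda mid: path_length(coords, [first, *mid, last]),
    )
    return [first, *best, last]


def improve(coords, tour, size=6, rounds=10):
    """Repeat select-then-solve; stop once the chosen window is unchanged."""
    tour = list(tour)
    size = min(size, len(tour))  # a window cannot exceed the tour
    for _ in range(rounds):
        s = select_segment(coords, tour, size)
        new_segment = solve_segment(coords, tour[s:s + size])
        if new_segment == tour[s:s + size]:
            break
        tour[s:s + size] = new_segment
    return tour
```

Because the current ordering is always among the candidates tried, each round leaves the tour length unchanged or shorter, which mirrors the idea of improving a global solution through small subproblems. In the framework itself the selection step is performed by a VLM on a rendered instance, so the selection rule here should be read only as a placeholder.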
