ViTSP: A Vision Language Models Guided Framework for Solving Large-Scale Traveling Salesman Problems

Zhuoli Yin, Yi Ding, Reem Khir, Hua Cai

TL;DR

ViTSP is a novel framework that leverages pre-trained vision language models (VLMs) to visually guide the solution process for large-scale TSPs. It outperforms existing learning-based methods and offers a new perspective on hybridizing pre-trained generative models and operations research solvers for combinatorial optimization problems.

Abstract

The Traveling Salesman Problem (TSP) is NP-hard yet fundamental to a wide range of real-world applications. Classical exact methods struggle to scale, and heuristic methods often require domain-specific parameter calibration. While learning-based approaches have shown promise, they suffer from poor generalization and limited scalability due to fixed training data. This work proposes ViTSP, a novel framework that leverages pre-trained vision language models (VLMs) to visually guide the solution process for large-scale TSPs. The VLMs identify promising small-scale subproblems from a visualized TSP instance, which are then efficiently optimized using an off-the-shelf solver to improve the global solution. ViTSP requires no dedicated model training at the user end while remaining effective across diverse instances. Experiments on real-world TSP instances ranging from 1k to 88k nodes demonstrate that ViTSP consistently achieves an average optimality gap of 0.24%, outperforming existing learning-based methods. Under the same runtime budget, it surpasses the best-performing heuristic solver, LKH-3, reducing its gaps by 3.57% to 100%, particularly on very-large-scale instances with more than 10k nodes. Our framework offers a new perspective on hybridizing pre-trained generative models and operations research solvers for combinatorial optimization problems, and it holds potential for integration into more complex real-world logistics systems. The code is available at https://github.itap.purdue.edu/uSMART/ViTSP_ICLR2026.
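
The select-then-reoptimize loop described in the abstract can be illustrated with a toy sketch. It is not the authors' implementation: the rectangular `box` is a hand-picked, hypothetical stand-in for whatever region a VLM would select, brute-force enumeration stands in for the off-the-shelf solver, and every function name below is an assumption made for illustration. The sketch also shows one common way to compute an optimality gap, as the relative excess over a reference tour length.

```python
import itertools
import math


def tour_length(coords, tour):
    """Length of the closed tour that visits `tour` in order and returns to the start."""
    n = len(tour)
    return sum(math.dist(coords[tour[i]], coords[tour[(i + 1) % n]]) for i in range(n))


def optimality_gap(length, reference):
    """Relative gap in percent to a reference (optimal or best-known) length."""
    return 100.0 * (length - reference) / reference


def in_box(point, box):
    x_min, x_max, y_min, y_max = box
    return x_min <= point[0] <= x_max and y_min <= point[1] <= y_max


def longest_run_in_box(coords, tour, box):
    """(start, length) of the longest run of consecutive tour positions inside `box`.

    Runs that wrap around the end of the list are not merged (kept simple on purpose).
    """
    best_start, best_len, start = 0, 0, None
    for pos, node in enumerate(tour + [None]):  # None is a sentinel that closes the last run
        inside = node is not None and in_box(coords[node], box)
        if inside and start is None:
            start = pos
        elif not inside and start is not None:
            if pos - start > best_len:
                best_start, best_len = start, pos - start
            start = None
    return best_start, best_len


def reoptimize_run(coords, tour, start, length):
    """Reorder the interior of a run by brute force, keeping both endpoints fixed.

    Brute force is only a placeholder for a real solver; it is feasible for tiny runs only.
    """
    if length < 4:  # fewer than two interior nodes: nothing to reorder
        return list(tour)
    first, last = tour[start], tour[start + length - 1]
    interior = tour[start + 1:start + length - 1]

    def path_len(order):
        path = [first, *order, last]
        return sum(math.dist(coords[a], coords[b]) for a, b in zip(path, path[1:]))

    best = min(itertools.permutations(interior), key=path_len)
    return tour[:start + 1] + list(best) + tour[start + length - 1:]


# Toy instance: four nodes on the bottom edge, two on the top edge.
coords = [(0, 0), (1, 0), (2, 0), (3, 0), (3, 1), (0, 1)]
tour = [0, 2, 1, 3, 4, 5]            # initial tour with a detour on the bottom edge
box = (-0.5, 3.5, -0.5, 0.5)         # hypothetical selected region (x_min, x_max, y_min, y_max)

start, length = longest_run_in_box(coords, tour, box)
improved = reoptimize_run(coords, tour, start, length)
reference = tour_length(coords, [0, 1, 2, 3, 4, 5])  # optimal for this toy instance
print(optimality_gap(tour_length(coords, tour), reference))
print(optimality_gap(tour_length(coords, improved), reference))
```

Because the endpoints of the run stay fixed, the reordered segment can be spliced back without touching the rest of the tour, and the original order is always among the candidates, so the global tour never gets longer.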

Paper Structure

This paper contains 37 sections, 2 equations, 7 figures, 7 tables, 2 algorithms.

Figures (7)

  • Figure 1: The vision-guided framework (ViTSP) for large-scale TSP, where pre-trained VLMs and off-the-shelf solvers are asynchronously coordinated to identify and optimize subproblems, respectively.
  • Figure 2: Optimality gaps over time on selected instances between ViTSP and LKH-3 (more RUNS).
  • Figure 3: Ablation studies of different selection policies on selected instances.
  • Figure 4: An example of the visual prompt to VLMs. In this example, nrw1379 is used. The tour is initialized by LKH-3.
  • Figure 5: Optimality gap reduction over time between ViTSP and LKH-3 (more RUNS).
  • ...and 2 more figures