TOP-Former: A Multi-Agent Transformer Approach for the Team Orienteering Problem

Daniel Fuertes; Carlos R. del-Blanco; Fernando Jaureguizar; Narciso García

TL;DR

This paper tackles the Team Orienteering Problem (TOP) by introducing TOP-Former, a centralized Transformer-based neural network that encodes the entire graph and the full fleet state to generate cooperative routes for multiple agents under a time limit $T$. Trained with deep reinforcement learning, TOP-Former uses an encoder-decoder architecture that computes a global graph embedding and autoregressively predicts multi-agent paths, enforcing feasibility through masking and a node-blocking strategy. Empirical results on synthetic TOP instances and the VDRPMDPC dataset show that TOP-Former delivers high-quality solutions with substantially faster inference than state-of-the-art linear programming, heuristic, and neural approaches, which supports real-time decision-making in Intelligent Transportation Systems (ITS) and package delivery. The work demonstrates the value of global context and centralized attention for multi-agent Vehicle Routing Problems (VRPs), acknowledges scalability challenges, and outlines future directions toward decentralized or memory-efficient architectures for very large-scale problems.

Abstract

Route planning for a fleet of vehicles is an important task in applications such as package delivery, surveillance, or transportation, often integrated within larger Intelligent Transportation Systems (ITS). This problem is commonly formulated as a Vehicle Routing Problem (VRP) known as the Team Orienteering Problem (TOP). Existing solvers for this problem primarily rely on either linear programming, which provides accurate solutions but requires computation times that grow with the size of the problem, or heuristic methods, which typically find suboptimal solutions in a shorter time. In this paper, we introduce TOP-Former, a multi-agent route planning neural network designed to efficiently and accurately solve the Team Orienteering Problem. The proposed algorithm is based on a centralized Transformer neural network capable of learning to encode the scenario (modeled as a graph) and analyze the complete context of all agents to deliver fast, precise, and collaborative solutions. Unlike other neural network-based approaches that adopt a more local perspective, TOP-Former is trained to understand the global situation of the vehicle fleet and generate solutions that maximize long-term expected returns. Extensive experiments demonstrate that the presented system outperforms most state-of-the-art methods in terms of both accuracy and computation speed.
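The feasibility masking mentioned in the TL;DR can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `feasible_mask`, the Euclidean travel-time model, the constant `speed`, and the shared `visited` set are all assumptions made for illustration. The details of the node-blocking strategy are not given in this summary, so the sketch simply treats any node already claimed by some agent as unavailable.

```python
import math


def feasible_mask(coords, depot, position, elapsed, time_limit, visited, speed=1.0):
    """Return one boolean per node: True if the agent may still select it.

    Hypothetical helper (not from the paper). A node is allowed only if
    (1) no agent has already claimed it, and
    (2) the agent can travel to it and then back to the depot
        without exceeding the time limit T.
    Travel time is assumed to be Euclidean distance divided by speed.
    """
    mask = []
    for i, node in enumerate(coords):
        if i in visited:
            # Assumption: nodes are shared across the fleet, so a node
            # taken by any agent is masked out for everyone.
            mask.append(False)
            continue
        t_go = math.dist(position, node) / speed
        t_back = math.dist(node, depot) / speed
        mask.append(elapsed + t_go + t_back <= time_limit)
    return mask


# Toy instance: three regions, an agent at the depot, time limit T = 4.
coords = [(1.0, 0.0), (3.0, 0.0), (0.0, 1.0)]
example = feasible_mask(
    coords, depot=(0.0, 0.0), position=(0.0, 0.0),
    elapsed=0.0, time_limit=4.0, visited={2},
)
print(example)  # [True, False, False]
```

In a decoder of this kind, such a mask would typically be applied to the node scores before the softmax, so that infeasible nodes receive zero probability. Region 1 is rejected above because the round trip (3 + 3 time units) exceeds $T = 4$, which matches the observation that a TOP solution need not visit every region.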

Paper Structure (13 sections, 13 equations, 8 figures, 5 tables)

Introduction
Related works
Problem formulation
System description
Training Strategy
Encoder
Decoder
Node Selection
Results
Experimental setup
Results and discussions
Open issues
Conclusion

Figures (8)

Figure 1: Example of a TOP instance, containing: (a) a TOP scenario with a depot (red circle) and a set of regions (black circles); and (b) a solution for three agents (colored arrows). Because the TOP imposes a time limit for returning to the end depot, the agents are not required to visit all the regions.
Figure 2: Training scheme of the proposed TOP-Former, composed of five main blocks: TOP Instance Generator, TOP Simulator, TOP-Former, Node Selection, and Loss Function.
Figure 3: Structure of TOP-Former, consisting of an Encoder and a Decoder that generate a policy $\pi_{\theta}$ from a given scenario $\alpha$. The final Node Selection module samples the routes $\rho^1, \dots, \rho^m$.
Figure 4: Structure of the Input Embedding module, which generates an initial graph representation $h^{input}$.
Figure 5: Multi-Head Attention block, where embeddings $V$, $K$, and $Q$ undergo an attention mechanism, followed by concatenation and a fully connected layer.
...and 3 more figures
