Multi-Agent Pointer Transformer: Seq-to-Seq Reinforcement Learning for Multi-Vehicle Dynamic Pickup-Delivery Problems

Zengyu Zou; Jingyuan Wang; Yixuan Huang; Junjie Wu

TL;DR

This work tackles the cooperative Multi-Vehicle Dynamic Pickup and Delivery Problem with Stochastic Requests (MVDPDPSR) by introducing the Multi-Agent Pointer Transformer (MAPT), a centralized decision framework that uses a Transformer-based encoder to represent all entities and a pointer-based autoregressive decoder to generate joint actions. A Relation-Aware Attention module captures inter-entity relationships, informative priors bias exploration toward high-reward actions, and PPO trains the model. Key contributions include a formal MDP formulation, the Relation-Aware Attention mechanism, autoregressive joint-action decoding, and informative priors. Experiments on eight datasets show improved solution quality over existing baselines and faster runtimes than classical operations research methods. The framework targets real-time, scalable routing for on-demand logistics by handling stochastic requests and multi-vehicle coordination.

Abstract

This paper addresses the cooperative Multi-Vehicle Dynamic Pickup and Delivery Problem with Stochastic Requests (MVDPDPSR) and proposes an end-to-end, centralized decision-making framework based on sequence-to-sequence modeling, named Multi-Agent Pointer Transformer (MAPT). MVDPDPSR is an extension of the vehicle routing problem and a spatio-temporal system optimization problem, widely applied in scenarios such as on-demand delivery. Classical operations research methods face bottlenecks in computational complexity and time efficiency when handling large-scale dynamic problems. Although existing reinforcement learning methods have made some progress, they still face several challenges: 1) independent decoding across multiple vehicles fails to model joint action distributions; 2) the feature extraction network struggles to capture inter-entity relationships; 3) the joint action space is exponentially large. To address these issues, we design the MAPT framework, which employs a Transformer encoder to extract entity representations, combines a Transformer decoder with a Pointer Network to generate joint action sequences autoregressively, and introduces a Relation-Aware Attention module to capture inter-entity relationships. Additionally, we guide the model's decision-making with informative priors to facilitate effective exploration. Experiments on 8 datasets demonstrate that MAPT significantly outperforms existing baseline methods and offers substantial computational time advantages over classical operations research methods.
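
To make the idea of autoregressive joint-action decoding concrete, the following is a minimal sketch and not the authors' implementation. It replaces the Transformer encoder and decoder with random embeddings and a plain dot-product pointer score, and it only illustrates how a joint action can be built one vehicle at a time so that its probability factorizes by the chain rule. The function name `decode_joint_action`, the additive context update, the additive `log_prior` bias, and the rule that two vehicles may not pick the same node in one step are all assumptions made for illustration.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax; entries equal to -inf get probability 0."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def decode_joint_action(vehicle_emb, node_emb, log_prior, rng, greedy=False):
    """Pick one node per vehicle, conditioning each pick on the earlier ones.

    Returns (actions, joint_log_prob). The joint log-probability is the sum
    of the per-vehicle log-probabilities (chain rule over vehicles).
    """
    if len(vehicle_emb) > len(node_emb):
        raise ValueError("this sketch needs at least one free node per vehicle")
    d = len(node_emb[0])
    taken = set()            # nodes already assigned in this step (assumed rule)
    context = [0.0] * d      # running summary of earlier choices
    actions, joint_log_prob = [], 0.0
    for v in vehicle_emb:
        query = [a + b for a, b in zip(v, context)]
        scores = []
        for j, key in enumerate(node_emb):
            if j in taken:
                scores.append(float("-inf"))  # mask nodes that are taken
            else:
                # Pointer score plus a prior term that biases the choice.
                scores.append(dot(query, key) / math.sqrt(d) + log_prior[j])
        probs = softmax(scores)
        if greedy:
            choice = max(range(len(probs)), key=probs.__getitem__)
        else:
            choice = rng.choices(range(len(probs)), weights=probs)[0]
        actions.append(choice)
        joint_log_prob += math.log(probs[choice])
        taken.add(choice)
        context = [c + k for c, k in zip(context, node_emb[choice])]
    return actions, joint_log_prob


rng = random.Random(0)
d, n_vehicles, n_nodes = 4, 3, 6
vehicles = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n_vehicles)]
nodes = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n_nodes)]
log_prior = [0.0] * n_nodes  # uniform prior; a real prior would favor some nodes

actions, logp = decode_joint_action(vehicles, nodes, log_prior, rng)
print(actions, logp)

# If every vehicle could pick any node, the joint action space would have
# n_nodes ** n_vehicles entries, which grows exponentially with fleet size.
print(n_nodes ** n_vehicles)  # 216
```

Decoding sequentially means each step scores at most `n_nodes` candidates instead of enumerating every joint action, while the summed log-probability can still be passed to a policy-gradient method such as PPO.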
