Generalized Policy Gradient with History-Aware Decision Transformer for Path Planning

Xing Wei, Duoxiang Zhao, Zezhou Zhang, Yuqi Ouyang, Hao Qin

TL;DR

This work addresses reliable path planning under stochastic traffic by introducing Generalized Policy Gradient with History-Aware Decision Transformer (GPG-HT). The approach combines a history-aware transformer policy with a generalized policy gradient objective to exploit non-Markovian temporal dependencies in routing decisions, trained via Monte Carlo simulations. Empirical results on the Sioux Falls and Anaheim networks show consistent improvements in on-time arrival probabilities across multiple time budgets, outperforming both dynamic programming and traditional RL baselines. The method demonstrates robust performance and scalability for real-world stochastic routing, with potential impact on adaptive, reliable navigation in urban transportation systems, especially where historical context influences travel-time distributions.

Abstract

With the rapidly increasing number of vehicles in urban areas, existing road infrastructure struggles to accommodate modern traffic demands, resulting in congestion. This highlights the importance of efficient path planning strategies. Most recent navigation models focus on deterministic or time-dependent networks, overlooking the correlations and stochastic nature of traffic flows. In this work, we address the reliable shortest path problem in stochastic transportation networks and propose a path planning solution that integrates the Decision Transformer with the Generalized Policy Gradient (GPG) framework. Leveraging the Transformer's ability to model long-term dependencies, our solution improves path decision accuracy and stability. Experiments on the Sioux Falls (SFN) and large Anaheim (AN) networks, both real-world topologies, show consistent improvements in on-time arrival probability, obtained by capturing non-Markovian dependencies in historical routing decisions.
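
To make the evaluation metric concrete, the following is a minimal sketch of how an on-time arrival probability could be estimated by Monte Carlo sampling for a single fixed path. It is not the authors' implementation: the function name `on_time_probability`, the toy three-node network, and the choice of independent lognormal edge travel times are all assumptions made for illustration. In particular, independence ignores the edge correlations that the paper sets out to model.

```python
import random


def on_time_probability(path_edges, edge_params, budget, n_samples=10_000, seed=0):
    """Monte Carlo estimate of P(total travel time <= budget) for a fixed path.

    Assumption (not from the paper): each edge travel time is an independent
    lognormal draw with parameters (mu, sigma) given in `edge_params`.
    """
    rng = random.Random(seed)  # local generator so results are reproducible
    on_time = 0
    for _ in range(n_samples):
        # Sample one realization of the total travel time along the path.
        total = sum(rng.lognormvariate(*edge_params[edge]) for edge in path_edges)
        if total <= budget:
            on_time += 1
    return on_time / n_samples


# Hypothetical toy network: two candidate routes from A to C.
edge_params = {
    ("A", "B"): (0.0, 0.25),
    ("B", "C"): (0.2, 0.50),
    ("A", "C"): (0.9, 0.10),
}

p_via_b = on_time_probability([("A", "B"), ("B", "C")], edge_params, budget=3.0)
p_direct = on_time_probability([("A", "C")], edge_params, budget=3.0)
print(f"via B: {p_via_b:.3f}, direct: {p_direct:.3f}")
```

Under this metric, the preferred route is the one with the higher estimated probability for the given budget, which need not be the route with the lower mean travel time.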

Paper Structure

This paper contains 13 sections, 19 equations, 2 figures, 3 tables.

Figures (2)

  • Figure 1: Our model architecture. $i$ for decision step, $l$ for edges in current path, $L$ for total edges, $d$ for embedding dimension.
  • Figure 2: A travel example for OD pair 2-15 on the SFN dataset, with the time budget set to $T=0.95$. Each arrow is marked with path travel time.

Theorems & Definitions (1)

  • proof