Generalized Policy Gradient with History-Aware Decision Transformer for Path Planning
Xing Wei, Duoxiang Zhao, Zezhou Zhang, Yuqi Ouyang, Hao Qin
TL;DR
This work addresses reliable path planning under stochastic traffic by introducing Generalized Policy Gradient with History-Aware Decision Transformer (GPG-HT). The approach combines a history-aware transformer policy with a generalized policy gradient objective to exploit non-Markovian temporal dependencies in routing decisions, trained via Monte Carlo simulations. Empirical results on the Sioux Falls and Anaheim networks show consistent improvements in on-time arrival probabilities across multiple time budgets, outperforming both dynamic programming and traditional RL baselines. The method demonstrates robust performance and scalability for real-world stochastic routing, with potential impact on adaptive, reliable navigation in urban transportation systems, especially where historical context influences travel-time distributions.
Abstract
With the rapidly increasing number of vehicles in urban areas, existing road infrastructure struggles to accommodate modern traffic demands, resulting in congestion. This highlights the importance of efficient path planning strategies. Most recent navigation models focus on deterministic or time-dependent networks, overlooking correlations and the stochastic nature of traffic flows. In this work, we address the reliable shortest path problem in stochastic transportation networks and propose a path planning solution that integrates the Decision Transformer with the Generalized Policy Gradient (GPG) framework. Leveraging the Transformer's ability to model long-term dependencies, our solution improves the accuracy and stability of path decisions. Experiments on the Sioux Falls (SFN) and large Anaheim (AN) networks show that, by capturing non-Markovian dependencies in historical routing decisions, our solution consistently improves on-time arrival probabilities on real-world topologies.
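To make the objective concrete, the sketch below estimates the on-time arrival probability of a fixed path by Monte Carlo sampling, which is the quantity the reliable shortest path problem seeks to maximize for a given time budget. It is an illustration under stated assumptions, not the GPG-HT method: the toy links, their Gaussian travel-time parameters, and the names `on_time_probability` and `gaussian_link_time` are all hypothetical, and link times are sampled independently, which ignores the correlations the paper emphasizes.

```python
import random

# Hypothetical toy links: name -> (mean travel time, standard deviation).
# These values are illustrative assumptions, not data from the paper.
LINK_PARAMS = {
    "a": (10.0, 1.0),  # slower on average but reliable
    "b": (9.0, 4.0),   # faster on average but highly variable
    "c": (5.0, 0.5),
}

def gaussian_link_time(link, rng):
    """Draw one travel time for a link (assumed Gaussian, clipped at zero)."""
    mean, std = LINK_PARAMS[link]
    return max(0.0, rng.gauss(mean, std))

def on_time_probability(path_links, time_budget, sample_link_time,
                        n_samples=10_000, seed=0):
    """Monte Carlo estimate of P(total path travel time <= time_budget)."""
    rng = random.Random(seed)
    on_time = 0
    for _ in range(n_samples):
        # Links are sampled independently here, a simplifying assumption.
        total = sum(sample_link_time(link, rng) for link in path_links)
        if total <= time_budget:
            on_time += 1
    return on_time / n_samples

# Two candidate paths to the same destination under a budget of 17 time units.
p_reliable = on_time_probability(["a", "c"], 17.0, gaussian_link_time)
p_variable = on_time_probability(["b", "c"], 17.0, gaussian_link_time)
print(f"path a->c: {p_reliable:.3f}, path b->c: {p_variable:.3f}")
```

In this toy setup the path with the lower mean travel time (b, c) has the lower on-time probability because of its higher variance, which is why minimizing expected travel time alone is not enough for reliable routing. An adaptive policy such as the one proposed would additionally choose each next link based on the time already spent and the route history, which this fixed-path sketch does not model.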
