Reinforcement Learning-based Sequential Route Recommendation for System-Optimal Traffic Assignment

Leizhen Wang, Peibo Duan, Cheng Lyu, Zhenliang Ma

TL;DR

The paper tackles whether sequential, personalized route recommendations can realize system-optimal (SO) traffic assignment. It reformulates static SO as an online Markov Decision Process and solves it with a single-agent deep Q-learning algorithm guided by the Method of Successive Averages (MSA), integrating classical assignment insights into the RL training loop. On the Braess paradox network, the method converges to the theoretical SO, while on the larger Ortúzar–Willumsen network it achieves a 0.35% deviation from SO-MSA, with ablations showing the action set design critically affects convergence speed and final performance. Overall, the work provides a theoretically grounded, learning-based pathway to align individual routing behavior with system-level efficiency in online routing settings, offering a practical framework for sequential route guidance with analytical validation.

Abstract

Modern navigation systems and shared mobility platforms increasingly rely on personalized route recommendations to improve individual travel experience and operational efficiency. However, a key question remains: can such sequential, personalized routing decisions collectively lead to system-optimal (SO) traffic assignment? This paper addresses this question by proposing a learning-based framework that reformulates the static SO traffic assignment problem as a single-agent deep reinforcement learning (RL) task. A central agent sequentially recommends routes to travelers as origin-destination (OD) demands arrive, with the objective of minimizing total system travel time. To enhance learning efficiency and solution quality, we develop an MSA-guided deep Q-learning algorithm that integrates the iterative structure of traditional traffic assignment methods into the RL training process. The proposed approach is evaluated on both the Braess and Ortúzar-Willumsen (OW) networks. Results show that the RL agent converges to the theoretical SO solution in the Braess network and deviates by only 0.35% from the SO-MSA solution in the OW network. Further ablation studies demonstrate that the route action set's design significantly impacts convergence speed and final performance, with SO-informed route sets leading to faster learning and better outcomes. This work provides a theoretically grounded and practically relevant approach to bridging individual routing behavior with system-level efficiency through learning-based sequential assignment.
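
To make the SO benchmark concrete, the following is a minimal sketch of the classical Method of Successive Averages applied to a system-optimal assignment on a Braess-type network. It illustrates only the baseline idea (all-or-nothing loading on marginal-cost routes with a 1/k step size), not the paper's MSA-guided deep Q-learning algorithm. The link cost functions, the demand of 6, and all function names (`so_msa`, `marginal_route_costs`, and so on) are assumptions taken from the common textbook Braess example, not from the paper.

```python
# Hypothetical Braess network with textbook link costs (an assumption, not the
# paper's parameters). Each link maps to (travel time t(x), derivative t'(x)).
DEMAND = 6.0
LINKS = {
    "sa": (lambda x: 10.0 * x, lambda x: 10.0),
    "sb": (lambda x: 50.0 + x, lambda x: 1.0),
    "at": (lambda x: 50.0 + x, lambda x: 1.0),
    "bt": (lambda x: 10.0 * x, lambda x: 10.0),
    "ab": (lambda x: 10.0 + x, lambda x: 1.0),
}
# Three routes from s to t; the last one uses the "bridge" link a -> b.
ROUTES = [("sa", "at"), ("sb", "bt"), ("sa", "ab", "bt")]


def link_flows(route_flows):
    """Aggregate route flows into link flows."""
    flows = {name: 0.0 for name in LINKS}
    for route, f in zip(ROUTES, route_flows):
        for link in route:
            flows[link] += f
    return flows


def total_travel_time(route_flows):
    """System objective: sum over links of x * t(x)."""
    x = link_flows(route_flows)
    return sum(x[name] * LINKS[name][0](x[name]) for name in LINKS)


def marginal_route_costs(route_flows):
    """Route cost under marginal link costs t(x) + x * t'(x)."""
    x = link_flows(route_flows)
    marginal = {
        name: t(x[name]) + x[name] * dt(x[name])
        for name, (t, dt) in LINKS.items()
    }
    return [sum(marginal[link] for link in route) for route in ROUTES]


def so_msa(iterations=5000):
    """MSA: all-or-nothing on the cheapest marginal-cost route, step 1/k."""
    flows = [0.0] * len(ROUTES)
    for k in range(1, iterations + 1):
        costs = marginal_route_costs(flows)
        best = costs.index(min(costs))
        target = [DEMAND if i == best else 0.0 for i in range(len(ROUTES))]
        flows = [f + (y - f) / k for f, y in zip(flows, target)]
    return flows


if __name__ == "__main__":
    flows = so_msa()
    print([round(f, 3) for f in flows], round(total_travel_time(flows), 2))
```

Under these assumed costs, the route flows approach (3, 3, 0) and the total travel time approaches 498: the system optimum leaves the bridge link unused, whereas the user equilibrium on the same network splits demand evenly over all three routes at a higher total cost.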

Paper Structure

This paper contains 16 sections, 11 equations, 5 figures, 4 tables, 2 algorithms.

Figures (5)

  • Figure 1: RL framework
  • Figure 2: Network of the Braess Paradox
  • Figure 3: The training curve of the RL assignment model
  • Figure 4: OW network
  • Figure 5: The training curve of all RL-based assignment models