Deep Reinforcement Learning for Day-to-day Dynamic Tolling in Tradable Credit Schemes

Xiaoyi Wu, Ravi Seshadri, Filipe Rodrigues, Carlos Lima Azevedo

TL;DR

The paper addresses day-to-day dynamic tolling within Tradable Credit Schemes (TCS) by formulating the problem as a finite-horizon Markov Decision Process and solving it with Proximal Policy Optimization (PPO). It demonstrates that deep reinforcement learning (RL) can achieve travel times and social welfare comparable to Bayesian optimization while generalizing across unseen capacity and demand conditions, and it shows that action-smoothing regularization (CAPS) mitigates policy oscillations, which makes the resulting tolls more practical to deploy. The work also investigates transfer learning to improve computational efficiency and robustness to day-to-day variability, and it discusses scalability challenges and potential extensions to multi-modal networks. Overall, the approach offers a viable, transferable framework for dynamic, revenue-neutral tolling that can adapt to changing traffic conditions and broader transportation systems.

Abstract

Tradable credit schemes (TCS) are an increasingly studied alternative to congestion pricing, given their revenue neutrality and ability to address issues of equity through the initial credit allocation. Modeling TCS to aid future design and implementation is associated with challenges involving user and market behaviors, demand-supply dynamics, and control mechanisms. In this paper, we focus on the last of these, control mechanisms, and address the day-to-day dynamic tolling problem under TCS, which is formulated as a discrete-time Markov Decision Process and solved using reinforcement learning (RL) algorithms. Our results indicate that RL algorithms achieve travel times and social welfare comparable to the Bayesian optimization benchmark, with generalization across varying capacities and demand levels. We further assess the robustness of RL under different hyperparameters and apply regularization techniques to mitigate action oscillation, which yields practical tolling strategies that are transferable under day-to-day demand and supply variability. Finally, we discuss potential challenges such as scaling to large networks, and show how transfer learning can be leveraged to improve computational efficiency and facilitate the practical deployment of RL-based TCS solutions.
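
The action-smoothing idea mentioned in the TL;DR (CAPS) can be illustrated with a minimal sketch. It assumes the commonly described CAPS form: a temporal term that penalizes differences between actions at consecutive states, and a spatial term that penalizes differences between actions at a state and a slightly perturbed copy of it. The function name `caps_penalty`, the toy policies, and the parameters `sigma`, `lam_t`, and `lam_s` are hypothetical and chosen for illustration; the paper's exact formulation and weights are not given in this excerpt.

```python
import numpy as np


def caps_penalty(policy, states, next_states, sigma=0.05,
                 lam_t=1.0, lam_s=1.0, rng=None):
    """CAPS-style smoothness penalty for a deterministic policy output.

    policy maps an (N, state_dim) array to an (N, action_dim) array.
    sigma, lam_t and lam_s are illustrative values, not taken from the paper.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    actions = policy(states)
    # Temporal term: actions at consecutive time steps should be close.
    temporal = np.linalg.norm(actions - policy(next_states), axis=-1).mean()
    # Spatial term: actions at nearby (noise-perturbed) states should be close.
    perturbed = states + rng.normal(0.0, sigma, size=states.shape)
    spatial = np.linalg.norm(actions - policy(perturbed), axis=-1).mean()
    return lam_t * temporal + lam_s * spatial


def constant_policy(states):
    # Hypothetical policy that always charges the same toll.
    return np.full((states.shape[0], 1), 2.0)


def linear_policy(states):
    # Hypothetical policy whose toll reacts strongly to the state.
    return states @ np.array([[1.0], [-1.0]])


# A toy trajectory that flips between two states on consecutive days.
states = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 0.0]])
next_states = np.array([[1.0, 0.0], [0.0, 0.0], [1.0, 0.0]])

flat = caps_penalty(constant_policy, states, next_states)
jumpy = caps_penalty(linear_policy, states, next_states, sigma=0.0)
print(flat, jumpy)
```

In a PPO training loop, a penalty of this kind would typically be added to the policy loss, so that policies whose tolls jump between consecutive days are discouraged. The constant policy incurs no penalty, while the reactive policy is penalized in proportion to its day-to-day toll changes.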

Paper Structure

This paper contains 29 sections, 21 equations, 6 figures, and 3 tables.

Figures (6)

  • Figure 1: Proposed RL framework
  • Figure 2: Benchmark comparison for one-dimensional action space
  • Figure 3: Comparison of performance between transferred and learn-from-scratch policies under varying levels of congestion and demand. Both policies exhibit similar performance in terms of AITT and rewards. However, policies trained in highly congested environments adopt more aggressive tolling strategies, with higher peak toll values and faster adjustments in tolls. These strategies result in a greater PT mode share.
  • Figure 4: Impact of Batch Size on PPO Performance. Overly small batch sizes (e.g., 480) tend to converge quickly to suboptimal, oscillatory policies with highly variable tolling rates, slightly higher AITT, and lower rewards. In contrast, larger batch sizes (e.g., 1920) exhibit greater data diversity, as reflected in large shaded areas for AITT, tolling rates, and rewards. However, they may result in slower convergence and much lower rewards compared to other batch sizes under the same training budget.
  • Figure 5: Impact of Epoch Number on PPO Performance. In general, increasing the number of epochs allows the RL algorithm to learn more thoroughly from the training data. However, a high epoch number (e.g., 32) combined with a moderate batch size results in oscillatory sub-optimal policies. Balancing the number of epochs and batch size is crucial to ensure stable and robust policy learning.
  • ...and 1 more figure