Reinforcement learning-based adaptive speed controllers in mixed autonomy condition

Han Wang; Hossein Nick Zinat Matin; Maria Laura Delle Monache

TL;DR

This work tackles congestion in mixed-autonomy traffic by modeling the system with a PDE-ODE framework in which the density $\rho$ obeys the conservation law $\rho_t + \partial_x f(\rho) = 0$ and the AV trajectory is governed by an ODE. An RL-based adaptive speed controller, trained with an Actor-Critic policy that interacts with the PDE-ODE model, regulates AV speed in real time. Key findings include a 15% improvement in minimum flux, a 17% increase in average global speed, a 35% reduction in speed variation, and a 16% improvement in normalized reward during PPO evaluation, demonstrating effective shockwave mitigation and flow stabilization. The approach offers a practical path toward using AVs to improve traffic efficiency; future work will address scalability, integration of real data, safety, and socio-economic considerations.
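
To make the conservation law $\rho_t + \partial_x f(\rho) = 0$ concrete, the sketch below advances cell-averaged densities with a Godunov finite-volume scheme on a ring road. It is a minimal illustration under stated assumptions, not the authors' solver: the Greenshields flux, the periodic domain, and all names and parameter values (`V_MAX`, `RHO_MAX`, `godunov_step`, grid size) are hypothetical choices made here.

```python
import numpy as np

# Assumption: Greenshields flux f(rho) = V_MAX * rho * (1 - rho / RHO_MAX);
# the paper's fundamental diagram may differ.
V_MAX = 1.0
RHO_MAX = 1.0


def flux(rho):
    """Greenshields flux f(rho)."""
    return V_MAX * rho * (1.0 - rho / RHO_MAX)


def godunov_flux(rho_left, rho_right):
    """Godunov numerical flux at a cell interface, in demand/supply form.

    Valid for concave fluxes such as Greenshields: F = min(D(rho_L), S(rho_R)).
    """
    rho_crit = 0.5 * RHO_MAX  # density at which f is maximal
    capacity = flux(rho_crit)
    # Demand: what the upstream cell can send.
    demand = np.where(rho_left <= rho_crit, flux(rho_left), capacity)
    # Supply: what the downstream cell can receive.
    supply = np.where(rho_right <= rho_crit, capacity, flux(rho_right))
    return np.minimum(demand, supply)


def godunov_step(rho, dx, dt):
    """Advance cell averages by one time step on a periodic (ring-road) domain."""
    flux_right = godunov_flux(rho, np.roll(rho, -1))  # F_{i+1/2}
    flux_left = np.roll(flux_right, 1)                # F_{i-1/2}
    return rho - (dt / dx) * (flux_right - flux_left)


# Usage: a density bump on a ring road of unit length.
nx = 200
dx = 1.0 / nx
dt = 0.5 * dx / V_MAX  # CFL condition: dt * max|f'(rho)| <= dx
x = (np.arange(nx) + 0.5) * dx
rho = 0.3 + 0.4 * np.exp(-((x - 0.5) / 0.05) ** 2)
initial_mass = rho.sum() * dx
for _ in range(100):
    rho = godunov_step(rho, dx, dt)
```

The demand/supply form is used because it gives the exact Godunov flux for any concave fundamental diagram without solving Riemann problems case by case; on a periodic domain the scheme conserves total mass and keeps densities within their initial range under the CFL condition.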

Abstract

The integration of Automated Vehicles (AVs) into traffic flow has the potential to significantly reduce traffic congestion by enabling AVs to act as actuators within the flow. This paper introduces an adaptive speed controller tailored to mixed-autonomy scenarios, in which AVs interact with human-driven vehicles. We model the traffic dynamics using a strongly coupled system of Partial and Ordinary Differential Equations (PDE-ODE): the PDE captures the macroscopic flow of human-driven traffic, and the ODE describes the trajectory of the AVs. A speed policy for the AVs is derived using a Reinforcement Learning (RL) algorithm structured within an Actor-Critic (AC) framework, which interacts with the PDE-ODE model to optimize the AV control policy. Numerical simulations demonstrate the controller's impact on traffic patterns, showing the potential of AVs to improve traffic flow and reduce congestion.
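
The ODE side of the coupling can be sketched as follows, assuming the moving-bottleneck convention in which the AV travels at the smaller of its commanded speed $V$ and the traffic speed just downstream, $\dot y(t) = \min\{V(t), v(\rho(t, y(t)+))\}$. The Greenshields speed law, the explicit Euler step, and every name here (`av_speed`, `av_euler_step`, `commanded_speed`) are hypothetical; `commanded_speed` merely stands in for the action an actor network would output, and the flux constraint that feeds back into the PDE is not enforced in this sketch.

```python
import numpy as np

# Assumption: Greenshields speed law v(rho) = V_MAX * (1 - rho / RHO_MAX).
V_MAX = 1.0
RHO_MAX = 1.0


def traffic_speed(rho):
    """Assumed speed-density relation v(rho)."""
    return V_MAX * (1.0 - rho / RHO_MAX)


def av_speed(y, rho, commanded_speed, dx):
    """AV speed: the smaller of the commanded speed V and the traffic speed ahead.

    rho holds cell averages on a ring road; the cell containing y is used
    as an approximation of the right trace rho(t, y+).
    """
    i_ahead = int(np.floor(y / dx)) % rho.size
    return min(commanded_speed, traffic_speed(rho[i_ahead]))


def av_euler_step(y, rho, commanded_speed, dx, dt, ring_length):
    """Explicit Euler step for the AV trajectory ODE; returns (new position, speed)."""
    y_dot = av_speed(y, rho, commanded_speed, dx)
    return (y + dt * y_dot) % ring_length, y_dot


# Usage: uniform density 0.6, so the surrounding traffic moves at speed 0.4.
rho = np.full(100, 0.6)
dx = 1.0 / rho.size
y_new, v_fast = av_euler_step(0.25, rho, 0.8, dx, 0.01, 1.0)  # capped by traffic
_, v_slow = av_euler_step(0.25, rho, 0.2, dx, 0.01, 1.0)      # follows the command
```

In a full simulation this step would alternate with the PDE update, with the AV's commanded speed refreshed by the policy at each control interval.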

Paper Structure (7 sections, 1 theorem, 9 equations, 8 figures, 1 table)

Introduction
Mathematical Model
Controller Design
Markov Decision Process Formulation
Policy Parameter Update
Numerical Results
Conclusion

Key Result

Theorem 2.1

Let $\rho_\circ \in \mathbf{BV}(\mathbb{R};[0, \rho_{\max}])$, where $\mathbf{BV}$ denotes the space of functions of bounded variation. Then the Cauchy problem given by equations (E:conservation) through (E:initial) admits a weak entropy solution $(\rho, y) \in C(\mathbb{R}_+; L^1 \cap \mathbf{BV}(\mathbb{R}; [0, \rho_{\max}])) \times \dots$
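
As a reading aid for Theorem 2.1, PDE-ODE moving-bottleneck models of this kind are commonly written as the Cauchy problem below. This is a sketch assuming that standard formulation, since the paper's own equations are not reproduced in this summary: $v$ is the speed-density relation, $f(\rho) = \rho\, v(\rho)$, $V(t)$ is the commanded AV speed, $\alpha \in (0,1)$ is a capacity-reduction factor, $F_\alpha$ is the constraint level shown in Figure 1, and $y_\circ$ is an assumed symbol for the initial AV position.

```latex
\begin{cases}
\partial_t \rho + \partial_x f(\rho) = 0, & (t, x) \in \mathbb{R}_+ \times \mathbb{R}, \\
\rho(0, x) = \rho_\circ(x), & x \in \mathbb{R}, \\
f(\rho(t, y(t))) - \dot{y}(t)\, \rho(t, y(t)) \le F_\alpha(\dot{y}(t)), & t \in \mathbb{R}_+, \\
\dot{y}(t) = \min\{ V(t),\, v(\rho(t, y(t)+)) \}, & t \in \mathbb{R}_+, \\
y(0) = y_\circ. &
\end{cases}
```

The third line is the flux constraint at the AV position: in the frame moving with the AV, the flux of human-driven traffic past the vehicle cannot exceed $F_\alpha$, which is why the theorem asks for a solution pair $(\rho, y)$ rather than a density alone.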

Figures (8)

Figure 1: Illustration of the fundamental diagram (FD) and the locations of $\check \rho(V)$ and $\hat{\rho}(V)$ for each $V \in [0, V_{\max}]$. Solutions lying above the line $F_\alpha(V) + V \rho$ do not satisfy the flux constraint.
Figure 2: Control loop of the proposed RL-based adaptive controller.
Figure 3: Benchmark scenario: stop-and-go waves.
Figure 4: Learning curve of the PPO algorithm. A 16% improvement in normalized reward is observed in evaluation.
Figure 5: The reward weights of the model are $(w_1, w_2, w_3) = (0.2, 0.3, 0.5)$. The controlled vehicle tends to create a low-density region that neutralizes the backward propagation of the shockwave.
...and 3 more figures
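
The densities $\check\rho(V)$ and $\hat\rho(V)$ in Figure 1 have a closed form under two assumptions that are not confirmed by this summary: a Greenshields FD $f(\rho) = V_{\max}\rho(1 - \rho/\rho_{\max})$, and a reduced-capacity flux $f_\alpha(\rho) = V_{\max}\rho(1 - \rho/(\alpha\rho_{\max}))$ defining $F_\alpha(V) = \max_{\rho \in [0, \alpha\rho_{\max}]} (f_\alpha(\rho) - V\rho)$. Under these assumptions $F_\alpha(V) = \alpha\rho_{\max}(V_{\max} - V)^2 / (4V_{\max})$, and solving $f(\rho) = F_\alpha(V) + V\rho$ gives roots $\rho_{\max}(V_{\max} - V)(1 \mp \sqrt{1 - \alpha}) / (2V_{\max})$. The function names below are hypothetical.

```python
import math

# Assumed parameters: V_MAX is V_max in Figure 1, RHO_MAX is rho_max.
V_MAX = 1.0
RHO_MAX = 1.0


def flux(rho):
    """Assumed Greenshields flux f(rho) = V_max * rho * (1 - rho / rho_max)."""
    return V_MAX * rho * (1.0 - rho / RHO_MAX)


def constraint_flux(V, alpha):
    """F_alpha(V) = max over rho in [0, alpha*rho_max] of f_alpha(rho) - V*rho.

    Assumes f_alpha(rho) = V_max * rho * (1 - rho / (alpha * rho_max)) with
    alpha in (0, 1); for 0 <= V <= V_max the maximum is attained at
    rho = alpha * rho_max * (V_max - V) / (2 * V_max).
    """
    return alpha * RHO_MAX * (V_MAX - V) ** 2 / (4.0 * V_MAX)


def constraint_densities(V, alpha):
    """Return (rho_check, rho_hat), the two roots of f(rho) = F_alpha(V) + V * rho."""
    root = math.sqrt(1.0 - alpha)
    scale = RHO_MAX * (V_MAX - V) / (2.0 * V_MAX)
    return scale * (1.0 - root), scale * (1.0 + root)


# Usage: AV speed V = 0.5 with capacity-reduction factor alpha = 0.6.
V, alpha = 0.5, 0.6
rho_check, rho_hat = constraint_densities(V, alpha)  # approximately (0.0919, 0.4081)
```

Since $f(\rho) - (F_\alpha(V) + V\rho)$ is a downward parabola vanishing at both roots, the FD lies above the constraint line exactly for $\rho \in (\check\rho(V), \hat\rho(V))$, which matches the region Figure 1 marks as violating the flux constraint.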
