CARL: Congestion-Aware Reinforcement Learning for Imitation-based Perturbations in Mixed Traffic Control

Bibek Poudel; Weizi Li; Shuai Li

TL;DR

CARL tackles the sim-to-real gap in mixed-traffic validation by modeling human-driven vehicles with a hybrid of imitation learning and probabilistic sampling, and by training congestion-aware robot vehicles (RVs) with reinforcement learning. It introduces two RV classes, a safety RV and an efficiency RV, each driven by a neural congestion classifier and trained with Proximal Policy Optimization on observations that include ego and leader dynamics plus the predicted congestion state. Empirical results on a Ring topology show that the safety RV raises Time-to-Collision above the critical threshold of $4$ s and reduces Deceleration Rate to Avoid a Crash by up to $80\%$, while the efficiency RV increases throughput by up to $49\%$ and maintains strong fuel economy. Overall, CARL demonstrates a practical, scalable path to improving safety and efficiency in mixed traffic, with plans for broader dynamics, city-scale control, and hardware validation.

Abstract

Human-driven vehicles (HVs) exhibit complex and diverse behaviors. Accurately modeling such behavior is crucial for validating Robot Vehicles (RVs) in simulation and for realizing the potential of mixed traffic control. However, existing approaches such as parameterized models and data-driven techniques struggle to capture the full complexity and diversity of human driving. To address this, we introduce CARL, a hybrid approach that combines imitation learning for close-proximity car following with probabilistic sampling for larger headways. We also propose two classes of RL-based RVs: a safety RV focused on maximizing safety and an efficiency RV focused on maximizing efficiency. Our experiments show that the safety RV increases Time-to-Collision above the critical 4-second threshold and reduces Deceleration Rate to Avoid a Crash by up to 80%, while the efficiency RV improves throughput by up to 49%. These results demonstrate the effectiveness of CARL in enhancing both safety and efficiency in mixed traffic.
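The two safety metrics named in the abstract can be illustrated with a short sketch. This excerpt does not give the paper's exact definitions, so the code below assumes the common textbook forms: Time-to-Collision as gap divided by closing speed, and Deceleration Rate to Avoid a Crash as the constant deceleration needed to match the leader's speed within the current gap. The function names, argument names, and example values are hypothetical; only the 4-second threshold comes from the text.

```python
import math

# Critical Time-to-Collision threshold quoted in the abstract (seconds).
TTC_CRITICAL_S = 4.0


def time_to_collision(gap_m, v_follower, v_leader):
    """TTC in seconds; infinite when the follower is not closing in.

    Assumed form: gap / (v_follower - v_leader).
    """
    if gap_m <= 0.0:
        raise ValueError("gap_m must be positive")
    closing_speed = v_follower - v_leader
    if closing_speed <= 0.0:
        return math.inf
    return gap_m / closing_speed


def drac(gap_m, v_follower, v_leader):
    """Deceleration (m/s^2) needed to match the leader's speed within the gap.

    Assumed form (constant-deceleration kinematics):
    (v_follower - v_leader)^2 / (2 * gap). Some references omit the factor 2.
    """
    if gap_m <= 0.0:
        raise ValueError("gap_m must be positive")
    closing_speed = v_follower - v_leader
    if closing_speed <= 0.0:
        return 0.0
    return closing_speed ** 2 / (2.0 * gap_m)


# Illustrative values only: follower at 10 m/s, leader at 6 m/s, 20 m apart.
ttc = time_to_collision(gap_m=20.0, v_follower=10.0, v_leader=6.0)
print(ttc, ttc > TTC_CRITICAL_S)                       # 5.0 True
print(drac(gap_m=20.0, v_follower=10.0, v_leader=6.0))  # 0.4
```

Under these assumed definitions, a higher TTC and a lower DRAC both indicate a safer car-following state, which matches the direction of the improvements reported for the safety RV.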

Paper Structure (11 sections, 9 equations, 4 figures, 2 tables)

Introduction
Methodology
Data Processing and Intelligent Driver Model (IDM)
Heuristic-based Robot Vehicles
Model-based Robot Vehicles
RL-based Robot Vehicles
Congestion Classifier
Benchmarking RL Policies
Perturbations Via Imitation Learning and Sampling
Experiments
Conclusion and Future Work

Figures (4)

Figure 1: Instantaneous accelerations observed during car-following behaviors at densities $[70, 150]~veh/km$. TOP: Real-world data from the I-24 MOTION dataset reveals a distribution with long tails extending to $[-3, 3]~m/s^2$. BOTTOM: IDM (in simulation) produces accelerations mostly within $[-0.5, 0.5]~m/s^2$, indicating much more 'timid' driving behavior than in the real world.
Figure 2: Input data labeling for the congestion classifier (sensing zone shown in blue). The congestion classifier takes as input the (position, velocity) of all vehicles in the sensing zone and outputs the traffic condition based on patterns in space headway.
Figure 3: LEFT: Confusion matrix of a trained congestion classifier in Ring on the validation set, with the six classes abbreviated as: L='Leaving', F='Forming', FF='Free Flow', C='Congested', U='Undefined', and N='No Vehicle'. RIGHT: The results of applying K-means clustering with t-SNE on a subset of the training data of the congestion classifier. The clusters are spread out and distinct, suggesting that the data is easily classifiable.
Figure 4: Average velocity profile of RL-based approaches at $5\%$ penetration under long-term application of real-world perturbations (for $30$ minutes, from $1000~s$ to $2800~s$), averaged over $10$ simulation rollouts. The solid lines indicate average velocity and the colored ranges indicate standard deviation across rollouts. During the application of perturbations, our efficiency RV has the highest average velocity at $3.95~m/s$, contributing to more throughput, whereas Wu has the highest standard deviation at $1.35~m/s$, indicating more sensitivity.
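For context on Figure 1, the Intelligent Driver Model computes a follower's acceleration from its speed, the gap to its leader, and the approach rate. The sketch below implements the standard IDM formula; the default parameter values and the function name are illustrative assumptions, not the settings used in the paper.

```python
import math


def idm_acceleration(v, gap, dv, v0=30.0, T=1.0, a_max=1.0, b=1.5,
                     s0=2.0, delta=4):
    """Standard IDM acceleration in m/s^2.

    v     : follower speed (m/s)
    gap   : bumper-to-bumper gap to the leader (m), must be positive
    dv    : approach rate, follower speed minus leader speed (m/s)
    v0    : desired speed (m/s)              -- hypothetical default
    T     : desired time headway (s)         -- hypothetical default
    a_max : maximum acceleration (m/s^2)     -- hypothetical default
    b     : comfortable deceleration (m/s^2) -- hypothetical default
    s0    : minimum standstill gap (m)       -- hypothetical default
    delta : acceleration exponent
    """
    if gap <= 0.0:
        raise ValueError("gap must be positive")
    # Desired dynamic gap: standstill gap plus headway and braking terms.
    s_star = s0 + max(0.0, v * T + v * dv / (2.0 * math.sqrt(a_max * b)))
    # Free-road term minus interaction term.
    return a_max * (1.0 - (v / v0) ** delta - (s_star / gap) ** 2)


# Illustrative calls: open road from standstill, then closing on a near leader.
print(idm_acceleration(v=0.0, gap=1e6, dv=0.0))
print(idm_acceleration(v=20.0, gap=10.0, dv=5.0))
```

Because the free-road term caps acceleration at `a_max` and the interaction term grows smoothly with the gap ratio, IDM tends to produce a narrow band of accelerations in steady car following, which is consistent with the narrow simulated distribution described in the Figure 1 caption.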
