Delay-Aware Reinforcement Learning for Highway On-Ramp Merging under Stochastic Communication Latency

Amin Tabrizian, Zhitong Huang, Arsyi Aziz, Peng Wei

TL;DR

DAROM addresses reinforcement learning for highway on-ramp merging under stochastic V2I latency by modeling the problem as a random delay MDP (RDMDP) and introducing a Delay-Aware Encoder that infers the current latent state from delayed observations, masked action histories, and the observed delay magnitude. A unified SAC-based agent handles both longitudinal and lateral control, while a physics-based safety controller adds a safety layer that reduces collision risk during merging. Experiments in SUMO with real-world NGSIM traffic data show DAROM, especially the GRU-based variant, achieving near-perfect success (up to $99.8\%$) under random delays of up to $2.0$ seconds and outperforming MPC and other DRL baselines. The work advances infrastructure-assisted perception by explicitly accounting for stochastic latency and offers a practical control framework for connected-road scenarios; real-world validation and more realistic perception models remain future work.

Abstract

Delayed and partially observable state information poses significant challenges for reinforcement learning (RL)-based control in real-world autonomous driving. In highway on-ramp merging, a roadside unit (RSU) can sense nearby traffic, perform edge perception, and transmit state estimates to the ego vehicle over vehicle-to-infrastructure (V2I) links. With recent advancements in intelligent transportation infrastructure and edge computing, such RSU-assisted perception is increasingly realistic and already deployed in modern connected roadway systems. However, edge processing time and wireless transmission can introduce stochastic V2I communication delays, violating the Markov assumption and substantially degrading control performance. In this work, we propose DAROM, a Delay-Aware Reinforcement Learning framework for On-ramp Merging that is robust to stochastic delays. We model the problem as a random delay Markov decision process (RDMDP) and develop a unified RL agent for joint longitudinal and lateral control. To recover a Markovian representation under delayed observations, we introduce a Delay-Aware Encoder that conditions on delayed observations, masked action histories, and observed delay magnitude to infer the current latent state. We further integrate a physics-based safety controller to reduce collision risk during merging. Experiments in the Simulation of Urban MObility (SUMO) simulator using real-world traffic data from the Next Generation Simulation (NGSIM) dataset demonstrate that DAROM consistently outperforms standard RL baselines across traffic densities. In particular, the gated recurrent unit (GRU)-based encoder achieves over 99% success in high-density traffic with random V2I delays of up to 2.0 seconds.

Paper Structure (32 sections, 10 equations, 5 figures, 4 tables)

Figures (5)

  • Figure 1: Highway on-ramp merging scenario. The RSU computes the surrounding vehicles' information and communicates it to the ego vehicle (red) with stochastic latency.
  • Figure 2: Overview of the DAROM framework. To address random observation delays, the system augments the delayed observation $o_t$ with the masked action history $u_{t-\omega_t:t-1}$ and delay magnitude $\omega_t$. These inputs are fused into a latent representation $z_t$ by the Delay-Aware Encoder. A unified soft actor-critic (SAC) agent then generates a raw control action $a_t$, which is validated and potentially overridden by the Safety Controller ($a_{t_\text{safe}}$) to ensure collision-free merging.
  • Figure 3: Augmented state construction details. At time step $t=6$, the state $s_6$ has a delay of $d_6 = 2$, and the received observation $o_6 = s_3$ has a delay of $\omega_6 = 3$. The ego vehicle has not yet observed $s_4$ and $s_5$ because they are still in transit. $x_6$ denotes the augmented state, which consists of the delayed observation, the action buffer, and the delay magnitude of the received observation.
  • Figure 4: Overall comparison across Easy, Medium, and Hard modes. Rows correspond to average return, critic loss, and actor loss, respectively.
  • Figure 5: Ablation study in Hard mode. The Full DAROM-GRU architecture is compared against variants with ablated inputs: delayed state only, delayed state with delay magnitude, and delayed state with action buffer. Two of the ablated variants failed to converge, while the Full version achieves the best performance.
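
The augmented-state construction described in the Figure 2 and Figure 3 captions can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `build_augmented_state`, the fixed buffer length `max_delay`, the explicit `action_dim` argument, and the zero-padding with a binary mask are all assumptions made for illustration. The sketch only shows how a delayed observation $o_t$, the last $\omega_t$ actions, and the delay magnitude $\omega_t$ might be packed into an augmented state $x_t$.

```python
from typing import List, Sequence, Tuple


def build_augmented_state(
    delayed_obs: Sequence[float],
    action_history: Sequence[Sequence[float]],
    delay: int,
    max_delay: int,
    action_dim: int,
) -> Tuple[List[float], List[List[float]], List[int], int]:
    """Pack (o_t, masked action buffer, mask, omega_t) into one tuple.

    `action_history` is ordered oldest -> newest and ends at the action
    taken at step t-1. Only the last `delay` actions (those applied after
    the observed state was generated) are kept; the remaining slots of the
    fixed-length buffer are zero-padded and flagged 0 in the mask.
    The buffer layout is a hypothetical choice for this sketch.
    """
    if not 0 <= delay <= max_delay:
        raise ValueError("delay must lie in [0, max_delay]")
    if len(action_history) < delay:
        raise ValueError("not enough past actions for the given delay")

    history = [list(a) for a in action_history]
    # Slice from an explicit index so that delay == 0 yields an empty list.
    recent = history[len(history) - delay:]

    padding = max_delay - delay
    buffer = [[0.0] * action_dim for _ in range(padding)] + recent
    mask = [0] * padding + [1] * delay
    return list(delayed_obs), buffer, mask, delay


# Worked example mirroring Figure 3: at t = 6 the received observation is
# s_3, so omega_6 = 3 and the relevant actions are those from steps 3, 4, 5.
# States and actions are dummy scalars chosen so that they are easy to read.
states = [[float(k)] for k in range(7)]          # s_0 ... s_6
actions = [[float(10 + k)] for k in range(6)]    # a_0 ... a_5

obs, buf, mask, omega = build_augmented_state(
    delayed_obs=states[3],
    action_history=actions,
    delay=3,
    max_delay=4,
    action_dim=1,
)
print(obs, buf, mask, omega)
```

In this toy setting the buffer keeps the actions from steps 3 through 5 and pads one unused slot, which matches the caption's $o_6 = s_3$, $\omega_6 = 3$ example. How the actual Delay-Aware Encoder consumes these inputs (for example, through a GRU) is not specified in this section and is left out of the sketch.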