Novel Actor-Critic Algorithm for Robust Decision Making of CAV under Delays and Loss of V2X Data

Zine el abidine Kherroubi

TL;DR

The paper proposes a novel 'Blind Actor-Critic' algorithm that guarantees robust driving performance in a V2X environment with delayed and/or lost data. Training metrics improve over conventional actor-critic algorithms, and testing results show that the approach provides robust control even under low V2X network reliability levels.

Abstract

Current autonomous driving systems rely heavily on V2X communication data to enhance situational awareness and cooperation between vehicles. However, a major challenge when using V2X data is that it may not arrive periodically, because of unpredictable delays and data loss during wireless transmission between road stations and the receiver vehicle. This issue should be considered when designing control strategies for connected and autonomous vehicles (CAVs). Therefore, this paper proposes a novel 'Blind Actor-Critic' algorithm that guarantees robust driving performance in a V2X environment with delayed and/or lost data. The algorithm incorporates three key mechanisms: a virtual fixed sampling period, a combination of Temporal-Difference and Monte Carlo learning, and a numerical approximation of immediate reward values. We first illustrate the temporal aperiodicity problem of V2X data. Then, we provide a detailed explanation of the Blind Actor-Critic algorithm, highlighting the components proposed to compensate for this problem. We evaluate the performance of our algorithm in a simulation environment and compare it to benchmark approaches. The results demonstrate that training metrics are improved compared to conventional actor-critic algorithms. Additionally, testing results show that our approach provides robust control, even under low V2X network reliability levels.
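To make the three mechanisms concrete, the following minimal Python sketch shows one way they could fit together when two consecutive V2X observations arrive at irregular times. It is an illustration under stated assumptions, not the paper's implementation: the function names `approx_rewards` and `blind_td_target`, the linear interpolation of rewards, and the default values of `tau` and `gamma` are all hypothetical choices made for this example.

```python
from typing import List


def approx_rewards(r_start: float, r_end: float, n_steps: int) -> List[float]:
    """Approximate the unobserved rewards at n_steps virtual instants.

    Assumption: linear interpolation between the two observed rewards.
    The paper's numerical approximation may differ.
    """
    return [r_start + (r_end - r_start) * k / n_steps for k in range(n_steps)]


def blind_td_target(
    r_start: float,
    r_end: float,
    t_start: float,
    t_end: float,
    v_next: float,
    tau: float = 0.1,
    gamma: float = 0.99,
) -> float:
    """Critic target for one irregular interval [t_start, t_end].

    tau is the virtual fixed sampling period; v_next is the critic's value
    estimate for the state received at t_end (both hypothetical names).
    """
    # Split the irregular interval into virtual steps of length tau (at least one).
    n_steps = max(1, round((t_end - t_start) / tau))
    rewards = approx_rewards(r_start, r_end, n_steps)
    # Monte Carlo-style discounted sum over the virtual steps ...
    partial_return = sum(gamma ** k * r for k, r in enumerate(rewards))
    # ... plus a Temporal-Difference-style bootstrap from the next received state.
    return partial_return + gamma ** n_steps * v_next
```

When the two observations are exactly one virtual period apart, this target reduces to the usual one-step form `r + gamma * v_next`; longer gaps accumulate more approximated rewards before bootstrapping.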

Paper Structure (16 sections, 2 equations, 10 figures, 4 tables)

This paper contains 16 sections, 2 equations, 10 figures, 4 tables.

Figures (10)

  • Figure 1: Illustration of the temporal aperiodicity problem when using V2X data: the environment state $(S_i : i \in \mathbb{N})$ is perceived by the transmitter ITS-S with a fixed sampling period $T$. In the ideal case, the receiver ITS-S receives the environment state with the same fixed sampling period $T$, as shown in green. In practice, the receiver ITS-S receives the environment state asynchronously at varying instants $(S_{t_i} : t_i \in \mathbb{R}^+)$, as shown in red.
  • Figure 2: Illustration of two use cases in which the temporal aperiodicity factors are marked by dashed red boxes: (a) use case with V2V communication and (b) use case with V2I communication.
  • Figure 3: The actor-critic architecture (Sutton).
  • Figure 4: Illustration of the introduced variables: fictive sampling period $\tau$, random delay $\delta t_i$, and approximate reward $\hat{r}_{t_i + k\tau}$.
  • Figure 5: Illustration of the simulation framework. The traffic simulator periodically provides information ($s_i$, $r_i$) to the V2X interface through TraCI. The V2X interface generates delays and losses and then forwards information ($s_{t_i}$, $r_{t_i}$) to the algorithm accordingly. The algorithm is executed and the action $a_{t_i}$ is provided to the traffic simulator through TraCI.
  • ...and 5 more figures
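
The aperiodic arrivals illustrated in Figures 1 and 5 can be reproduced with a small stand-alone Python sketch. It is only a toy model under assumptions that are not taken from the paper: each message is lost independently with probability `loss_prob`, delays are drawn uniformly from `[0, max_delay]`, and the function name `simulate_v2x_arrivals` and its parameters are hypothetical. It does not use TraCI or a traffic simulator.

```python
import random
from typing import List, Tuple


def simulate_v2x_arrivals(
    n_msgs: int,
    period: float = 0.1,
    loss_prob: float = 0.2,
    max_delay: float = 0.3,
    seed: int = 0,
) -> List[Tuple[int, float]]:
    """Return (message index, arrival time) pairs as seen by the receiver.

    Messages are sent at i * period. Assumptions for this sketch only:
    independent loss with probability loss_prob and uniform random delay.
    """
    rng = random.Random(seed)
    arrivals = []
    for i in range(n_msgs):
        if rng.random() < loss_prob:
            continue  # message lost in transmission, never reaches the receiver
        delay = rng.uniform(0.0, max_delay)
        arrivals.append((i, i * period + delay))
    # The receiver processes messages in arrival order, which can differ
    # from the sending order when delays exceed the sampling period.
    arrivals.sort(key=lambda msg: msg[1])
    return arrivals


if __name__ == "__main__":
    for index, t_arrival in simulate_v2x_arrivals(10):
        print(f"S_{index} received at t = {t_arrival:.3f} s")
```

With `loss_prob=0` and `max_delay=0` the receiver sees the ideal periodic case; raising either parameter produces the irregular arrival instants $t_i$ that the algorithm has to cope with.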