Deep Attention Driven Reinforcement Learning (DAD-RL) for Autonomous Decision-Making in Dynamic Environment

Jayabrata Chowdhury, Venkataramanan Shivaraman, Sumit Dangi, Suresh Sundaram, P. B. Sujit

TL;DR

The paper tackles autonomous-vehicle decision making in dynamic urban traffic by introducing DAD-RL, a lightweight framework that uses an ego-AV-centric Spatio-Temporal Attention Encoder (STAE) and a bird's-eye-view (BEV) Context Encoder (CE) to produce a compact state $s_t$ for reinforcement learning. It trains a policy with Soft Actor-Critic over mid-level actions $\mathcal{A}_t=[V_t^{target}, \Lambda_t]$, which combine a continuous target speed with a discrete lane command, and uses dense rewards to promote safety and progress. Evaluations on SMARTS show that DAD-RL outperforms state-of-the-art baselines, including the transformer-based Scene-Rep-Transformer, with a higher success rate and fewer collisions; ablations show that STAE and CE provide complementary benefits. The results suggest that a focused attention-based state encoding can deliver competitive driving performance at lower computational complexity, which could make real-time autonomous decision-making in complex traffic scenarios more scalable.
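To make the mid-level action $\mathcal{A}_t=[V_t^{target}, \Lambda_t]$ concrete, the following is a minimal sketch of how a two-dimensional, tanh-squashed policy output (the usual form for SAC) might be decoded into a target speed and a lane command. The names `MidLevelAction` and `decode_action`, the speed limit `v_max`, and the lane threshold are hypothetical choices for illustration; the summary does not state how the paper performs this mapping.

```python
from dataclasses import dataclass


@dataclass
class MidLevelAction:
    target_speed: float  # V_t^target in m/s (assumed unit)
    lane_change: int     # Lambda_t: -1, 0 (keep lane), or +1 (sign convention assumed)


def decode_action(raw, v_max=10.0, threshold=1.0 / 3.0):
    """Map a 2-vector in [-1, 1] to a mid-level action.

    Assumptions (not from the paper): speed is rescaled linearly to
    [0, v_max], and the lane command is obtained by thresholding.
    """
    # Clamp defensively in case the input is slightly out of range.
    a_speed, a_lane = (max(-1.0, min(1.0, x)) for x in raw)

    # Rescale [-1, 1] to [0, v_max].
    target_speed = 0.5 * (a_speed + 1.0) * v_max

    # Discretize the second component into three lane commands.
    if a_lane > threshold:
        lane_change = 1
    elif a_lane < -threshold:
        lane_change = -1
    else:
        lane_change = 0

    return MidLevelAction(target_speed, lane_change)


if __name__ == "__main__":
    print(decode_action((0.0, 0.9)))
```

Thresholding a continuous output is only one way to obtain a discrete lane command from a continuous-action algorithm; a separate categorical head would be another.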

Abstract

Autonomous Vehicle (AV) decision making in urban environments is inherently challenging due to the dynamic interactions with surrounding vehicles. For safe planning, the AV must understand the relative importance of the various spatiotemporal interactions in a scene. Contemporary works use large transformer architectures to encode interactions, mainly for trajectory prediction, which increases computational complexity. To address this issue without compromising spatiotemporal understanding and performance, we propose the simple Deep Attention Driven Reinforcement Learning (DAD-RL) framework, which dynamically assigns and incorporates the significance of surrounding vehicles into the ego vehicle's RL-driven decision-making process. We introduce an AV-centric spatiotemporal attention encoding (STAE) mechanism for learning the dynamic interactions with different surrounding vehicles. To understand map and route context, we employ a context encoder to extract features from context maps. The spatiotemporal representations, combined with the contextual encoding, provide a comprehensive state representation. The resulting model is trained using the Soft Actor-Critic (SAC) algorithm. We evaluate the proposed framework on the SMARTS urban benchmarking scenarios without traffic signals and demonstrate that DAD-RL outperforms recent state-of-the-art methods. Furthermore, an ablation study underscores the importance of the context encoder and the spatiotemporal attention encoder in achieving superior performance.
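The idea of assigning a significance to each surrounding vehicle can be illustrated with generic scaled dot-product attention in which the ego vehicle's embedding is the only query. This is a sketch of the general technique, not the paper's STAE: the function name `ego_attention` and the toy embeddings are hypothetical, and the learned projections and temporal encoding of a real encoder are omitted.

```python
import math


def ego_attention(ego_query, neighbor_keys, neighbor_values):
    """Scaled dot-product attention with a single (ego) query.

    Returns one weight per neighbor and the weighted sum of the
    neighbor value vectors.
    """
    d = len(ego_query)

    # Similarity between the ego query and each neighbor key.
    scores = [
        sum(q * k for q, k in zip(ego_query, key)) / math.sqrt(d)
        for key in neighbor_keys
    ]

    # Numerically stable softmax over neighbors.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Attention-weighted sum of the value vectors.
    dim_v = len(neighbor_values[0])
    context = [
        sum(w * v[j] for w, v in zip(weights, neighbor_values))
        for j in range(dim_v)
    ]
    return weights, context


if __name__ == "__main__":
    # Toy 2-D embeddings for three surrounding vehicles (made up).
    keys = [[2.0, 0.0], [0.0, 2.0], [-2.0, 0.0]]
    values = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    weights, context = ego_attention([1.0, 0.0], keys, values)
    print(weights, context)
```

In this toy input the first neighbor's key is most aligned with the ego query, so it receives the largest weight; the resulting context vector is the kind of fixed-size summary that could be passed on as part of an RL state.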

Paper Structure (13 sections, 5 equations, 3 figures, 1 table)

Figures (3)

  • Figure 1: A left-turn scenario with surrounding vehicles. The AV is depicted in red, while the other vehicles are illustrated in white. The desired route is designated in green. The AV must comprehend the significance of each neighboring vehicle in relation to its final objective of reaching the destination, as indicated by blue arrows of varying weights.
  • Figure 2: Schematic diagram of the components of the DAD-RL framework. It shows the observation space from the SMARTS simulator, the Spatio-Temporal Attention Encoder, the Bird's-Eye-View (BEV) context encoder, and the action space $V^{target}_t$ and $\Lambda_t$.
  • Figure 3: Plots for (a) Humanness Error and (b) Overall Score for different scenarios. Context-free is DAD-RL without a context encoder module.