PRIMEDrive-CoT: A Precognitive Chain-of-Thought Framework for Uncertainty-Aware Object Interaction in Driving Scene Scenario

Sriram Mandalika, Lalitha V, Athira Nambiar

TL;DR

PRIMEDrive-CoT addresses uncertainty in driving scene understanding by integrating Bayesian Graph Neural Networks for object interactions with precognitive chain-of-thought reasoning and Grad-CAM explanations. The framework fuses LiDAR and multi-view RGB data for robust 3D detection, quantifies uncertainty via entropy and orientation deviation, and computes proximity-based risk to prioritize hazards. It also provides interpretable CoT narratives and visual explanations, with a human-in-the-loop mechanism to refine predictions. On the DriveCoT benchmark, PRIMEDrive-CoT outperforms state-of-the-art CoT and risk-aware methods, demonstrating improved robustness in occluded or complex scenarios and enabling safer, more transparent autonomous driving decisions.

Abstract

Driving scene understanding is a critical real-world problem that involves interpreting and associating various elements of a driving environment, such as vehicles, pedestrians, and traffic signals. Despite advancements in autonomous driving, traditional pipelines rely on deterministic models that fail to capture the probabilistic nature and inherent uncertainty of real-world driving. To address this, we propose PRIMEDrive-CoT, a novel uncertainty-aware model for object interaction and Chain-of-Thought (CoT) reasoning in driving scenarios. In particular, our approach combines LiDAR-based 3D object detection with multi-view RGB references to ensure interpretable and reliable scene understanding. Uncertainty and risk assessment, along with object interactions, are modelled using Bayesian Graph Neural Networks (BGNNs) for probabilistic reasoning under ambiguous conditions. Interpretable decisions are facilitated through CoT reasoning, leveraging object dynamics and contextual cues, while Grad-CAM visualizations highlight attention regions. Extensive evaluations on the DriveCoT dataset demonstrate that PRIMEDrive-CoT outperforms state-of-the-art CoT and risk-aware models.
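The summary above mentions entropy-based uncertainty and proximity-based risk used to prioritize hazards. As a rough illustration of how those two quantities could combine into a ranking, here is a minimal sketch. It is not the paper's formulation: the function names, the `alpha` weight, the risk formula, and the sample detections are all assumptions made for this example, and orientation deviation is left out.

```python
import math


def class_entropy(probs):
    """Shannon entropy (in nats) of a class-probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def proximity_risk(distance_m, entropy, alpha=1.0, eps=1e-6):
    """Hypothetical risk score: nearer and more uncertain objects score higher.

    The form (1 + alpha * entropy) / distance is an assumption for
    illustration only, not the formula used by PRIMEDrive-CoT.
    """
    return (1.0 + alpha * entropy) / (distance_m + eps)


def rank_by_risk(detections):
    """Attach entropy and risk to each detection, then sort by descending risk."""
    scored = []
    for det in detections:
        h = class_entropy(det["probs"])
        risk = proximity_risk(det["distance_m"], h)
        scored.append({**det, "entropy": h, "risk": risk})
    return sorted(scored, key=lambda d: d["risk"], reverse=True)


# Made-up detections: distance to the ego vehicle and class probabilities.
detections = [
    {"label": "car", "distance_m": 20.0, "probs": [0.9, 0.05, 0.05]},
    {"label": "pedestrian", "distance_m": 5.0, "probs": [0.5, 0.3, 0.2]},
    {"label": "cyclist", "distance_m": 12.0, "probs": [1.0, 0.0, 0.0]},
]

ranked = rank_by_risk(detections)
for det in ranked:
    print(f'{det["label"]}: entropy={det["entropy"]:.3f}, risk={det["risk"]:.3f}')
```

With these sample values the nearby, ambiguous pedestrian ranks first, followed by the cyclist and then the distant car, which matches the intuition that proximity and uncertainty together should raise priority.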

Paper Structure

This paper contains 26 sections, 11 equations, 5 figures, 3 tables.

Figures (5)

  • Figure 1: Overview of our proposed PRIMEDrive-CoT framework. The pipeline consists of Stage 1 (Sec. 3.1): Data Preprocessing, Stage 2 (Sec. 3.2): 3D object detection, Stage 3 (Sec. 3.3): Uncertainty & Risk assessment, Stage 4 (Sec. 3.4): Object Interaction Learning, and Stage 5 (Sec. 3.5): Chain-of-Thought (CoT) reasoning and explanation.
  • Figure 2: Qualitative results of LiDAR-based 3D detection and proximity-based uncertainty risk assessment ranking. The predicted bounding boxes (blue) are overlaid on the LiDAR point cloud, while the ground truth (red) serves as a reference.
  • Figure 3: Qualitative results of PRIMEDrive-CoT. The model anticipates and responds to high-risk scenarios, including (a) slowing for static vehicles, (b) following vehicles ahead, (c) braking for pedestrians, and (d) live speed decisions over time. These results demonstrate the role of BGNN-driven interaction reasoning in refining uncertainty and enabling interpretable decisions.
  • Figure 4: Chain-of-thought (CoT) decision flow corresponding to Fig. \ref{fig:cot_interaction} for our approach.
  • Figure 5: Visualization of interacting objects using Grad-CAM.