Dual-Stage LLM Framework for Scenario-Centric Semantic Interpretation in Driving Assistance

Jean Douglas Carvalho, Hugo Taciro Kenji, Ahmad Mohammad Saber, Glaucia Melo, Max Mauro Dias Santos, Deepa Kundur

Abstract

Advanced Driver Assistance Systems (ADAS) increasingly rely on learning-based perception, yet safety-relevant failures often arise without component malfunction, driven instead by partial observability and semantic ambiguity in how risk is interpreted and communicated. This paper presents a scenario-centric framework for reproducible auditing of LLM-based risk reasoning in urban driving contexts. Deterministic, temporally bounded scenario windows are constructed from multimodal driving data and evaluated under fixed prompt constraints and a closed numeric risk schema, ensuring structured and comparable outputs across models. Experiments on a curated near-people scenario set compare two text-only models and one multimodal model under identical inputs and prompts. Results reveal systematic inter-model divergence in severity assignment, high-risk escalation, evidence use, and causal attribution. Disagreement extends to the interpretation of vulnerable road user presence, indicating that variability often reflects intrinsic semantic indeterminacy rather than isolated model failure. These findings highlight the importance of scenario-centric auditing and explicit ambiguity management when integrating LLM-based reasoning into safety-aligned driver assistance systems.

Paper Structure

This paper contains 35 sections, 8 equations, 12 figures.

Figures (12)

  • Figure 1: Unified Multimodal Dataset: Data from heterogeneous vehicles are abstracted into a unified multimodal representation that jointly encodes visual perception, vehicle telemetry, and external contextual information within the multimodal data platform.
  • Figure 2: Deterministic scenario construction and consumption workflow. Normalized scene states $\mathcal{U}(a,t)$, indexed at 1 Hz, are queried by backend data services to materialize scenario snapshots $\mathcal{S}(a,t)$ through the deterministic operator $\mathcal{B}(\cdot)$ under a structured query specification $\theta$. Scenario construction combines semantic filters spanning visual perception, vehicle telemetry, weather context, and map information. The resulting scenarios are exposed consistently to both the web-based authoring interface for selection and inspection and the local testing module for scenario evaluation.
  • Figure 3: Scenario window generation from a single user-defined key moment ($t_0$), illustrating controlled temporal expansion before and after the anchor instant.
  • Figure 4: Scenario-driven workflow of the platform, from web-based identification of key moments to local temporal expansion and structured prompt construction. Curated scenarios are transformed into bounded temporal windows and independently evaluated by multiple language models under identical conditions, thereby establishing a deterministic, reproducible path from interactive scenario selection to controlled model inference.
  • Figure 5: Mosaic of sixteen representative scene anchors that serve as temporal seeds for subsequent scenario expansion, illustrating the diversity of near-people situations across urban layouts, traffic densities, lighting conditions, and pedestrian configurations.
  • ...and 7 more figures
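
The temporal expansion described in the Figure 3 caption can be illustrated with a short sketch. It assumes integer timestamps on the 1 Hz index mentioned in the Figure 2 caption. The names `ScenarioWindow` and `expand_window`, the `before`/`after` parameters, and the clamping to the recording bounds are hypothetical choices made for illustration; they are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ScenarioWindow:
    """Hypothetical container for a bounded scenario window (not from the paper)."""
    anchor: int                  # key moment t0, in seconds on the 1 Hz index
    timestamps: Tuple[int, ...]  # ordered sample instants covered by the window


def expand_window(t0: int, before: int, after: int,
                  t_min: int, t_max: int) -> ScenarioWindow:
    """Expand a single anchor t0 into a bounded window of 1 Hz samples.

    `before` and `after` are the number of seconds to include on each side
    of the anchor (assumed parameters). The window is clamped to the
    recording bounds [t_min, t_max] so it never references missing samples.
    """
    if before < 0 or after < 0:
        raise ValueError("before and after must be non-negative")
    if not (t_min <= t0 <= t_max):
        raise ValueError("anchor t0 lies outside the recording bounds")

    start = max(t_min, t0 - before)
    end = min(t_max, t0 + after)
    # range() yields the same ordered sequence for the same inputs, so the
    # window is a pure function of (t0, before, after, t_min, t_max).
    return ScenarioWindow(anchor=t0, timestamps=tuple(range(start, end + 1)))


window = expand_window(t0=10, before=3, after=2, t_min=0, t_max=100)
print(window.timestamps)
```

Because the function has no hidden state or randomness, repeating the call with the same anchor and bounds gives an identical window, which is the property a reproducible audit would need from this step.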