LAVQA: A Latency-Aware Visual Question Answering Framework for Shared Autonomy in Self-Driving Vehicles

Shuangyu Xie, Kaiyuan Chen, Wenjing Chen, Chengyuan Qian, Christian Juette, Liu Ren, Dezhen Song, Ken Goldberg

TL;DR

LAVQA tackles the challenge of safely coordinating autonomous vehicles with remote human operators under variable decision latency. It introduces LICOM, a Latency-Induced Collision Map, and LICP, the Latency-Induced Collision Probability, to visualize and quantify how safety regions evolve as delay increases, and embeds these signals into a Visual Question Answering interface for shared autonomy. CARLA-based closed-loop simulations suggest that LAVQA can reduce collision rates by more than 8× relative to latency-agnostic baselines, illustrating the practical value of explicitly modeling temporal risk. The framework helps operators reason about safety in dynamic environments by fusing probabilistic motion prediction, latency-aware risk estimation, and intuitive visual overlays. Potential impact includes more reliable human-in-the-loop control for AVs in time-critical, uncertain scenarios and groundwork for VLM-enabled VQA enhancements in autonomous driving.
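
To make the idea of risk growing with delay concrete, here is a minimal sketch under loudly stated assumptions: a single obstacle closes on the vehicle at a constant mean speed, the gap between them has Gaussian uncertainty that grows with the prediction horizon, and "collision" means the gap falls below a safe distance. The function name `gap_collision_probability` and every numeric default are hypothetical illustrations, not the paper's LICP definition or parameters.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal CDF computed with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def gap_collision_probability(
    latency_s: float,
    initial_gap_m: float = 20.0,
    closing_speed_mps: float = 8.0,
    safe_distance_m: float = 5.0,
    gap_std0_m: float = 0.5,
    closing_speed_std_mps: float = 1.0,
) -> float:
    """P(gap < safe distance) if the vehicle waits `latency_s` seconds.

    Hypothetical model (not from the paper): the mean gap shrinks at a constant
    closing speed, and the gap's standard deviation combines an initial
    position error with a speed error that accumulates over the horizon.
    """
    mean_gap = initial_gap_m - closing_speed_mps * latency_s
    std_gap = math.sqrt(gap_std0_m ** 2 + (closing_speed_std_mps * latency_s) ** 2)
    return normal_cdf((safe_distance_m - mean_gap) / std_gap)


if __name__ == "__main__":
    for latency in (0.0, 0.5, 1.0, 1.5, 2.0, 2.5):
        p = gap_collision_probability(latency)
        print(f"latency={latency:.1f} s  P(collision)={p:.3f}")
```

With these illustrative numbers the probability stays near zero for short waits and rises steeply once the mean gap approaches the safe distance, which is the qualitative effect a latency-induced collision probability is meant to expose.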

Abstract

When uncertainty is high, self-driving vehicles may halt for safety and benefit from access to remote human operators who can provide high-level guidance. This paradigm, known as shared autonomy, enables autonomous vehicles and remote human operators to jointly formulate appropriate responses. To address critical decision timing under variable latency caused by wireless network delays and human response time, we present LAVQA, a latency-aware shared autonomy framework that integrates Visual Question Answering (VQA) and spatiotemporal risk visualization. LAVQA augments visual queries with the Latency-Induced COllision Map (LICOM), a dynamically evolving map that represents both temporal latency and spatial uncertainty. It enables remote operators to observe how the vehicle's safety regions vary over time in the presence of dynamic obstacles and delayed responses. Closed-loop simulations in CARLA, the de facto standard autonomous vehicle simulator, suggest that LAVQA can reduce collision rates by more than 8× compared with latency-agnostic baselines.
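
One way to picture a map that combines temporal latency and spatial uncertainty is the sketch below. It assumes a single obstacle moving at constant velocity with Gaussian velocity noise, and labels each grid cell with the fraction of sampled obstacle paths that pass within a collision radius of the cell center during the latency window. The names `licom` and `safe_cells`, the motion model, the grid, and the 0.1 threshold are all assumptions for illustration; this summary does not give the paper's LICOM construction.

```python
import math
import random


def point_to_segment_distance(p, a, b):
    """Euclidean distance from 2D point p to the segment from a to b."""
    abx, aby = b[0] - a[0], b[1] - a[1]
    denom = abx * abx + aby * aby
    t = 0.0 if denom == 0.0 else ((p[0] - a[0]) * abx + (p[1] - a[1]) * aby) / denom
    t = min(1.0, max(0.0, t))  # clamp the projection onto the segment
    cx, cy = a[0] + t * abx, a[1] + t * aby
    return math.hypot(p[0] - cx, p[1] - cy)


def licom(latency_s, obstacle_pos=(0.0, 15.0), obstacle_vel=(0.0, -5.0),
          vel_std=1.0, xs=range(-4, 5), ys=range(0, 21),
          radius_m=1.5, n_samples=400, seed=0):
    """Hypothetical collision map: {cell: P(obstacle passes near cell within latency_s)}.

    Each Monte Carlo sample draws a noisy constant velocity. Over [0, latency_s]
    the obstacle sweeps a straight segment, so a cell counts as hit when its
    center lies within radius_m of that segment.
    """
    rng = random.Random(seed)  # fixed seed: the same samples for every latency
    velocities = [(obstacle_vel[0] + rng.gauss(0.0, vel_std),
                   obstacle_vel[1] + rng.gauss(0.0, vel_std))
                  for _ in range(n_samples)]
    grid = {}
    for x in xs:
        for y in ys:
            hits = 0
            for vx, vy in velocities:
                end = (obstacle_pos[0] + vx * latency_s, obstacle_pos[1] + vy * latency_s)
                if point_to_segment_distance((x, y), obstacle_pos, end) <= radius_m:
                    hits += 1
            grid[(x, y)] = hits / n_samples
    return grid


def safe_cells(grid, threshold=0.1):
    """Cells whose collision probability is below the assumed threshold (the 'green' region)."""
    return {cell for cell, p in grid.items() if p < threshold}


if __name__ == "__main__":
    for latency in (0.5, 1.5, 2.5):
        grid = licom(latency)
        print(f"latency={latency} s: {len(safe_cells(grid))} of {len(grid)} cells below threshold")
```

Because the path swept over a longer window contains the path swept over a shorter one, each cell's probability can only grow with latency in this model, so the safe region shrinks, mirroring the trend described for Figure 4.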

Paper Structure

This paper contains 21 sections, 11 equations, 4 figures, 4 tables.

Figures (4)

  • Figure 1: LAVQA is a latency-aware shared autonomy framework in which the remote operator and the autonomous vehicle collaboratively make safe, context-appropriate decisions. It uses visual question answering and visually depicts how collision risk evolves with decision latency to assist in vehicle navigation.
  • Figure 2: Simulation Setup for Three Traffic Scenarios that the AV must navigate in the presence of a dynamic obstacle.
  • Figure 3: Perceived Collision Probability Across Scenarios and Latencies. LICP models how a human perceives collision risk over time under varying latencies (0–400 ms) in three traffic scenarios. Without visualizing and compensating for latency-induced collision probability, higher latencies cause increasingly delayed perception of risk. Dashed horizontal lines indicate the risk threshold $\lambda$ at which the human makes a decision.
  • Figure 4: Latency-Induced Collision Maps (LICOM) Across Driving Scenarios and Latency Levels. Each row illustrates a different driving scenario in CARLA: Overtake, Intersection – Turn Left, Intersection – Go Straight, and Merge. Columns correspond to increasing decision latency from 0.5s to 2.5s. The final LICOM overlays indicate regions of high collision probability under delayed execution, with red denoting unsafe zones and green denoting safe zones. As latency increases, safe regions shrink or shift, highlighting the importance of accounting for temporal risk in dynamic environments.
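
The delayed-perception effect described for Figure 3 can be reproduced with a toy risk trace. This sketch assumes a logistic collision-probability ramp, a threshold $\lambda = 0.6$, and an operator who acts at the first time step where the displayed risk reaches $\lambda$. Showing stale risk delays that step by exactly the latency, while an idealized interface that displays latency-compensated (current-time) risk does not. `risk_trace`, `decision_step`, and all parameter values are hypothetical.

```python
import math


def risk_trace(n_steps=60, dt=0.05, t_cross=1.5, steepness=6.0):
    """Hypothetical collision-probability trace p(t): a logistic ramp centered at t_cross."""
    return [1.0 / (1.0 + math.exp(-steepness * (k * dt - t_cross))) for k in range(n_steps)]


def decision_step(trace, lam, delay_steps):
    """First step at which the operator's displayed risk reaches lam.

    With delay_steps > 0 the operator sees stale risk trace[k - delay_steps].
    delay_steps = 0 stands for an ideal interface that fully compensates the
    latency. Returns None if the threshold is never reached within the trace.
    """
    for k in range(delay_steps, len(trace)):
        if trace[k - delay_steps] >= lam:
            return k
    return None


if __name__ == "__main__":
    dt = 0.05
    trace = risk_trace(dt=dt)
    for latency_ms in (0, 100, 200, 400):
        steps = round(latency_ms / 1000 / dt)
        k = decision_step(trace, lam=0.6, delay_steps=steps)
        print(f"latency={latency_ms} ms -> decision at t={k * dt:.2f} s")
```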

Theorems & Definitions (5)

  • Definition 1: Decision Safety Measure
  • Definition 2: $(\lambda,\tau)$-Safety
  • Definition 3: Collision Probability
  • Definition 4: Latency-Induced Collision Probability
  • Definition 5: LICOM
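
The list above gives only the definition titles, so the following is an assumed reading for illustration rather than the paper's Definition 2: a candidate action is treated as $(\lambda,\tau)$-safe if its collision probability stays at or below $\lambda$ for every decision latency in $[0, \tau]$, checked on a time grid. The Gaussian gap model, the action names, and the helper functions are all hypothetical.

```python
import math
from typing import Callable, Dict, List


def normal_cdf(z: float) -> float:
    """Standard normal CDF computed with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def make_gap_model(initial_gap_m: float, closing_speed_mps: float,
                   safe_distance_m: float = 5.0, gap_std0_m: float = 0.5,
                   speed_std_mps: float = 1.0) -> Callable[[float], float]:
    """Hypothetical P(collision | latency t) for one action: Gaussian gap with growing std."""
    def prob(t: float) -> float:
        mean = initial_gap_m - closing_speed_mps * t
        std = math.sqrt(gap_std0_m ** 2 + (speed_std_mps * t) ** 2)
        return normal_cdf((safe_distance_m - mean) / std)
    return prob


def is_lambda_tau_safe(collision_prob: Callable[[float], float],
                       lam: float, tau: float, dt: float = 0.05) -> bool:
    """Assumed check: collision_prob(t) <= lam at every grid time in [0, tau]."""
    n = int(math.floor(tau / dt + 1e-9))
    times = [k * dt for k in range(n + 1)] + [tau]  # grid points plus the endpoint itself
    return all(collision_prob(t) <= lam for t in times)


def safe_actions(actions: Dict[str, Callable[[float], float]],
                 lam: float, tau: float) -> List[str]:
    """Names of candidate actions that pass the assumed (lambda, tau) check."""
    return [name for name, prob in actions.items() if is_lambda_tau_safe(prob, lam, tau)]


if __name__ == "__main__":
    actions = {
        "proceed": make_gap_model(initial_gap_m=20.0, closing_speed_mps=8.0),
        "yield": make_gap_model(initial_gap_m=20.0, closing_speed_mps=0.0),
    }
    for tau in (1.0, 2.0):
        print(f"tau={tau} s: safe actions = {safe_actions(actions, lam=0.05, tau=tau)}")
```

With $\lambda = 0.05$, both hypothetical actions pass for $\tau = 1$ s, but only "yield" passes for $\tau = 2$ s, which shows how a larger latency budget can remove options an operator would otherwise be offered.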