SAFe-Copilot: Unified Shared Autonomy Framework

Phat Nguyen, Erfan Aasi, Shiva Sreeram, Guy Rosman, Andrew Silva, Sertac Karaman, Daniela Rus

TL;DR

The paper tackles the brittleness of autonomous driving in rare or ambiguous scenarios by introducing SAFe-Copilot, a semantic arbitration framework that fuses human input and autonomous plans at a high level using Vision Language Models. It formalizes three modules—Abstraction for high-level plan/state conversion, Uncertainty for detecting unreliable autonomy via an uncertainty score $u_t$, and Reasoning for VLM-based decision making and grounding—whose integration enables proactive fusion or supervisory input depending on confidence. Empirical results in CARLA/Bench2Drive show substantial safety and performance gains: mock-human experiments achieve perfect recall with high accuracy, a human survey reports 92% agreement with arbitration outcomes, and Bench2Drive shows reduced collision rates and improved route completion. Overall, the work demonstrates that semantic, language-based arbitration preserves human intent while leveraging autonomous planning to improve safety and effectiveness in complex driving scenarios.

Abstract

Autonomous driving systems remain brittle in rare, ambiguous, and out-of-distribution scenarios, where human drivers succeed through contextual reasoning. Shared autonomy has emerged as a promising approach to mitigate such failures by incorporating human input when autonomy is uncertain. However, most existing methods restrict arbitration to low-level trajectories, which represent only geometric paths and therefore fail to preserve the underlying driving intent. We propose a unified shared autonomy framework that integrates human input and autonomous planners at a higher level of abstraction. Our method leverages Vision Language Models (VLMs) to infer driver intent from multi-modal cues -- such as driver actions and environmental context -- and to synthesize coherent strategies that mediate between human and autonomous control. We first study the framework in a mock-human setting, where it achieves perfect recall alongside high accuracy and precision. A human-subject survey further shows strong alignment, with participants agreeing with arbitration outcomes in 92% of cases. Finally, evaluation on the Bench2Drive benchmark demonstrates a substantial reduction in collision rate and improvement in overall performance compared to pure autonomy. Arbitration at the level of semantic, language-based representations emerges as a design principle for shared autonomy, enabling systems to exercise common-sense reasoning and maintain continuity with human intent.

Paper Structure

This paper contains 17 sections, 3 equations, 7 figures, 2 tables.

Figures (7)

  • Figure 1: Given a driver intervention, SAFe-Copilot evaluates scene context, human intent, uncertainty, and autonomy's plans to arbitrate the most suitable and safest plan.
  • Figure 2: Overview. (a) Our framework supports two teaming modes. Top: proactive teaming that fuses driver input with autonomy whenever the human intervenes. Bottom: supervisory shared control where the system requests human input under high uncertainty. (b) The system takes as input the driver’s state and control actions, along with the ego-vehicle state. (c) Intra-frame variance measures disagreement among candidate trajectories in a single frame, while inter-frame variance measures changes in the mean trajectory across frames. (d) SAFe-Copilot integrates driver state and actions, vehicle state, and uncertainty measures within a symbolic reasoning module that arbitrates between human and autonomous plans to generate a coherent and safe trajectory.
  • Figure 3: Qualitative example. Output for a scenario in which the driver steers left to avoid an open car door obstructing the lane, while oncoming traffic approaches. The results show that the framework: (A) correctly infers human intent, (B) evaluates the consequences of the human plan, (C) contrasts it with the autonomous plan, and (D) demonstrates an understanding of societal driving norms, leveraging them to fuse both plans into a safer trajectory.
  • Figure 4: Scenario examples. Left: yielding to an emergency vehicle. Middle: overtaking an OOD construction sign. Right: glare perception failure.
  • Figure 5: Human survey. Results over sample scenarios show that (a) an overwhelming majority of participants found the scenarios plausible; (b) a strong majority agreed with the plan suggested by the VLM; (c) most participants agreed with the arbitration outcome, and a substantial minority judged it an improvement over their own plan; and (d) overall, annotators found the VLM’s analysis of the situations, such as its prediction of human intent, to be highly accurate.
  • ...and 2 more figures
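
The two uncertainty measures named in the Figure 2(c) caption can be illustrated with a small sketch. The caption only states what each measure captures (disagreement among candidate trajectories within one frame, and change in the mean trajectory across frames); the exact formulas, the function names, and the 2D waypoint representation below are assumptions made for illustration, not the paper's definitions.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Trajectory = List[Point]  # waypoints of one candidate plan (assumed 2D)


def mean_trajectory(trajectories: List[Trajectory]) -> Trajectory:
    """Waypoint-wise mean over trajectories of equal length."""
    n = len(trajectories)
    horizon = len(trajectories[0])
    return [
        (
            sum(t[i][0] for t in trajectories) / n,
            sum(t[i][1] for t in trajectories) / n,
        )
        for i in range(horizon)
    ]


def spread(trajectories: List[Trajectory]) -> float:
    """Mean squared distance of all waypoints from the waypoint-wise mean."""
    mean = mean_trajectory(trajectories)
    total = 0.0
    for traj in trajectories:
        for (x, y), (mx, my) in zip(traj, mean):
            total += (x - mx) ** 2 + (y - my) ** 2
    return total / (len(trajectories) * len(mean))


def intra_frame_variance(candidates: List[Trajectory]) -> float:
    """Disagreement among the candidate trajectories of a single frame."""
    return spread(candidates)


def inter_frame_variance(frames: List[List[Trajectory]]) -> float:
    """Change of the per-frame mean trajectory across a window of frames."""
    return spread([mean_trajectory(candidates) for candidates in frames])


# Hypothetical data: two frames, each with two candidate plans of two waypoints.
frame_a = [[(0.0, 0.0), (1.0, 0.0)], [(0.0, 0.0), (1.0, 2.0)]]  # candidates disagree
frame_c = [[(0.0, 0.0), (1.0, 3.0)], [(0.0, 0.0), (1.0, 3.0)]]  # candidates agree

print(intra_frame_variance(frame_a))             # 0.5
print(intra_frame_variance(frame_c))             # 0.0
print(inter_frame_variance([frame_a, frame_c]))  # 0.5
```

In this toy case, frame_c has zero intra-frame variance because its candidates coincide, yet the window still shows nonzero inter-frame variance because the mean plan shifted between frames. How the two quantities would be combined into the uncertainty score $u_t$ is not specified in the text above and is left open here.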