Table of Contents
Fetching ...

SanDRA: Safe Large-Language-Model-Based Decision Making for Automated Vehicles Using Reachability Analysis

Yuanfei Lin, Sebastian Illing, Matthias Althoff

TL;DR

This work tackles the safety problem of deploying large language models (LLMs) for automated vehicle decision making by marrying LLM-driven action generation with formal reachability analysis. SanDRA prompts LLMs to produce and rank candidate longitudinal-lateral actions, translates those actions into $LTL_{\text{f}}$ formulas, and verifies them against both most likely and set-based predictions using reachability and model checking; formalized traffic rules are conjoined to ensure legal safety, with a fail-safe plan guaranteeing infinite-horizon safety when needed. The approach yields provable safety guarantees and improved rule compliance, while enabling downstream trajectory planners to operate within verified safe corridors; it demonstrates strong performance in open- and closed-loop evaluations and shows practicality via public code release. The framework reduces reliance on large labeled datasets for decision making, offers a transparent safety layer around LLM reasoning, and paves the way for safer integration of learning-based components in autonomous driving. Overall, SanDRA provides a principled, extensible pathway to deploy LLM-based decision making in automated vehicles without sacrificing formal safety guarantees or legal compliance.

Abstract

Large language models have been widely applied to knowledge-driven decision-making for automated vehicles due to their strong generalization and reasoning capabilities. However, the safety of the resulting decisions cannot be ensured due to possible hallucinations and the lack of integrated vehicle dynamics. To address this issue, we propose SanDRA, the first safe large-language-model-based decision making framework for automated vehicles using reachability analysis. Our approach starts with a comprehensive description of the driving scenario to prompt large language models to generate and rank feasible driving actions. These actions are translated into temporal logic formulas that incorporate formalized traffic rules, and are subsequently integrated into reachability analysis to eliminate unsafe actions. We validate our approach in both open-loop and closed-loop driving environments using off-the-shelf and finetuned large language models, showing that it can provide provably safe and, where possible, legally compliant driving actions, even under high-density traffic conditions. To ensure transparency and facilitate future research, all code and experimental setups are publicly available at github.com/CommonRoad/SanDRA.

SanDRA: Safe Large-Language-Model-Based Decision Making for Automated Vehicles Using Reachability Analysis

TL;DR

This work tackles the safety problem of deploying large language models (LLMs) for automated vehicle decision making by marrying LLM-driven action generation with formal reachability analysis. SanDRA prompts LLMs to produce and rank candidate longitudinal-lateral actions, translates those actions into formulas, and verifies them against both most likely and set-based predictions using reachability and model checking; formalized traffic rules are conjoined to ensure legal safety, with a fail-safe plan guaranteeing infinite-horizon safety when needed. The approach yields provable safety guarantees and improved rule compliance, while enabling downstream trajectory planners to operate within verified safe corridors; it demonstrates strong performance in open- and closed-loop evaluations and shows practicality via public code release. The framework reduces reliance on large labeled datasets for decision making, offers a transparent safety layer around LLM reasoning, and paves the way for safer integration of learning-based components in autonomous driving. Overall, SanDRA provides a principled, extensible pathway to deploy LLM-based decision making in automated vehicles without sacrificing formal safety guarantees or legal compliance.

Abstract

Large language models have been widely applied to knowledge-driven decision-making for automated vehicles due to their strong generalization and reasoning capabilities. However, the safety of the resulting decisions cannot be ensured due to possible hallucinations and the lack of integrated vehicle dynamics. To address this issue, we propose SanDRA, the first safe large-language-model-based decision making framework for automated vehicles using reachability analysis. Our approach starts with a comprehensive description of the driving scenario to prompt large language models to generate and rank feasible driving actions. These actions are translated into temporal logic formulas that incorporate formalized traffic rules, and are subsequently integrated into reachability analysis to eliminate unsafe actions. We validate our approach in both open-loop and closed-loop driving environments using off-the-shelf and finetuned large language models, showing that it can provide provably safe and, where possible, legally compliant driving actions, even under high-density traffic conditions. To ensure transparency and facilitate future research, all code and experimental setups are publicly available at github.com/CommonRoad/SanDRA.

Paper Structure

This paper contains 37 sections, 8 equations, 9 figures, 4 tables.

Figures (9)

  • Figure 1: An example usage of SanDRA, where the LLM is prompted to generate a ranked list of longitudinal and lateral action pairs ordered from best to worst. The action pair corresponding to stopping while staying on the current lane is verified as safe using reachability analysis and then executed.
  • Figure 2: Exemplary road network with lanes defined by lanelets. The ego vehicle is currently following lane $L_{\mathrm{cur}}$, as determined by the high-level routing module to reach the goal. Alternatively, it may switch to the left-adjacent lane $L_{\mathrm{adj\_l}}$ instead of remaining on $L_\mathrm{cur}$.
  • Figure 3: SanDRA overview diagram. Our tool SanDRA takes planning and environmental information as inputs and processes them to generate a structured description of the current scenario. The description is then used to prompt the LLM to produce a ranked sequence of longitudinal and lateral action pairs, ordered from best to worst. After converting the actions and traffic rules into LTL$_\text{f}$ formulas, we take their conjunction and apply reachability analysis to verify the safety of the resulting behavior. Once verified, the action pair is either executed directly or its corresponding reachable sets are passed to the motion planner to generate safe trajectories. If no verified actions are available, a fail-safe plan is executed.
  • Figure 4: LLM prompt design. Automatically generated content in the prompt is marked with $\{$curly brackets$\}$, while manual inputs are in $[$square brackets$]$, and $($round brackets$)$ denote optional elements depending on the road structure and its regulatory features.
  • Figure 5: Prediction of other traffic participants and reachable set of the ego vehicle. The specifications $\varphi_{\text{acc\_foll}}$ and $\varphi_{\text{R\_G1}}$ are given in (\ref{['eq:acc_follow']}) and (\ref{['eq:rg1']}), respectively.
  • ...and 4 more figures