A Superalignment Framework in Autonomous Driving with Large Language Models

Xiangrui Kong; Thomas Braunl; Marco Fahmi; Yue Wang

TL;DR

The paper addresses data privacy, security, and alignment challenges in cloud-based LLM/MLLM systems for autonomous driving. It proposes a multi-agent LLM security framework built on Behavior Expectation Bounds $B_{\mathbb{P}}(s)$, a data-sensitivity mapping $D_{\mathbb{P}}(s_n)$, and a constrained control set $C_{\mathbb{P}}(s_n)$. It formalizes three safety targets: driving safety, $C_{\mathbb{P}_{\phi}}(s_i)\subseteq \tilde{C}$; data safety, $D_{\mathbb{P}_{\psi}}(s_i)\to 0$; and alignment, $B_{\mathbb{P}_{\omega}}(s_i)\to 1$. Together these aim to prevent data leakage, keep outputs compliant with driving regulations, and maintain human-aligned behavior. The authors implement and evaluate prompts from eleven LLM-AD studies using AutoGen, and run NuScenes-QA-based perception tests with GPT-3.5-turbo and Llama2-70b-chat, analyzing safety, token costs, and alignment across backbones. The results indicate that a carefully designed multi-agent guardrail can reduce data exposure while preserving driving performance, offering a practical path toward safer, privacy-preserving LLM-driven autonomous vehicles.
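The three safety targets can be read as per-scene checks. The following is a minimal sketch, not the authors' implementation: it assumes that the proposed control actions, a data-sensitivity score, and an alignment score have already been computed for a scene $s_i$, and because the paper states data safety and alignment as limits ($\to 0$, $\to 1$), it substitutes hypothetical tolerances. The names `SceneAssessment`, `ALLOWED_ACTIONS`, `eps_data`, and `eps_align` are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SceneAssessment:
    """Hypothetical per-scene quantities for one input s_i (names are illustrative)."""
    control_actions: frozenset  # stands in for C_{P_phi}(s_i): actions the model proposes
    data_sensitivity: float     # stands in for D_{P_psi}(s_i): exposure of sensitive data, 0 = none
    alignment: float            # stands in for B_{P_omega}(s_i): behavior expectation, 1 = fully aligned


# Hypothetical permitted action set, standing in for C-tilde.
ALLOWED_ACTIONS = frozenset({"keep_lane", "slow_down", "stop", "turn_left", "turn_right"})


def meets_safety_targets(
    assessment: SceneAssessment,
    allowed: frozenset = ALLOWED_ACTIONS,
    eps_data: float = 0.05,   # assumed tolerance approximating D -> 0
    eps_align: float = 0.05,  # assumed tolerance approximating B -> 1
) -> tuple:
    """Return (driving_ok, data_ok, alignment_ok) for one scene."""
    driving_ok = assessment.control_actions <= allowed       # subset test: C(s_i) ⊆ C-tilde
    data_ok = assessment.data_sensitivity <= eps_data        # D(s_i) close to 0
    alignment_ok = assessment.alignment >= 1.0 - eps_align   # B(s_i) close to 1
    return driving_ok, data_ok, alignment_ok


if __name__ == "__main__":
    scene = SceneAssessment(frozenset({"slow_down", "stop"}), 0.01, 0.97)
    print(meets_safety_targets(scene))
```

Under this reading, a scene passes only if all three flags are `True`; shrinking the tolerances makes the data-safety and alignment checks stricter.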

Abstract

Over the last year, significant advances have been made in large language models (LLMs) and multi-modal large language models (MLLMs), particularly in their application to autonomous driving. These models have shown remarkable abilities in processing and interacting with complex information. In autonomous driving, LLMs and MLLMs are used extensively and require access to sensitive vehicle data such as precise locations, images, and road conditions. These data are transmitted to an LLM-based inference cloud for advanced analysis. This raises data-security concerns: protection against data and privacy breaches depends primarily on the LLM's inherent security measures, with no additional scrutiny or evaluation of the LLM's inference outputs. Despite its importance, the security of LLMs in autonomous driving remains underexplored. To address this gap, our research introduces a novel security framework for autonomous vehicles based on a multi-agent LLM approach. The framework is designed to protect sensitive information associated with autonomous vehicles from leaks, while also ensuring that LLM outputs comply with driving regulations and align with human values. It includes mechanisms to filter out irrelevant queries and to verify the safety and reliability of LLM outputs. Using this framework, we evaluated the security, privacy, and cost of eleven large language model-driven autonomous driving prompts. Additionally, we performed QA tests on these driving prompts, which demonstrated the framework's efficacy.
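To make the described flow concrete, here is a minimal sketch of a guardrail pipeline with the roles the abstract names: filtering irrelevant queries, shielding sensitive data before it reaches the inference cloud, and verifying LLM outputs against a driving rule. It is not the paper's AutoGen-based implementation. The keyword list, the coordinate regex, the single speed-limit rule, and all function names are hypothetical, and the cloud LLM is passed in as a plain callable so that any client or stub can be plugged in.

```python
import re
from typing import Callable, Optional

# Hypothetical keywords used to decide whether a query is driving-related.
DRIVING_KEYWORDS = ("lane", "speed", "brake", "pedestrian", "traffic", "turn", "obstacle")

# Hypothetical pattern for precise coordinates such as "31.9523, 115.8613".
COORD_PATTERN = re.compile(r"-?\d{1,3}\.\d{3,}\s*,\s*-?\d{1,3}\.\d{3,}")


def filter_query(query: str) -> bool:
    """Query-filter role: accept only queries that mention a driving keyword."""
    q = query.lower()
    return any(k in q for k in DRIVING_KEYWORDS)


def redact(query: str) -> str:
    """Privacy role: mask precise coordinates before the query leaves the vehicle."""
    return COORD_PATTERN.sub("[LOCATION REDACTED]", query)


def verify_output(reply: str, speed_limit_kmh: int) -> bool:
    """Verifier role: reject replies that propose any speed above the limit."""
    speeds = [int(m) for m in re.findall(r"(\d+)\s*km/h", reply)]
    return all(s <= speed_limit_kmh for s in speeds)


def guarded_call(query: str, llm: Callable[[str], str], speed_limit_kmh: int) -> Optional[str]:
    """Run filter -> redact -> LLM -> verify; return None if any stage blocks."""
    if not filter_query(query):
        return None  # irrelevant queries never reach the cloud
    reply = llm(redact(query))
    return reply if verify_output(reply, speed_limit_kmh) else None


if __name__ == "__main__":
    def stub_llm(prompt: str) -> str:
        # Stand-in for a cloud LLM call; always suggests the same manoeuvre.
        return "Slow down to 40 km/h; a pedestrian is ahead."

    print(guarded_call("Pedestrian near 31.9523, 115.8613: what speed?", stub_llm, 50))
    print(guarded_call("Tell me a joke", stub_llm, 50))
```

Keeping each role a separate function mirrors the multi-agent split: a real system could replace any stage (for example, an LLM-based relevance classifier instead of keywords) without touching the others.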

Paper Structure

This paper contains 10 sections, 3 equations, 5 figures, and 5 tables.

Introduction
Related work
LLMs in Autonomous Driving
Privacy and Alignment in LLMs
Method
Experiments
Implementation details
Evaluation of Safety Capabilities
Perception Capabilities Evaluation
Conclusion

Figures (5)

Figure 1: LLM Safety-as-a-service autonomous driving framework
Figure 2: LLM-AD system prompt analysis
Figure 3: LLM-AD system prompt analysis of sensitive data usage
Figure 4: Overall accuracy on the nuScenes-QA dataset
Figure 5: Results of different models on five question types in the nuScenes-QA dataset
