A Superalignment Framework in Autonomous Driving with Large Language Models
Xiangrui Kong, Thomas Braunl, Marco Fahmi, Yue Wang
TL;DR
The paper addresses data privacy, security, and alignment challenges in cloud-based LLM/MLLM systems for autonomous driving and proposes a multi-agent LLM security framework built on Behavior Expectation Bounds, $B_{\mathbb{P}}(s)$, combined with a data-sensitivity mapping, $D_{\mathbb{P}}(s_n)$, and constrained control, $C_{\mathbb{P}}(s_n)$. It formalizes three safety targets: driving safety ($C_{\mathbb{P}_{\phi}}(s_i)\subseteq \tilde{C}$), data safety ($D_{\mathbb{P}_{\psi}}(s_i)\to 0$), and alignment ($B_{\mathbb{P}_{\omega}}(s_i)\to 1$). Together these targets aim to prevent data leakage, keep outputs consistent with driving regulations, and maintain human-aligned behavior. The authors implement and evaluate prompts from eleven LLM-AD studies using AutoGen, and conduct NuScenes-QA-based perception tests with GPT-3.5-turbo and Llama2-70b-chat, analyzing safety, token costs, and alignment across backbones. Results indicate that a carefully designed multi-agent guardrail can reduce data exposure while preserving driving performance, offering a practical pathway toward safer, privacy-preserving LLM-driven autonomous vehicles.
Abstract
Over the last year, large language models (LLMs) and multi-modal large language models (MLLMs) have advanced significantly, particularly in their application to autonomous driving. These models have shown a remarkable ability to process and interact with complex information. In autonomous driving, LLMs and MLLMs are used extensively and require access to sensitive vehicle data such as precise locations, images, and road conditions. These data are transmitted to an LLM-based inference cloud for advanced analysis. This raises data security concerns, because protection against data and privacy breaches depends primarily on the LLM's inherent security measures, with no additional scrutiny or evaluation of the LLM's inference outputs. Despite its importance, the security of LLMs in autonomous driving remains underexplored. To address this gap, we introduce a novel security framework for autonomous vehicles based on a multi-agent LLM approach. The framework is designed to safeguard sensitive information associated with autonomous vehicles from potential leaks, while also ensuring that LLM outputs adhere to driving regulations and align with human values. It includes mechanisms to filter out irrelevant queries and to verify the safety and reliability of LLM outputs. Using this framework, we evaluated the security, privacy, and cost of eleven LLM-driven autonomous driving prompts. Additionally, we performed QA tests on these driving prompts, which demonstrated the framework's efficacy.
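To make the three safety targets concrete, the following is a minimal Python sketch of how a guardrail might check driving safety ($C \subseteq \tilde{C}$), data safety ($D \to 0$), and alignment ($B \to 1$) on a single exchange. It is an illustration under stated assumptions, not the authors' implementation: the field names, the allowed-action set, the sensitivity measure, the alignment score input, and the 0.9 threshold are all hypothetical stand-ins chosen for the example.

```python
from dataclasses import dataclass

# All names and values below are hypothetical, chosen only for illustration.
SENSITIVE_KEYS = {"gps", "location", "camera_image", "vin"}  # assumed sensitive fields
ALLOWED_ACTIONS = {"keep_lane", "slow_down", "stop",
                   "turn_left", "turn_right"}  # toy stand-in for the permitted set C~


@dataclass
class Verdict:
    driving_safe: bool       # proposed actions are a subset of ALLOWED_ACTIONS
    data_sensitivity: float  # toy stand-in for D; 0.0 means nothing sensitive is sent
    aligned: bool            # toy stand-in for B being close enough to 1

    @property
    def accepted(self) -> bool:
        return self.driving_safe and self.data_sensitivity == 0.0 and self.aligned


def data_sensitivity(payload: dict) -> float:
    """Fraction of payload fields that appear in the sensitive set."""
    if not payload:
        return 0.0
    hits = sum(1 for key in payload if key in SENSITIVE_KEYS)
    return hits / len(payload)


def redact(payload: dict) -> dict:
    """Drop sensitive fields before the payload leaves the vehicle."""
    return {k: v for k, v in payload.items() if k not in SENSITIVE_KEYS}


def check(payload: dict, proposed_actions: list,
          alignment_score: float, threshold: float = 0.9) -> Verdict:
    """Evaluate one exchange against the three targets.

    alignment_score is assumed to come from a separate evaluator
    (for example, another agent); it is simply an input here.
    """
    return Verdict(
        driving_safe=set(proposed_actions) <= ALLOWED_ACTIONS,
        data_sensitivity=data_sensitivity(payload),
        aligned=alignment_score >= threshold,
    )


if __name__ == "__main__":
    raw = {"gps": (31.98, 115.82), "speed_kmh": 40}
    print(check(raw, ["slow_down"], 0.95).accepted)          # sensitive field present
    print(check(redact(raw), ["slow_down"], 0.95).accepted)  # redacted payload
```

In this sketch, an exchange is accepted only when all three checks pass, so an unredacted payload or an action outside the allowed set is rejected regardless of the alignment score. A real system would replace the key-based sensitivity measure and the fixed action set with the framework's own mappings.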
