Driving-RAG: Driving Scenarios Embedding, Search, and RAG Applications
Cheng Chang, Jingwei Ge, Jiazhe Guo, Zelin Guo, Binghong Jiang, Li Li
TL;DR
This work tackles scalable, high-fidelity embedding and retrieval of driving scenarios to empower Retrieval-Augmented Generation (RAG) for autonomous driving. It introduces Driving-RAG, a three-part framework consisting of an aligned scenario embedding model (RGCN+Transformer) trained to match Graph-DTW distances, a KDE-informed HNSW-TSD search that produces a typical scenario data subset and fast nearest-neighbor retrieval, and a retrieval reorganization module that uses graph relations to produce coherent, task-relevant references for LLM augmentation. Experiments on CitySim and INTERACTION show that embeddings align with the driving-distance metric, HNSW-TSD provides substantial speedups over traditional vector search, and LLM-based trajectory planning benefits from both retrieval and relation-based reorganization, improving metrics such as ADE and goal-confusion. The approach is efficient at scale (approximately 30 ms per query at $10^5$ samples) and generalizable to other RAG-enabled tasks beyond driving, enhancing online decision-making and offline simulation workflows.
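To make the alignment idea concrete, the following is a minimal sketch of an objective that pulls embedding distances toward a sequence distance. It rests on several assumptions: classic dynamic time warping over point sequences stands in for Graph-DTW (which is not reproduced here), the embeddings are plain tuples rather than outputs of the RGCN+Transformer model, and the names `dtw_distance` and `alignment_loss` are hypothetical.

```python
import math


def dtw_distance(a, b):
    """Classic DTW between two sequences of equal-dimension points.

    Assumption: this is plain DTW, used here only as a stand-in for Graph-DTW.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = cheapest warping cost aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def alignment_loss(embeddings, scenarios):
    """Mean squared gap between embedding distance and DTW distance over all pairs."""
    total, pairs = 0.0, 0
    for i in range(len(scenarios)):
        for j in range(i + 1, len(scenarios)):
            target = dtw_distance(scenarios[i], scenarios[j])
            predicted = math.dist(embeddings[i], embeddings[j])
            total += (predicted - target) ** 2
            pairs += 1
    return total / pairs if pairs else 0.0
```

A loss of zero would mean that Euclidean distance in the embedding space reproduces the sequence distance exactly for every pair, which is the property that lets a vector index substitute for the more expensive pairwise comparison at query time.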
Abstract
Driving scenario data play an increasingly vital role in the development of intelligent vehicles and autonomous driving. Accurate and efficient scenario data search is critical both for online vehicle decision-making and planning and for offline scenario generation and simulation, because it allows past scenario experience to be leveraged to improve overall performance. The application of large language models (LLMs) and Retrieval-Augmented Generation (RAG) systems in autonomous driving makes this requirement especially urgent. In this paper, we introduce the Driving-RAG framework to address the challenges of efficient scenario data embedding, search, and application in RAG systems. Our embedding model aligns fundamental scenario information and scenario distance metrics in the vector space. A typical scenario sampling method combined with hierarchical navigable small world (HNSW) search performs scenario vector search efficiently without sacrificing accuracy. In addition, a reorganization mechanism based on graph knowledge enhances the relevance of retrieved scenarios to the prompt scenario and augments LLM generation. We demonstrate the effectiveness of the proposed framework on a typical trajectory planning task in complex interactive scenarios such as ramps and intersections, showcasing its advantages for RAG applications.
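As a point of reference for the vector search step, the sketch below shows exhaustive top-k retrieval over stored scenario embeddings. This is the exact baseline that an approximate index such as HNSW is meant to speed up, not the HNSW-TSD method itself. The function name `top_k_scenarios`, the dictionary layout, and the scenario identifiers are hypothetical.

```python
import heapq
import math


def top_k_scenarios(query, database, k=3):
    """Return (distance, scenario_id) pairs for the k closest stored embeddings.

    Assumption: `database` maps a scenario id to a fixed-length embedding tuple.
    Every entry is scored, so the cost grows linearly with the database size.
    """
    scored = ((math.dist(query, vec), sid) for sid, vec in database.items())
    return heapq.nsmallest(k, scored)


# Hypothetical toy database of 2-D embeddings.
database = {
    "ramp_01": (0.0, 0.0),
    "intersection_07": (1.0, 0.0),
    "ramp_02": (5.0, 5.0),
}
nearest = top_k_scenarios((0.9, 0.0), database, k=2)
nearest_ids = [sid for _, sid in nearest]
```

Because every stored embedding is compared against the query, this baseline is exact but scales poorly, which is the motivation for pairing a reduced typical-scenario subset with a graph-based approximate index.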
