Can LVLMs Obtain a Driver's License? A Benchmark Towards Reliable AGI for Autonomous Driving

Yuhang Lu, Yichen Yao, Jiadong Tu, Jiangnan Shao, Yuexin Ma, Xinge Zhu

TL;DR

This work addresses the gap between general-purpose LVLMs and the domain-specific knowledge required for safe autonomous driving by introducing the Intelligent Driving Knowledge Base (IDKB). IDKB combines driving handbooks, theory tests, and CARLA-simulated road data across 15 countries and 9 languages, totaling over 1M data items, to train and evaluate LVLMs on driving knowledge tasks. The authors evaluate 15 LVLMs, reveal gaps in the driving knowledge of base models, and show that fine-tuning on IDKB yields substantial improvements, with fine-tuned models approaching proprietary-model performance on driving tasks. They further apply IDKB-enhanced models to nuScenes trajectory planning and report safer and more rational plans, which underscores IDKB's potential to enable more reliable AGI for autonomous driving.

Abstract

Large Vision-Language Models (LVLMs) have recently garnered significant attention, with many efforts aimed at harnessing their general knowledge to enhance the interpretability and robustness of autonomous driving models. However, LVLMs typically rely on large, general-purpose datasets and lack the specialized expertise required for professional and safe driving. Existing vision-language driving datasets focus primarily on scene understanding and decision-making, without providing explicit guidance on traffic rules and driving skills, which are critical aspects directly related to driving safety. To bridge this gap, we propose IDKB, a large-scale dataset containing over one million data items collected from various countries, including driving handbooks, theory test data, and simulated road test data. Much like the process of obtaining a driver's license, IDKB encompasses nearly all the explicit knowledge needed for driving, from theory to practice. In particular, we conducted comprehensive tests on 15 LVLMs using IDKB to assess their reliability in the context of autonomous driving and provided extensive analysis. We also fine-tuned popular models, achieving notable performance improvements, which further validates the significance of our dataset. The project page can be found at: https://4dvlab.github.io/project_page/idkb.html

Paper Structure

This paper contains 41 sections, 5 equations, 16 figures, 8 tables, 1 algorithm.

Figures (16)

  • Figure 1: Performance of 15 representative Large Vision-Language Models on IDKB, evaluated by three driving knowledge understanding metrics.
  • Figure 2: Data construction pipeline of the IDKB dataset. For Driving Handbook Data and Driving Test Data, we collect comprehensive driving knowledge resources from the internet, followed by data extraction and postprocessing to obtain the final data. For Driving Road Data, we use CARLA to generate simulated road scenarios focused on traffic sign comprehension.
  • Figure 3: Annotated examples of three data sources -- Driving Handbook Data, Driving Test Data, and Driving Road Data.
  • Figure 4: Data distribution in terms of data source, data domain and knowledge category.
  • Figure 5: Visualization of LVLM’s inference process. Qwen-VL-chat, fine-tuned on both IDKB and nuScenes, identifies the traffic sign ahead and recommends a driving decision to slow down.
  • ...and 11 more figures