A Comprehensive LLM-powered Framework for Driving Intelligence Evaluation

Shanhe You, Xuewen Luo, Xinhe Liang, Jiashu Yu, Chen Zheng, Jiangtao Gong

TL;DR

The paper targets the lack of a comprehensive evaluation method for driving intelligence in autonomous systems and presents an LLM-powered framework that assesses driving behavior in complex traffic along three dimensions: safety, intelligence, and comfort. It constructs a naturalistic driving dataset and a Driver/Passenger knowledge base, then uses Retrieval-Augmented Generation with hierarchical Chain-of-Thought prompts to evaluate driving performance. The framework is validated through CARLA simulations and human assessments. Key contributions include 700 Driver Knowledge Units and 760 Passenger Knowledge Units, stored as JSON for knowledge-grounded reasoning, and an empirical validation showing reasonable alignment with CARLA metrics and positive human feedback. The data is open-sourced on GitHub. The work aims to enable more human-like, context-aware evaluation and design of intelligent autonomous driving agents, with potential impact on safety, reliability, and user experience in real-world deployment.

Abstract

Evaluation methods for autonomous driving are crucial for algorithm optimization. However, due to the complexity of driving intelligence, there is currently no comprehensive method for evaluating the level of autonomous driving intelligence. In this paper, we propose an evaluation framework for driving behavior intelligence in complex traffic environments, aiming to fill this gap. We constructed a natural language evaluation dataset from professional human drivers and passengers through naturalistic driving experiments and post-driving behavior evaluation interviews. Based on this dataset, we developed an LLM-powered driving evaluation framework. The effectiveness of this framework was validated through simulated experiments in the CARLA urban traffic simulator and further corroborated by human assessment. Our research provides valuable insights for evaluating and designing more intelligent, human-like autonomous driving agents. The implementation details of the framework and detailed information about the dataset can be found on GitHub.

Paper Structure

This paper contains 27 sections and 3 figures.

Figures (3)

  • Figure 1: A comprehensive framework for evaluating driving intelligence using LLMs. We use real-world driving interview data to construct a driving evaluation knowledge graph, which, together with driving behaviors collected from the simulator, generates driving context. The framework then generates assessments of safety, intelligence, and comfort, which are combined into a comprehensive evaluation of the driving performance.
  • Figure 2: Framework Structure for Driving Intelligence Evaluation
  • Figure 3: Human Agreement Score on System Evaluation
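
To make the pipeline described in the TL;DR and Figure 1 more concrete, the following is a minimal sketch of how JSON knowledge units might be retrieved and assembled into a stepwise prompt covering safety, intelligence, and comfort. Everything here is an assumption for illustration: the field names (`id`, `role`, `dimension`, `text`), the example units, and the helper functions `tokenize`, `retrieve`, and `build_prompt` are hypothetical and are not taken from the paper or its dataset. The token-overlap ranking is a simple stand-in for whatever retriever the authors use, and the LLM call itself is omitted.

```python
import json
import re

# Hypothetical knowledge units. The schema and the texts are illustrative
# assumptions, not the paper's actual Driver/Passenger Knowledge Units.
KNOWLEDGE_UNITS_JSON = """
[
  {"id": "D1", "role": "driver", "dimension": "safety",
   "text": "Slow down early when a pedestrian is near the crosswalk."},
  {"id": "P1", "role": "passenger", "dimension": "comfort",
   "text": "Hard braking close to the intersection feels uncomfortable."},
  {"id": "D2", "role": "driver", "dimension": "intelligence",
   "text": "Anticipate the lane change of the vehicle ahead and keep a gap."}
]
"""


def tokenize(text):
    """Lowercase the text and return its set of alphabetic tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve(units, context, dimension, top_k=2):
    """Rank the units of one dimension by token overlap with the context.

    Token overlap is a stand-in retriever chosen for simplicity.
    """
    query = tokenize(context)
    candidates = [u for u in units if u["dimension"] == dimension]
    ranked = sorted(
        candidates,
        key=lambda u: len(query & tokenize(u["text"])),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(units, context,
                 dimensions=("safety", "intelligence", "comfort")):
    """Assemble a stepwise prompt: one step per dimension, then a summary."""
    lines = [f"Driving context: {context}"]
    for step, dim in enumerate(dimensions, start=1):
        lines.append(f"Step {step}: assess {dim}.")
        for u in retrieve(units, context, dim):
            lines.append(f"  Reference [{u['id']}, {u['role']}]: {u['text']}")
    lines.append(
        f"Step {len(dimensions) + 1}: combine the assessments "
        "into an overall evaluation."
    )
    return "\n".join(lines)


units = json.loads(KNOWLEDGE_UNITS_JSON)
context = "The vehicle is braking hard near a pedestrian at the crosswalk."
prompt = build_prompt(units, context)
print(prompt)
```

In this sketch the resulting prompt would be passed to an LLM, which is left out so that the example stays self-contained. Grouping retrieved references under each dimension keeps every assessment step tied to its own evidence before the final step asks for a combined judgment.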