Chat2Scenario: Scenario Extraction From Dataset Through Utilization of Large Language Model
Yongqi Zhao, Wenbo Xiao, Tomislav Mihalj, Jia Hu, Arno Eichberger
TL;DR
Chat2Scenario is a GPT-4-driven framework that extracts concrete driving scenarios from naturalistic data for ADS validation, addressing data accessibility and preprocessing bottlenecks. The methodology couples a Streamlit web app with an LLM-based scenario understanding module, a rule-based activity/position classifier, and a criticality analysis that filters and ranks scenarios, which are then exported to ASAM OpenSCENARIO and IPG CarMaker formats. It uses the highD dataset and demonstrates qualitative extraction of typical scenarios (following, cut-in, cut-out) with reconstruction in Esmini and CarMaker, and it reports quantitative metrics on track #36 that indicate robust performance, especially in cut-in/cut-out cases. The approach enables efficient, scalable scenario search and provides open-source tooling for ADS virtual testing and validation; future work aims to diversify the datasets and refine the criticality measures.
Abstract
The advent of Large Language Models (LLMs) opens new ways to validate Automated Driving Systems (ADS). This work presents a novel approach to extracting scenarios from naturalistic driving datasets. The proposed framework, Chat2Scenario, leverages the advanced Natural Language Processing (NLP) capabilities of LLMs to understand and identify different driving scenarios. Given a descriptive text of the driving conditions and user-specified criticality metric thresholds, the framework efficiently searches for the desired scenarios and converts them into ASAM OpenSCENARIO and IPG CarMaker text files, which streamlines the scenario extraction process. Simulations are executed to validate the efficiency of the approach. The framework is provided as a user-friendly web app and is accessible via the following link: https://github.com/ftgTUGraz/Chat2Scenario.
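
To make the idea of searching by criticality threshold concrete, the following is a minimal sketch of filtering and ranking candidate scenarios. It assumes time-to-collision (TTC) as the criticality metric, which is one commonly used choice; the abstract does not state which metrics the framework uses. All names here (`Sample`, `time_to_collision`, `filter_and_rank`) and the sample values are hypothetical and are not taken from the Chat2Scenario code base.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Sample:
    """One time step of an ego/lead vehicle pair (hypothetical layout)."""
    gap: float         # longitudinal distance from ego to lead vehicle, in m
    ego_speed: float   # ego longitudinal speed, in m/s
    lead_speed: float  # lead longitudinal speed, in m/s


def time_to_collision(sample: Sample) -> Optional[float]:
    """Return TTC in seconds, or None if the ego is not closing in."""
    closing_speed = sample.ego_speed - sample.lead_speed
    # TTC is only defined for a positive gap and a positive closing speed.
    if sample.gap <= 0 or closing_speed <= 0:
        return None
    return sample.gap / closing_speed


def min_ttc(samples: List[Sample]) -> Optional[float]:
    """Smallest defined TTC over a scenario, or None if never defined."""
    values = [time_to_collision(s) for s in samples]
    defined = [v for v in values if v is not None]
    return min(defined) if defined else None


def filter_and_rank(
    candidates: Dict[str, List[Sample]], ttc_threshold: float
) -> List[Tuple[str, float]]:
    """Keep scenarios whose minimum TTC is below the threshold,
    ordered from most critical (lowest TTC) to least critical."""
    kept = []
    for name, samples in candidates.items():
        value = min_ttc(samples)
        if value is not None and value < ttc_threshold:
            kept.append((name, value))
    return sorted(kept, key=lambda item: item[1])


# Made-up example data, for illustration only.
candidates = {
    "cut_in_A": [Sample(20.0, 30.0, 25.0), Sample(10.0, 30.0, 25.0)],
    "following_B": [Sample(40.0, 25.0, 25.0), Sample(40.0, 26.0, 25.0)],
    "cut_out_C": [Sample(15.0, 30.0, 20.0)],
}
ranked = filter_and_rank(candidates, ttc_threshold=3.0)
```

With a threshold of 3.0 s, `ranked` keeps `cut_out_C` and `cut_in_A` and drops `following_B`, whose minimum TTC is far above the threshold. Other criticality metrics could be swapped in by replacing `time_to_collision` with a function of the same shape.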
