Talk2DM: Enabling Natural Language Querying and Commonsense Reasoning for Vehicle-Road-Cloud Integrated Dynamic Maps with Large Language Models

Lu Tao; Jinxuan Luo; Yousuke Watanabe; Zhengshu Zhou; Yuhuan Lu; Shen Ying; Pan Zhang; Fei Zhao; Hiroaki Takada

TL;DR

Talk2DM introduces a natural-language (NL) interface for Vehicle-Road-Cloud Dynamic Maps (VRC-DM). It couples VRCsim, a simulator of VRC cooperative perception (CP) data, with VRC-QA, a question-answering dataset tailored to VRC-CP, and a plug-in module that enables over-the-horizon NL querying and commonsense reasoning. The core innovation, the chain-of-prompt (CoP) mechanism, fuses human-defined rules with the commonsense knowledge of large language models (LLMs) to answer NL queries accurately over structured CP data, while keeping data generation, reasoning, and response formatting clearly separated. Experiments on VRC-QA show that Talk2DM generalizes across multiple LLM families; with Qwen3:8B, Gemma3:27B, and GPT-oss it exceeds 93% NL query accuracy at average response times of 2–5 seconds. Larger models improve accuracy at the cost of higher latency, with Gemma3:27B and GPT-oss offering the best trade-offs. The work provides a practical, model-agnostic pathway to integrate NL querying and commonsense reasoning into VRC-DM systems, potentially improving human-DM interaction, interpretability, and decision support in mixed-traffic autonomous driving environments.
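The summary leaves the prompt chain itself abstract. The following Python sketch shows one way such a chain could be wired, under stated assumptions: the query types, the filter rules, the prompt wording, the `CPObject` record, and the `llm` callable are all hypothetical and are not taken from Talk2DM. A first prompt exposes the human-defined rule vocabulary so the LLM can map a free-form question to a rule; the matching rule then filters the structured CP data deterministically; a second prompt hands only the filtered objects to the LLM, which contributes commonsense when composing the answer.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class CPObject:
    """One object in the fused cooperative-perception (CP) data (hypothetical schema)."""
    obj_id: str
    category: str  # e.g. "pedestrian", "car"
    x: float       # map-frame position in metres (assumed)
    y: float

# Hypothetical human-defined rules: query type -> deterministic filter over CP data.
RULES: Dict[str, Callable[[CPObject], bool]] = {
    "count_pedestrians": lambda o: o.category == "pedestrian",
    "count_cars": lambda o: o.category == "car",
}

def classify_prompt(question: str) -> str:
    # Prompt 1: expose the rule vocabulary so the LLM maps free text onto a rule.
    types = ", ".join(RULES)
    return f"CLASSIFY\nQuery types: {types}, unknown\nQuestion: {question}\nType:"

def answer_prompt(question: str, objects: List[CPObject]) -> str:
    # Prompt 2: show only rule-filtered CP data; the LLM adds commonsense in its answer.
    rows = "\n".join(f"{o.obj_id} {o.category} ({o.x:.1f}, {o.y:.1f})" for o in objects)
    return f"ANSWER\nObjects:\n{rows or '(none)'}\nQuestion: {question}\nAnswer:"

def chain_of_prompt(question: str, objects: List[CPObject], llm: Callable[[str], str]) -> str:
    qtype = llm(classify_prompt(question)).strip()
    rule = RULES.get(qtype)  # None for "unknown" or an unexpected reply
    selected = [o for o in objects if rule(o)] if rule else objects
    return llm(answer_prompt(question, selected))

def fake_llm(prompt: str) -> str:
    """Offline stand-in for a real model: keyword classifier, echoes the answer prompt."""
    if prompt.startswith("CLASSIFY"):
        asked = prompt.split("Question:")[1]
        return "count_pedestrians" if "pedestrian" in asked else "unknown"
    return prompt

scene = [CPObject("P1", "pedestrian", 12.0, 3.5), CPObject("C1", "car", 30.0, -1.0)]
print(chain_of_prompt("How many pedestrians are behind the bus?", scene, fake_llm))
```

The stub `fake_llm` only lets the chain run offline; replacing it with a call to any LLM client would mirror the model-agnostic design described above, although the actual Talk2DM prompts and rules may differ substantially.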

Abstract

Dynamic maps (DM) serve as the fundamental information infrastructure for vehicle-road-cloud (VRC) cooperative autonomous driving in China and Japan. By providing comprehensive traffic scene representations, DM overcome limitations of standalone autonomous driving systems (ADS), such as blind spots caused by physical occlusion. Although DM-enhanced ADS have been successfully deployed in real-world applications in Japan, existing DM systems still lack a natural-language-supported (NLS) human interface, which could substantially enhance human-DM interaction. To address this gap, this paper introduces VRCsim, a VRC cooperative perception (CP) simulation framework designed to generate streaming VRC-CP data. Based on VRCsim, we construct a question-answering dataset, VRC-QA, focused on spatial querying and reasoning in mixed-traffic scenes. Building upon VRCsim and VRC-QA, we further propose Talk2DM, a plug-and-play module that extends VRC-DM systems with NLS querying and commonsense reasoning capabilities. Talk2DM is built upon a novel chain-of-prompt (CoP) mechanism that progressively integrates human-defined rules with the commonsense knowledge of large language models (LLMs). Experiments on VRC-QA show that Talk2DM can seamlessly switch across different LLMs while maintaining high NLS query accuracy, demonstrating strong generalization capability. Although larger models tend to achieve higher accuracy, they incur a significant loss of efficiency. Our results reveal that Talk2DM, powered by Qwen3:8B, Gemma3:27B, and GPT-oss models, achieves over 93% NLS query accuracy with an average response time of only 2-5 seconds, indicating strong practical potential.
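The accuracy/latency trade-off reported above rests on two per-model metrics: the fraction of QA questions answered correctly and the mean response time. A minimal Python sketch of how they might be computed and compared follows; the `QAResult` record, the `fastest_above` selection rule (lowest mean latency among models above an accuracy threshold), and the toy numbers with placeholder model names are illustrative assumptions, not the paper's evaluation protocol or data.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List, Optional

@dataclass
class QAResult:
    correct: bool     # did the answer match the QA ground truth?
    latency_s: float  # end-to-end response time in seconds

def summarize(results: List[QAResult]) -> Dict[str, float]:
    """Accuracy = correct / total; latency = arithmetic mean over all questions."""
    return {
        "accuracy": sum(r.correct for r in results) / len(results),
        "mean_latency_s": mean(r.latency_s for r in results),
    }

def fastest_above(summaries: Dict[str, Dict[str, float]], min_acc: float) -> Optional[str]:
    """Hypothetical selection rule: lowest mean latency among models meeting min_acc."""
    eligible = {name: s for name, s in summaries.items() if s["accuracy"] >= min_acc}
    if not eligible:
        return None
    return min(eligible, key=lambda name: eligible[name]["mean_latency_s"])

def run(flags: List[bool], times: List[float]) -> List[QAResult]:
    return [QAResult(c, t) for c, t in zip(flags, times)]

# Toy numbers with placeholder names; these are NOT results from the paper.
summaries = {
    "model_a": summarize(run([True, True, True, False], [2.0, 3.0, 2.5, 2.5])),
    "model_b": summarize(run([True, True, True, True], [6.0, 5.0, 7.0, 6.0])),
    "model_c": summarize(run([True, True, True, True], [3.0, 4.0, 3.0, 4.0])),
}
print(summaries)
print(fastest_above(summaries, min_acc=0.9))
```

With a 0.9 threshold the fast but less accurate `model_a` is excluded and the slowest model is never preferred, which is the kind of trade-off the abstract describes; the paper may weigh accuracy and latency differently.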

Paper Structure

This paper contains 30 sections, 21 equations, 11 figures, and 12 tables.

Introduction
Related Work
Large Language Model Enhanced Autonomous Driving
Cooperative Perception
Dynamic Maps
System Architecture
VRCsim
VRC-QA
Object Association and Relation Calculation
Attribute-entity-relation Graph Construction
QA Template Design and Refinement
Ego-centric QA
Ego-agnostic QA
QA Instantiation
Talk2DM
...and 15 more sections

Figures (11)

Figure 1: Talk2DM, a human interface for the VRC-DM framework, enabling over-the-horizon perception, natural-language-supported querying, and commonsense reasoning. A demonstration video of Talk2DM is available at https://youtu.be/mg3UsLoHz2Q or https://www.bilibili.com/video/BV1Crc4zAE5T
Figure 2: The system architecture. The blue arrows indicate communication connections; the orange arrows represent the data processing flow.
Figure 3: The architecture of VRCsim.
Figure 4: Attribute-entity-relation graph, in which $e_j$ denotes an AV entity (the ego vehicle), $o_i$ an associated object, $a_k$ an attached attribute, and $R_i$ a relation type. The attributes and relations are used to generate QA pairs.
Figure 5: Primitive question templates designed for ego-centric QA generation. All of these questions are one-hop.
...and 6 more figures
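Figures 4 and 5 together suggest how QA pairs are produced: an ego vehicle $e_j$ is linked to objects $o_i$ by relations $R_i$, each object carries attributes $a_k$, and one-hop templates read a single relation edge plus one attribute. The Python sketch below illustrates that idea under our own assumptions: the `AERGraph` class, the relation strings, and the two templates are hypothetical and do not reproduce the paper's actual template set.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class AERGraph:
    """Attribute-entity-relation graph centred on one ego vehicle (hypothetical layout)."""
    ego: str  # e_j: the AV entity (ego vehicle)
    attributes: Dict[str, Dict[str, str]] = field(default_factory=dict)  # o_i -> {name: value}, i.e. the a_k
    relations: List[Tuple[str, str, str]] = field(default_factory=list)  # (e_j, R_i, o_i) edges

    def add_object(self, obj: str, relation: str, **attrs: str) -> None:
        self.attributes[obj] = dict(attrs)
        self.relations.append((self.ego, relation, obj))

# Hypothetical one-hop templates: (question pattern, attribute that answers it).
TEMPLATES: List[Tuple[str, str]] = [
    ("What is the color of the {category} {relation} the ego vehicle?", "color"),
    ("What type of object is {relation} the ego vehicle?", "category"),
]

def instantiate(graph: AERGraph) -> List[Tuple[str, str]]:
    """Fill every template from every ego->object edge; return (question, answer) pairs."""
    pairs = []
    for _ego, relation, obj in graph.relations:
        attrs = graph.attributes[obj]
        for pattern, answer_key in TEMPLATES:
            if answer_key not in attrs:
                continue  # no ground-truth answer for this object
            try:
                question = pattern.format(relation=relation, **attrs)
            except KeyError:
                continue  # pattern needs an attribute this object lacks
            pairs.append((question, attrs[answer_key]))
    return pairs

g = AERGraph("e1")
g.add_object("o1", "in front of", category="truck", color="white")
g.add_object("o2", "to the left of", category="pedestrian")
for q, a in instantiate(g):
    print(q, "->", a)
```

A template is skipped when the object lacks the attribute it needs, so the pedestrian yields no color question. A real generator would also have to resolve ambiguity when several objects share the same relation to the ego vehicle, which this sketch ignores.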