V2V-GoT: Vehicle-to-Vehicle Cooperative Autonomous Driving with Multimodal Large Language Models and Graph-of-Thoughts
Hsu-kuang Chiu, Ryo Hachiuma, Chien-Yi Wang, Yu-Chiang Frank Wang, Min-Hung Chen, Stephen F. Smith
TL;DR
This work introduces V2V-GoT, a graph-of-thoughts (GoT) framework that enables Multimodal Large Language Models (MLLMs) to coordinate cooperative driving among connected autonomous vehicles (CAVs). It organizes reasoning as a nine-node QA graph that incorporates occlusion-aware perception and planning-aware prediction, and it is powered by a GoT-enabled MLLM built on LLaVA that uses temporal LiDAR features from multiple CAVs. The authors curate the V2V-GoT-QA dataset (based on V2V4Real) and demonstrate that V2V-GoT outperforms baseline methods in perception, prediction, and planning tasks, with ablations validating the value of the occlusion-aware and planning-aware components. Communication costs remain comparable to prior V2V-LLM approaches while task performance improves, suggesting practical viability for real-world cooperative driving. The work also provides open-source data and code to accelerate future research in GoT-enabled cooperative autonomous driving.
Abstract
Current state-of-the-art autonomous vehicles could face safety-critical situations when their local sensors are occluded by large nearby objects on the road. Vehicle-to-vehicle (V2V) cooperative autonomous driving has been proposed as a means of addressing this problem, and one recently introduced framework for cooperative autonomous driving has further adopted an approach that incorporates a Multimodal Large Language Model (MLLM) to integrate cooperative perception and planning processes. However, despite the potential benefit of applying graph-of-thoughts reasoning to the MLLM, this idea has not been considered by previous cooperative autonomous driving research. In this paper, we propose a novel graph-of-thoughts framework specifically designed for MLLM-based cooperative autonomous driving. Our graph-of-thoughts includes our proposed novel ideas of occlusion-aware perception and planning-aware prediction. We curate the V2V-GoT-QA dataset and develop the V2V-GoT model for training and testing the cooperative driving graph-of-thoughts. Our experimental results show that our method outperforms other baselines in cooperative perception, prediction, and planning tasks. Our project website: https://eddyhkchiu.github.io/v2vgot.github.io/
