LangCoop: Collaborative Driving with Language

Xiangbo Gao, Yuheng Wu, Rujia Wang, Chenxi Liu, Yang Zhou, Zhengzhong Tu

TL;DR

LangCoop tackles the bandwidth and heterogeneity challenges of multi-agent autonomous driving by using natural language as the communication medium. It combines Mixture Model Modular Chain-of-Thought (M$^3$CoT) for structured zero-shot reasoning with LangPack for compact language-based information exchange, enabling LVLM-driven planning and control. In CARLA simulations, LangCoop reduces communication bandwidth by up to 96%, with messages under 2KB, while keeping driving performance competitive: driving score (DS) reaches up to 48.8 and route completion (RC) up to 90.3%. The approach is robust to heterogeneous LVLMs and is open-sourced, offering a scalable, interpretable pathway toward safer, more efficient language-driven collaborative driving. The relation $DS = RC \cdot (1 - IP)$, where IP is the infraction penalty, links route completion and infractions to overall driving quality.
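As a worked illustration of the relation $DS = RC \cdot (1 - IP)$ quoted above, the following minimal Python sketch computes a driving score from a route completion percentage and an infraction penalty. The function name `driving_score`, the assumed ranges (RC in percent, IP in [0, 1]), and the input values are illustrative assumptions, not figures or code from the paper.

```python
def driving_score(route_completion: float, infraction_penalty: float) -> float:
    """Compute DS = RC * (1 - IP).

    Assumes RC is a percentage in [0, 100] and IP is a fraction in [0, 1];
    both ranges are assumptions made for this sketch.
    """
    if not 0.0 <= route_completion <= 100.0:
        raise ValueError("route_completion must be in [0, 100]")
    if not 0.0 <= infraction_penalty <= 1.0:
        raise ValueError("infraction_penalty must be in [0, 1]")
    return route_completion * (1.0 - infraction_penalty)


# Hypothetical inputs chosen for easy arithmetic, not results from the paper.
print(driving_score(80.0, 0.25))  # 60.0
```

Under this relation, an agent that finishes most of its route can still score poorly if it accumulates infractions, which is why DS and RC are reported side by side.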

Abstract

Multi-agent collaboration holds great promise for enhancing the safety, reliability, and mobility of autonomous driving systems by enabling information sharing among multiple connected agents. However, existing multi-agent communication approaches are hindered by the limitations of current communication media, including high bandwidth demands, agent heterogeneity, and information loss. To address these challenges, we introduce LangCoop, a new paradigm for collaborative autonomous driving that leverages natural language as a compact yet expressive medium for inter-agent communication. LangCoop features two key innovations: Mixture Model Modular Chain-of-thought (M$^3$CoT) for structured zero-shot vision-language reasoning and Natural Language Information Packaging (LangPack) for efficiently packaging information into concise, language-based messages. Through extensive experiments conducted in CARLA simulation, we demonstrate that LangCoop achieves a remarkable 96% reduction in communication bandwidth (< 2KB per message) compared to image-based communication, while maintaining competitive driving performance in closed-loop evaluation. Our project page and code are at https://xiangbogaobarry.github.io/LangCoop/.
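To make the bandwidth argument concrete, here is a minimal Python sketch of packing an agent's observations and intent into a short text message and measuring its size in bytes. The message layout, the field names (`scene`, `objects`, `intent`), the helper names, and the 50,000-byte image baseline are all hypothetical assumptions for illustration; the abstract does not specify the actual LangPack format or the baseline image size.

```python
def pack_message(agent_id: str, scene: str, objects: list[str], intent: str) -> str:
    """Pack an agent's summary into a compact plain-text message.

    The four-line layout is a made-up format, not the LangPack specification.
    """
    return "\n".join([
        f"[{agent_id}]",
        f"Scene: {scene}",
        "Objects: " + "; ".join(objects),
        f"Intent: {intent}",
    ])


def message_bytes(message: str) -> int:
    """Size of the message on the wire, assuming UTF-8 encoding."""
    return len(message.encode("utf-8"))


def bandwidth_reduction(message: str, baseline_bytes: int) -> float:
    """Fractional saving relative to a baseline payload of baseline_bytes."""
    return 1.0 - message_bytes(message) / baseline_bytes


msg = pack_message(
    agent_id="CAV 1",
    scene="Approaching a four-way intersection, light traffic.",
    objects=["pedestrian crossing ahead", "parked truck on the right"],
    intent="slow down",
)
# 50_000 bytes stands in for a compressed camera frame (assumed value).
print(message_bytes(msg), bandwidth_reduction(msg, baseline_bytes=50_000))
```

Because the payload is plain text, a receiving agent needs only a language interface to read it, regardless of which sensors or models produced it.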

Paper Structure

This paper contains 20 sections, 2 figures, 7 tables.

Figures (2)

  • Figure 1: Overview of the LangCoop framework.
  • Figure 2: Visualization of a natural-language-based collaborative driving scenario. CAV 2 slows down upon receiving the ‘slow down’ intent description from CAV 1. The context is slightly paraphrased for better visualization.