TransGPT: Multi-modal Generative Pre-trained Transformer for Transportation
Peng Wang, Xiang Wei, Fangxu Hu, Wenjuan Han
TL;DR
TransGPT introduces two domain-adapted large language model variants for transportation: a text-focused TransGPT-SM and a multi-modal TransGPT-MM. It builds a domain-specific text dataset (STD) and aligned image-text data (MTD, CCAC), and uses instruction tuning with LoRA to specialize a bilingual ChatGLM2-6B backbone for text tasks and VisualGLM-6B for vision-language tasks. The paper reports that TransGPT outperforms baseline models on most transportation benchmark tasks, with notable gains in multi-modal reasoning, and showcases capabilities in synthetic traffic scenario generation and traffic analysis. Its findings suggest practical potential for intelligent transportation systems (ITS) applications, including traffic forecasting, explanation of traffic phenomena, and report generation, while highlighting areas for future work in data diversity and rationales.
Abstract
Natural language processing (NLP) is a key component of intelligent transportation systems (ITS), but it faces many challenges in the transportation domain, such as domain-specific knowledge and data, and multi-modal inputs and outputs. This paper presents TransGPT, a novel (multi-modal) large language model for the transportation domain, which consists of two independent variants: TransGPT-SM for single-modal data and TransGPT-MM for multi-modal data. TransGPT-SM is fine-tuned on a Single-modal Transportation Dataset (STD) that contains textual data from various sources in the transportation domain. TransGPT-MM is fine-tuned on a Multi-modal Transportation Dataset (MTD) that we manually collected from three areas of the transportation domain: driving tests, traffic signs, and landmarks. We evaluate TransGPT on several benchmark datasets for different tasks in the transportation domain and show that it outperforms baseline models on most tasks. We also showcase the potential applications of TransGPT for traffic analysis and modeling, such as generating synthetic traffic scenarios, explaining traffic phenomena, answering traffic-related questions, providing traffic recommendations, and generating traffic reports. This work advances the state of the art in NLP for the transportation domain and provides a useful tool for ITS researchers and practitioners.
