TransGPT: Multi-modal Generative Pre-trained Transformer for Transportation

Peng Wang, Xiang Wei, Fangxu Hu, Wenjuan Han

TL;DR

TransGPT introduces two domain-adapted large language model variants for transportation: the text-only TransGPT-SM and the multi-modal TransGPT-MM. The authors build a domain-specific text dataset (STD) and aligned image-text data (MTD, CCAC), then apply instruction tuning with LoRA to specialize the bilingual ChatGLM2-6B backbone for text and VisualGLM-6B for vision-language tasks. The paper reports that TransGPT outperforms baseline models on most transportation benchmark tasks, with notable gains in multi-modal reasoning, and showcases applications such as synthetic scenario generation and traffic analysis. The findings suggest practical potential for ITS applications, including traffic forecasting, explanation of traffic phenomena, and report generation, while pointing to future work on data diversity and rationales.
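To make the LoRA step concrete, here is a minimal sketch of the low-rank update that LoRA adds to a frozen weight matrix. It is not the authors' training code. The matrix sizes, rank, and scaling value are illustrative assumptions, and the helper names `matmul` and `lora_effective_weight` are hypothetical.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_effective_weight(w, a, b, alpha):
    """Return W + (alpha / r) * B @ A, where r is the adapter rank."""
    r = len(a)  # A has shape (r, d_in)
    scale = alpha / r
    delta = matmul(b, a)  # (d_out, r) @ (r, d_in) -> (d_out, d_in)
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


# Illustrative sizes only (assumed, not taken from the paper).
d_out, d_in, rank, alpha = 8, 8, 2, 4.0
rng = random.Random(0)

w = [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]    # frozen base weight
a = [[rng.gauss(0, 0.01) for _ in range(d_in)] for _ in range(rank)]  # trainable, random init
b = [[0.0] * rank for _ in range(d_out)]                              # trainable, zero init

w_eff = lora_effective_weight(w, a, b, alpha)

full_params = d_out * d_in            # parameters updated by full fine-tuning
lora_params = rank * (d_in + d_out)   # parameters updated by LoRA
print(full_params, lora_params)
```

Because `b` starts at zero, the adapted weight equals the base weight before any training step, and only the two small matrices are updated afterwards.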

Abstract

Natural language processing (NLP) is a key component of intelligent transportation systems (ITS), but it faces many challenges in the transportation domain, such as domain-specific knowledge and data, and multi-modal inputs and outputs. This paper presents TransGPT, a novel (multi-modal) large language model for the transportation domain, which consists of two independent variants: TransGPT-SM for single-modal data and TransGPT-MM for multi-modal data. TransGPT-SM is fine-tuned on a single-modal transportation dataset (STD) that contains textual data from various sources in the transportation domain. TransGPT-MM is fine-tuned on a multi-modal transportation dataset (MTD) that we manually collected from three areas of the transportation domain: driving tests, traffic signs, and landmarks. We evaluate TransGPT on several benchmark datasets for different tasks in the transportation domain, and show that it outperforms baseline models on most tasks. We also showcase the potential applications of TransGPT for traffic analysis and modeling, such as generating synthetic traffic scenarios, explaining traffic phenomena, answering traffic-related questions, providing traffic recommendations, and generating traffic reports. This work advances the state of the art of NLP in the transportation domain and provides a useful tool for ITS researchers and practitioners.

Paper Structure (24 sections, 5 figures, 5 tables)

This paper contains 24 sections, 5 figures, 5 tables.

Figures (5)

  • Figure S1: Illustration of a flowchart showing how to automatically generate instruction data from unlabeled text data. The method consists of five steps. The flowchart uses arrows to indicate the sequence between steps, beginning in the upper left corner and ending in the lower left corner.
  • Figure S2: Illustration of MTD and CCAC samples. A.I denotes traffic signs, A.II denotes driving tests, A.III denotes landmarks, B.I denotes general image caption.
  • Figure S3: Composition of STD and MTD data.
  • Figure S4: An example of transforming question-answer pairs into a single-choice format.
  • Figure S5: Case study of TransGPT-MM.
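
The conversion named in the Figure S4 caption, from question-answer pairs to a single-choice format, could look roughly like the sketch below. The procedure shown is an assumption, since the summary does not describe the authors' exact method: the correct answer is mixed with distractors, shuffled, and lettered. The function name `to_single_choice` and the sample question and options are invented for illustration.

```python
import random


def to_single_choice(question, answer, distractors, rng):
    """Build a single-choice item from a QA pair plus wrong-answer candidates."""
    options = [answer] + list(distractors)
    rng.shuffle(options)  # randomise where the correct option lands
    labels = "ABCDEFGH"[:len(options)]
    lines = [question] + [f"{label}. {text}" for label, text in zip(labels, options)]
    return {"prompt": "\n".join(lines), "label": labels[options.index(answer)]}


# Invented sample data, not taken from STD or MTD.
item = to_single_choice(
    question="What does a red octagonal sign indicate?",
    answer="Stop",
    distractors=["Yield", "No entry", "Speed limit"],
    rng=random.Random(0),
)
print(item["prompt"])
print("Correct option:", item["label"])
```

Passing a seeded `random.Random` keeps the option order reproducible, which helps when the same items are reused across evaluation runs.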