BjTT: A Large-scale Multimodal Dataset for Traffic Prediction

Chengyang Zhang, Yong Zhang, Qitan Shao, Jiangtao Feng, Bo Li, Yisheng Lv, Xinglin Piao, Baocai Yin

TL;DR

This paper introduces the Beijing Text-Traffic (BjTT) dataset, a large-scale multimodal resource that pairs traffic time series with textual event descriptions for traffic prediction. The authors also propose ChatTraffic, a diffusion-based text-to-traffic generator augmented with a Graph Convolutional Network (GCN) that aligns textual descriptions with road-network structure; they report that text guidance improves long-horizon accuracy and the realism of generated traffic states. BjTT covers 1,260 road vertices and 32,400 time steps over three months, records velocity and congestion data, and includes more than 40 event types; six baselines and a latent diffusion model (LDM) are evaluated on it. Together, BjTT and ChatTraffic provide a platform for multimodal traffic prediction and for contingency-aware traffic management through text-conditioned generation.
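
To make the dataset scale concrete, the following back-of-the-envelope sketch derives the sampling interval and raw array size implied by the figures above. It assumes that "three months" means 90 days, that each vertex stores two values per time step (velocity and congestion), and that values are held as 32-bit floats in a (time, vertex, channel) layout. None of these storage details are stated in the source, and the released files may be organized differently.

```python
# Figures quoted in the summary above.
VERTICES = 1_260
TIME_STEPS = 32_400

# Assumptions made only for this illustration.
ASSUMED_DAYS = 90        # "three months" taken as 90 days
CHANNELS = 2             # velocity and congestion
BYTES_PER_VALUE = 4      # 32-bit float

# Sampling interval implied by the assumed duration.
interval_minutes = ASSUMED_DAYS * 24 * 60 / TIME_STEPS

# Hypothetical dense layout: (time, vertex, channel).
shape = (TIME_STEPS, VERTICES, CHANNELS)
values = TIME_STEPS * VERTICES * CHANNELS
size_mib = values * BYTES_PER_VALUE / 2**20

print(f"implied interval: {interval_minutes} minutes")
print(f"dense array shape: {shape}, about {size_mib:.0f} MiB")
```

Under these assumptions the traffic tensor fits comfortably in memory on a typical workstation, so the whole series can be loaded at once rather than streamed.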

Abstract

Traffic prediction is one of the most significant foundations of Intelligent Transportation Systems (ITS). Traditional traffic prediction methods rely only on historical traffic data to predict traffic trends and face two main challenges: 1) insensitivity to unusual events and 2) limited performance in long-term prediction. In this work, we explore how generative models combined with text describing the traffic system can be applied to traffic generation, and name the task Text-to-Traffic Generation (TTG). The key challenge of the TTG task is how to associate text with the spatial structure of the road network and traffic data for generating traffic situations. To this end, we propose ChatTraffic, the first diffusion model for text-to-traffic generation. To guarantee consistency between synthetic and real data, we augment a diffusion model with a Graph Convolutional Network (GCN) to extract spatial correlations of traffic data. In addition, we construct a large dataset containing text-traffic pairs for the TTG task. We benchmark our model qualitatively and quantitatively on the released dataset. The experimental results indicate that ChatTraffic can generate realistic traffic situations from text. Our code and dataset are available at https://github.com/ChyaZhang/ChatTraffic.
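
The abstract says a GCN is used to extract spatial correlations across the road network. As a rough illustration of what one graph-convolution step does, the sketch below applies the widely used symmetric-normalized propagation rule, ReLU(D^{-1/2} (A + I) D^{-1/2} H W), to a toy three-segment road chain. This is a generic textbook layer, not ChatTraffic's actual architecture; the graph, the feature values, and the function names are all hypothetical.

```python
import math

def normalize_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2} for a square adjacency matrix."""
    n = len(adj)
    # Add self-loops so each vertex keeps part of its own signal.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]

def matmul(a, b):
    """Plain-Python matrix product of two nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def gcn_layer(adj, features, weight):
    """One graph-convolution step: ReLU(A_norm @ H @ W)."""
    propagated = matmul(normalize_adjacency(adj), features)
    return [[max(0.0, v) for v in row] for row in matmul(propagated, weight)]

# Toy road graph (hypothetical): three segments in a chain 0 - 1 - 2.
adj = [[0, 1, 0],
       [1, 0, 1],
       [0, 1, 0]]
# One feature per segment, e.g. a velocity in km/h (made-up values).
features = [[60.0], [20.0], [40.0]]
# Identity weight so only the neighborhood mixing is visible.
weight = [[1.0]]

smoothed = gcn_layer(adj, features, weight)
print(smoothed)
```

Each output row is a degree-weighted mix of a segment's own value and its neighbors' values, which is the sense in which the layer injects road-network structure into the features.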

Paper Structure

This paper contains 21 sections, 2 equations, 8 figures, and 2 tables.

Figures (8)

  • Figure 1: Illustration of one text-traffic pair in the BjTT dataset. Each traffic sample contains different types of road information and is coupled with a text describing the traffic system.
  • Figure 2: The construction pipeline of the BjTT dataset includes two main parts: data collection and data processing. The processed traffic (blue boxes) and text (green boxes) data are matched up one by one to get the final dataset.
  • Figure 3: Data grouping process for road sections. Road sections are grouped by road name, and the velocity and congestion level of each grouped road are computed by averaging over its sections.
  • Figure 4: All event types included in the BjTT dataset. These events cover not only common road traffic events such as construction and traffic accidents but also unusual weather and large social events.
  • Figure 5: Dataset statistics of the BjTT dataset. (a) Comparison of datasets from the aspects of time steps and number of vertices, (b) proportion of top-frequent events out of all events, (c) number of events recorded in different periods of the day, (d) distribution of the number of words in the text.
  • ...and 3 more figures
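
The grouping step described in the Figure 3 caption (sections grouped by road name, then velocity and congestion level averaged) can be sketched as follows. The record layout, road names, and values are hypothetical and chosen only for illustration; the caption does not specify whether the average is weighted, so an unweighted mean is assumed here.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical road-section records:
# (road name, velocity in km/h, congestion level).
sections = [
    ("Road A", 42.0, 1),
    ("Road A", 38.0, 2),
    ("Road B", 15.0, 3),
]

def group_by_road(records):
    """Group sections by road name and average their measurements."""
    grouped = defaultdict(list)
    for name, velocity, congestion in records:
        grouped[name].append((velocity, congestion))
    # Unweighted mean per road (an assumption; see the note above).
    return {
        name: {
            "velocity": mean(v for v, _ in values),
            "congestion": mean(c for _, c in values),
        }
        for name, values in grouped.items()
    }

roads = group_by_road(sections)
for name, stats in roads.items():
    print(name, stats)
```

Averaging an ordinal congestion level can yield fractional values; whether the dataset rounds them back to discrete levels is not stated in the caption.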