ChatTraffic: Text-to-Traffic Generation via Diffusion Model

Chengyang Zhang, Yong Zhang, Qitan Shao, Bo Li, Yisheng Lv, Xinglin Piao, Baocai Yin

TL;DR

The paper addresses the challenge of predicting and generating realistic traffic states under abnormal events and long horizons by proposing Text-to-Traffic Generation (TTG). It introduces ChatTraffic, a diffusion-based model augmented with a Graph Convolutional Network to tie textual descriptions to road-network structure, trained on a large text–traffic dataset. Key contributions include the first diffusion-based TTG model, a multimodal dataset with over 20k text–traffic pairs, and comprehensive ablations showing GCN improves generation consistency and anomaly sensitivity. The approach enables scenario-aware traffic generation from text, offering practical benefits for ITS planning and management by simulating futures under various events and times.

Abstract

Traffic prediction is one of the most significant foundations in Intelligent Transportation Systems (ITS). Traditional traffic prediction methods rely only on historical traffic data to predict traffic trends and face two main challenges: 1) insensitivity to unusual events, and 2) limited performance in long-term prediction. In this work, we explore how generative models combined with text describing the traffic system can be applied to traffic generation, and we name the task Text-to-Traffic Generation (TTG). The key challenge of the TTG task is how to associate text with the spatial structure of the road network and traffic data for generating traffic situations. To this end, we propose ChatTraffic, the first diffusion model for text-to-traffic generation. To guarantee consistency between synthetic and real data, we augment a diffusion model with a Graph Convolutional Network (GCN) to extract spatial correlations of traffic data. In addition, we construct a large dataset containing text-traffic pairs for the TTG task. We benchmark our model qualitatively and quantitatively on the released dataset. The experimental results indicate that ChatTraffic can generate realistic traffic situations from text. Our code and dataset are available at https://github.com/ChyaZhang/ChatTraffic.

Paper Structure (27 sections, 11 equations, 7 figures, 3 tables)

Figures (7)

  • Figure 1: Two main challenges confronted by existing traffic prediction methods: 1) insensitivity to abnormal events, and 2) limited performance in long-term prediction.
  • Figure 2: Text-to-traffic generation via diffusion model. Given a piece of text describing the transportation system (including time and events), we present the diffusion-based ChatTraffic to generate the traffic situation. For the first time, our proposed ChatTraffic is capable of generating traffic situations (speed, congestion level, and passing time) according to the text. This enables ChatTraffic to provide predictions of how future events (road construction, unexpected accidents, unusual weather) will affect the urban transportation system, pushing this domain a considerable step forward.
  • Figure 3: Method overview. The core components of our proposed ChatTraffic are a UNet consisting of ResNet and cross-attention, and a GCN. We first populate and reshape the data to make it more suitable for use as input to a diffusion model. We use a text encoder to extract feature embeddings from text describing the traffic system. Furthermore, we introduce a GCN to achieve stronger generative consistency. The GCN takes the noisy traffic data $x_{t}$ and the adjacency matrix $A$ describing the spatial correlations of the road network as inputs to associate structural and state features of the road network. The UNet takes the textual feature embeddings and the outputs of the GCN as inputs to predict the denoised traffic data.
  • Figure 4: Illustration of adding noise to traffic data. From left to right, noise is gradually added to the traffic data. From top to bottom, four different junctures of traffic data are presented in the form of 'images'.
  • Figure 5: Qualitative comparison of ChatTraffic with two traditional traffic prediction methods on five specific junctures. The first to fifth rows represent five specific junctures. Best viewed in red boxes.
  • ...and 2 more figures
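
The captions of Figures 3 and 4 describe two steps that can be illustrated with a small sketch: noise is gradually added to the traffic data to obtain $x_{t}$, and a GCN combines $x_{t}$ with the adjacency matrix $A$ of the road network. The code below is a minimal pure-Python illustration, not the authors' implementation. It assumes a standard DDPM-style forward process with a linear noise schedule and a standard symmetric-normalized, weight-free graph propagation step. All function names, the schedule values, the three-road toy network, and the feature values are hypothetical; only the three feature types (speed, congestion level, passing time) come from the Figure 2 caption.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Hypothetical linear noise schedule (values are assumptions)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def alpha_bar(betas):
    """Cumulative product of (1 - beta_s): the signal fraction kept at each step."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out


def add_noise(x0, t, alpha_bars, rng):
    """Sample x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps, eps ~ N(0, 1)."""
    a = alpha_bars[t]
    return [
        [math.sqrt(a) * v + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0) for v in row]
        for row in x0
    ]


def normalize_adjacency(adj):
    """Symmetric normalization D^{-1/2} (A + I) D^{-1/2} with self-loops."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def gcn_propagate(adj_norm, x):
    """One weight-free propagation step: mix each road's features with its neighbours'."""
    n, f = len(x), len(x[0])
    return [
        [sum(adj_norm[i][k] * x[k][j] for k in range(n)) for j in range(f)]
        for i in range(n)
    ]


# Toy road network (assumption): three roads connected in a chain 0 - 1 - 2.
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
# One row per road: normalized speed, congestion level, passing time (made-up values).
x0 = [[0.8, 0.1, 0.3], [0.5, 0.4, 0.6], [0.2, 0.9, 0.9]]

rng = random.Random(0)
alpha_bars = alpha_bar(linear_beta_schedule(1000))
x_t = add_noise(x0, 500, alpha_bars, rng)  # noisy traffic data at step t = 500
adj_norm = normalize_adjacency(adj)
h = gcn_propagate(adj_norm, x_t)           # spatially mixed features
```

In the architecture described in the Figure 3 caption, the GCN output and the text embeddings are both passed to a UNet that predicts the denoised traffic data. The sketch stops before that step, and a real GCN layer would also apply a learned weight matrix and a nonlinearity after the propagation.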