TAU-R1: Visual Language Model for Traffic Anomaly Understanding

Yuqiang Lin; Kehua Chen; Sam Lockyer; Arjun Yadav; Mingxuan Sui; Shucheng Zhang; Yan Shi; Bingzhang Wang; Yuang Zhang; Markus Zarbock; Florain Stanek; Adrian Evans; Wenbin Li; Yinhai Wang; Nic Zhang

Abstract

Traffic Anomaly Understanding (TAU) is important for traffic safety in Intelligent Transportation Systems. Recent vision-language models (VLMs) have shown strong capabilities in video understanding. However, progress on TAU remains limited due to the lack of benchmarks and task-specific methodologies. To address this limitation, we introduce Roundabout-TAU, a dataset constructed from real-world roundabout videos collected in collaboration with the City of Carmel, Indiana. The dataset contains 342 clips and is annotated with more than 2,000 question-answer pairs covering multiple aspects of traffic anomaly understanding. Building on this benchmark, we propose TAU-R1, a two-layer vision-language framework for TAU. The first layer is a lightweight anomaly classifier that performs coarse anomaly categorisation, while the second layer is a larger anomaly reasoner that generates detailed event summaries. To improve task-specific reasoning, we introduce a two-stage training strategy consisting of decomposed-QA-enhanced supervised fine-tuning followed by TAU-GRPO, a GRPO-based post-training method with TAU-specific reward functions. Experimental results show that TAU-R1 achieves strong performance on both anomaly classification and reasoning tasks while maintaining deployment efficiency. The dataset and code are available at: https://github.com/siri-rouser/TAU-R1
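
The two-layer design described above can be pictured as a simple cascade: every clip goes through the lightweight classifier, and only clips flagged as anomalous reach the larger reasoner. The following is a minimal sketch of that control flow, not the authors' implementation. The names `ClipResult`, `run_two_layer`, `toy_classifier`, and `toy_reasoner`, the `"normal"` label, and the use of plain strings in place of video clips are all hypothetical placeholders introduced for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ClipResult:
    clip_id: str
    label: str               # coarse category from the first layer
    summary: Optional[str]   # filled in only for anomalous clips


def run_two_layer(
    clip_id: str,
    clip: str,
    classify: Callable[[str], str],
    summarise: Callable[[str], str],
    normal_label: str = "normal",  # assumed name for the non-anomalous class
) -> ClipResult:
    """Run the cheap classifier first; call the reasoner only on anomalies."""
    label = classify(clip)
    if label == normal_label:
        # Normal clips stop here, so the larger model is never invoked.
        return ClipResult(clip_id, label, None)
    return ClipResult(clip_id, label, summarise(clip))


# Toy stand-ins for the two models (hypothetical; real models take video input).
def toy_classifier(clip: str) -> str:
    return "collision" if "crash" in clip else "normal"


def toy_reasoner(clip: str) -> str:
    return f"Summary of anomalous clip: {clip}"


clips = ["cars circulating", "crash at entry"]
results = [
    run_two_layer(f"clip_{i}", clip, toy_classifier, toy_reasoner)
    for i, clip in enumerate(clips)
]
for result in results:
    print(result)
```

Under these assumptions, the cost of the reasoner is paid only for the fraction of clips the classifier flags, which is one plausible reading of the deployment-efficiency claim.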

Paper Structure (46 sections, 7 equations, 12 figures, 6 tables)

Introduction
Related Works
Video Anomaly Dataset
Video Anomaly Understanding
Roundabout-TAU Dataset
TAU Task Definition
Classification
Summarisation
Dataset Construction and Labelling
Video Source Collection
Question-Answer Pair Labelling
Dataset statistics
Method: TAU-R1
TAU-R1 Framework
Decomposed-QA Enhanced SFT
...and 31 more sections

Figures (12)

Figure 1: Roundabout-TAU dataset statistics. (a) Overview of Roundabout-TAU, including the map of 28 camera sites and one example video. (1b) Distribution of video lengths in Roundabout-TAU. (2b) Proportion of different QA categories in Roundabout-TAU. (c) Distribution of anomaly classes.
Figure 2: TAU-R1 framework and training pipeline. Top: the two-layer framework, where a lightweight classifier filters the video stream and a larger reasoner summarises anomalous clips. Bottom: the two-stage training strategy for both modules, including decomposed-QA enhanced SFT followed by TAU-GRPO post-training.
Figure 3: Decomposed-QA enhanced SFT. To help the model learn the intermediate scene knowledge required for traffic anomaly understanding, we decompose the task into five question-answer types: environment QA, object grounding QA, anomaly time window QA, anomaly reasoning QA, and anomaly description QA. The figure shows one example for each question type.
Figure 4: Qualitative comparison on the Roundabout-TAU dataset.
Figure 5: Prompt templates for the classification and summarisation tasks.
...and 7 more figures
