ATARS: An Aerial Traffic Atomic Activity Recognition and Temporal Segmentation Dataset

Zihao Chen, Hsuanyu Wu, Chi-Hsi Kung, Yi-Ting Chen, Yan-Tsung Peng

TL;DR

ATARS introduces the first top-down aerial dataset for multi-label atomic activity recognition and temporal segmentation at intersections. It provides frame-level annotations and enables the new task of Multi-label Temporal Atomic Activity (M-TAA) Segmentation for untrimmed videos. It benchmarks a range of state-of-the-art video and object-aware models, revealing significant challenges from tiny object sizes, long-tail distributions, and intersection-specific dynamics that existing methods do not capture well. The results highlight the need for topology-aware, high-resolution, and temporally adaptive approaches to reliably recognize and localize atomic activities in aerial traffic scenes. By releasing ATARS and its baselines, the work provides a foundation for developing robust aerial traffic understanding, with practical implications for driving automation and traffic simulation at real-world intersections.

Abstract

Traffic atomic activity, which describes traffic patterns in terms of topological intersection dynamics, is a crucial topic for the advancement of intelligent driving systems. However, existing atomic activity datasets are collected from an egocentric view, which cannot support scenarios that require the traffic activities of an entire intersection. Moreover, existing datasets provide only video-level atomic activity annotations, so videos must be laboriously trimmed by hand before recognition, which limits their applicability to untrimmed videos. To bridge this gap, we introduce the Aerial Traffic Atomic Activity Recognition and Segmentation (ATARS) dataset, the first aerial dataset designed for multi-label atomic activity analysis. We provide atomic activity labels for each frame, which accurately record the intervals of traffic activities. Moreover, we propose a novel task, Multi-label Temporal Atomic Activity Recognition, which enables the study of accurate temporal localization of atomic activities and eases the burden of manual video trimming for recognition. We conduct extensive experiments to evaluate existing state-of-the-art models on both atomic activity recognition and temporal atomic activity segmentation. The results highlight the unique challenges of the ATARS dataset, such as recognizing the activities of extremely small objects. We further provide a comprehensive discussion analyzing these challenges and offer insights for future work on improving atomic activity recognition in aerial views. Our source code and dataset are available at https://github.com/magecliff96/ATARS/

Paper Structure

This paper contains 19 sections, 3 figures, 5 tables.

Figures (3)

  • Figure 1: Illustration of the ATARS dataset. (a) Atomic activity definition showing the names of the roadways (z1, z2, z3, and z4), the four corners (c1, c2, c3, and c4), and examples of atomic activity labels. The region boxed in purple is the intersection, and all traffic participants moving inside it are annotated. Labels containing v and p represent a single vehicle or pedestrian, respectively. An additional '+' represents multiple vehicles or pedestrians in the intersection. (b) Chronologically ordered frames sampled from a video, highlighting atomic activities and the challenge of detecting small objects, such as pedestrians, from the high-altitude top-down perspective. (c) Visualization of atomic activity labels annotated at frame granularity, with dashed lines marking the locations of the frames presented in (b).
  • Figure 2: The distribution of atomic activity classes in the ATARS dataset. Four distinct color groups represent the traffic participants v, v+, p, and p+, while different shades for v and v+ indicate motion directions: the lightest color denotes right turns, the medium shade denotes going straight, and the darkest shade denotes left turns. Due to the dataset’s long-tail distribution, current models struggle to capture pedestrian patterns. However, this real-world data offers an ideal benchmark for evaluating and improving model robustness in aerial intersection scenarios.
  • Figure 3: Qualitative visualization of various methods in multi-label temporal atomic activity segmentation [farha2019ms, li2020ms, yi2021asformer]. The dark indigo segment represents the ground truth, and the purple segment denotes correct predictions that align with the ground truth.
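
The frame-granularity labels described for Figure 1(c) can be collapsed into temporal segments by merging consecutive frames that share a label. The Python sketch below illustrates the idea under stated assumptions: the label strings (such as `z1-z3:v`) and the function name `frames_to_segments` are hypothetical and are not taken from the ATARS release, whose actual annotation format is not described here.

```python
from typing import Dict, List, Sequence, Set, Tuple

Segment = Tuple[int, int]  # inclusive (start_frame, end_frame)


def frames_to_segments(frame_labels: Sequence[Set[str]]) -> Dict[str, List[Segment]]:
    """Merge per-frame multi-label annotations into per-label intervals.

    frame_labels[t] is the set of labels active at frame t. The label
    strings are a hypothetical encoding, not the dataset's real format.
    """
    segments: Dict[str, List[Segment]] = {}
    open_starts: Dict[str, int] = {}  # label -> frame where its current run began

    for t, labels in enumerate(frame_labels):
        # Close every run whose label is no longer active at frame t.
        for label in list(open_starts):
            if label not in labels:
                start = open_starts.pop(label)
                segments.setdefault(label, []).append((start, t - 1))
        # Open a run for every label that just became active.
        for label in labels:
            open_starts.setdefault(label, t)

    # Close runs that are still active at the final frame.
    last = len(frame_labels) - 1
    for label, start in open_starts.items():
        segments.setdefault(label, []).append((start, last))
    return segments


# Five frames with two overlapping (multi-label) activities.
example = [
    {"z1-z3:v"},
    {"z1-z3:v", "c1-c2:p"},
    {"c1-c2:p"},
    set(),
    {"z1-z3:v"},
]
result = frames_to_segments(example)
```

Here `result["z1-z3:v"]` holds two intervals because the label disappears at frames 2 and 3 and reappears at frame 4, while `result["c1-c2:p"]` holds a single interval that overlaps the first vehicle segment. That overlap is what makes the task multi-label rather than a single-label segmentation.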