The 8th AI City Challenge

Shuo Wang, David C. Anastasiu, Zheng Tang, Ming-Ching Chang, Yue Yao, Liang Zheng, Mohammed Shaiqur Rahman, Meenakshi S. Arya, Anuj Sharma, Pranamesh Chakraborty, Sanjita Prajapati, Quan Kong, Norimasa Kobori, Munkhjargal Gochoo, Munkh-Erdene Otgonbold, Fady Alnajjar, Ganzorig Batnasan, Ping-Yang Chen, Jun-Wei Hsieh, Xunlei Wu, Sameer Satish Pusegaonkar, Yizhou Wang, Sujit Biswas, Rama Chellappa

TL;DR

The paper presents the 8th AI City Challenge, detailing five tracks: multi-target multi-camera (MTMC) people tracking, dense traffic captioning, naturalistic driving action recognition, fisheye road-object detection, and helmet-rule violation detection, underpinned by large-scale, diverse datasets and an online evaluation framework. It introduces substantial dataset expansions (notably MTMC with 3D localization and camera matrices, WTS captions, SynDD2, FishEye8K/1K, and helmet data) and a rigorous evaluation protocol that rewards online tracking and domain-adaptive methods. Key contributions include a comprehensive benchmark, analysis of state-of-the-art approaches across tracks, and practical insights for real-world ITS and retail deployments, highlighting the shift toward online, multi-view, and language-augmented perception. The results demonstrate meaningful progress and offer guidance for future research in graph-based tracking, domain-specific vision-language modeling, and reproducible benchmarking in complex urban environments.

Abstract

The eighth AI City Challenge highlighted the convergence of computer vision and artificial intelligence in areas like retail, warehouse settings, and Intelligent Traffic Systems (ITS), presenting significant research opportunities. The 2024 edition featured five tracks, attracting unprecedented interest from 726 teams in 47 countries and regions. Track 1 dealt with multi-target multi-camera (MTMC) people tracking, with significant increases in the number of cameras and characters, enhanced 3D annotation and camera matrices, and new rules for 3D tracking that encourage online tracking algorithms. Track 2 introduced dense video captioning for traffic safety, focusing on pedestrian accidents and using multi-camera feeds to improve insights for insurance and prevention. Track 3 required teams to classify driver actions in a naturalistic driving analysis. Track 4 explored fish-eye camera analytics using the FishEye8K dataset. Track 5 focused on motorcycle helmet rule violation detection. The challenge used two leaderboards to showcase methods, with participants setting new benchmarks and some surpassing existing state-of-the-art results.

Paper Structure

This paper contains 22 sections, 4 equations, 3 figures, 5 tables.

Figures (3)

  • Figure 1: The MTMC people tracking dataset for Track 1 contains 90 subsets from 6 synthetic environments. The figure contains sampled frames with plotted labels from the 6 environments.
  • Figure 2: Overview of the WTS dataset for Track 2, providing multi-view videos with fine-grained captions focused on pedestrian perspectives.
  • Figure 3: Sample images from each of the 18 cameras with wide-angle fisheye views for Track 4.