The 8th AI City Challenge
Shuo Wang, David C. Anastasiu, Zheng Tang, Ming-Ching Chang, Yue Yao, Liang Zheng, Mohammed Shaiqur Rahman, Meenakshi S. Arya, Anuj Sharma, Pranamesh Chakraborty, Sanjita Prajapati, Quan Kong, Norimasa Kobori, Munkhjargal Gochoo, Munkh-Erdene Otgonbold, Fady Alnajjar, Ganzorig Batnasan, Ping-Yang Chen, Jun-Wei Hsieh, Xunlei Wu, Sameer Satish Pusegaonkar, Yizhou Wang, Sujit Biswas, Rama Chellappa
TL;DR
The paper presents the 8th AI City Challenge and its five tracks: multi-target multi-camera (MTMC) people tracking, dense video captioning for traffic safety, naturalistic driving action recognition, fisheye road-object detection, and motorcycle helmet-rule violation detection. All five are underpinned by large-scale, diverse datasets and an online evaluation framework. The challenge introduces substantial dataset expansions (notably MTMC with 3D localization and camera matrices, WTS captions, SynDD2, FishEye8K/1K, and helmet data) and a rigorous evaluation protocol that rewards online tracking and domain-adaptive methods. Key contributions include a comprehensive benchmark, an analysis of state-of-the-art approaches across tracks, and practical insights for real-world ITS and retail deployments, highlighting the shift toward online, multi-view, and language-augmented perception. Participants set new benchmarks, in some cases surpassing the existing state of the art, and the results offer guidance for future research in graph-based tracking, domain-specific vision-language modeling, and reproducible benchmarking in complex urban environments.
Abstract
The eighth AI City Challenge highlighted the convergence of computer vision and artificial intelligence in areas such as retail, warehouse settings, and Intelligent Traffic Systems (ITS), presenting significant research opportunities. The 2024 edition featured five tracks, attracting unprecedented interest from 726 teams in 47 countries and regions. Track 1 dealt with multi-target multi-camera (MTMC) people tracking, with significant increases in the number of cameras and tracked characters, the addition of 3D annotations and camera matrices, and new rules that support 3D tracking and encourage online tracking algorithms. Track 2 introduced dense video captioning for traffic safety, focusing on pedestrian accidents captured by multi-camera feeds to improve insights for insurance and prevention. Track 3 required teams to classify driver actions in a naturalistic driving analysis. Track 4 explored fish-eye camera analytics using the FishEye8K dataset. Track 5 focused on detecting motorcycle helmet rule violations. The challenge used two leaderboards to showcase methods, with participants setting new benchmarks, some surpassing existing state-of-the-art achievements.
