PKU-MMD: A Large Scale Benchmark for Continuous Multi-Modal Human Action Understanding
Chunhui Liu, Yueyu Hu, Yanghao Li, Sijie Song, Jiaying Liu
TL;DR
The paper addresses the shortage of large-scale benchmarks for continuous 3D action detection by introducing PKU-MMD, a dataset of 1076 long videos across 51 actions, 66 subjects, and three views, with RGB, depth, infrared, and skeleton modalities. It also proposes a novel evaluation protocol, 2D-AP, that assesses detection performance by considering both interval overlap and confidence, and provides extensive baseline experiments on skeleton and multi-modality representations. The results highlight the challenges that current methods face on long, untrimmed videos and demonstrate the value of multi-modality data for robust temporal localization. Overall, PKU-MMD aims to accelerate progress in continuous action detection and multi-modal 3D activity understanding.
Abstract
Although many 3D human activity benchmarks have been proposed, most existing action datasets focus on action recognition in segmented videos. There is a lack of standard large-scale benchmarks, especially for the currently popular data-hungry deep learning methods. In this paper, we introduce a new large-scale benchmark (PKU-MMD) for continuous multi-modality 3D human action understanding that covers a wide range of complex human activities with well-annotated information. PKU-MMD contains 1076 long video sequences in 51 action categories, performed by 66 subjects in three camera views. It contains almost 20,000 action instances and 5.4 million frames in total. Our dataset also provides multi-modality data sources, including RGB, depth, infrared radiation, and skeleton. With different modalities, we conduct extensive experiments on our dataset in two scenarios and evaluate different methods with various metrics, including a newly proposed evaluation protocol, 2D-AP. We believe this large-scale dataset will benefit future research on action detection in the community.
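
Detection metrics that consider interval overlap generally build on the temporal intersection-over-union (IoU) between a predicted action interval and a ground-truth one. The following is a minimal sketch of that building block only; it is not the 2D-AP protocol itself. The function name `temporal_iou` and the half-open `(start_frame, end_frame)` interval representation are assumptions made for illustration and are not taken from the dataset's evaluation code.

```python
def temporal_iou(pred, gt):
    """Temporal IoU of two half-open frame intervals (start, end).

    Hypothetical helper for illustration: intervals are assumed to be
    (start_frame, end_frame) pairs with start <= end.
    """
    pred_start, pred_end = pred
    gt_start, gt_end = gt

    # Length of the overlapping part; zero when the intervals are disjoint.
    intersection = max(0, min(pred_end, gt_end) - max(pred_start, gt_start))

    # Union = sum of both lengths minus the part counted twice.
    union = (pred_end - pred_start) + (gt_end - gt_start) - intersection

    # Guard against two empty intervals.
    return intersection / union if union > 0 else 0.0


if __name__ == "__main__":
    # A prediction covering frames 0-10 against a ground truth of 5-15.
    print(temporal_iou((0, 10), (5, 15)))
```

Under this sketch, a detection would typically be counted as correct when its IoU with a ground-truth instance of the same class exceeds a chosen threshold; the threshold and the matching rule are design choices that the evaluation protocol has to fix.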
