PKU-MMD: A Large Scale Benchmark for Continuous Multi-Modal Human Action Understanding

Chunhui Liu, Yueyu Hu, Yanghao Li, Sijie Song, Jiaying Liu

TL;DR

The paper addresses the shortage of large-scale benchmarks for continuous 3D action detection by introducing PKU-MMD, a dataset of 1076 long videos across 51 actions, 66 subjects, and three views, with RGB, depth, infrared, and skeleton modalities. It also proposes a novel 2D-AP evaluation protocol to assess detection performance considering both interval overlap and confidence, and provides extensive baseline experiments across skeleton- and multi-modality representations. The results highlight challenges in current methods for long, untrimmed videos and demonstrate the value of multi-modality data for robust temporal localization. Overall, PKU-MMD aims to accelerate progress in continuous action detection and multi-modal 3D activity understanding.
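The 2D-AP protocol is only summarized here, so the sketch below is not that protocol. It illustrates the two ingredients 2D-AP is said to combine, interval overlap and detection confidence, by computing standard non-interpolated average precision for a single action class at a fixed temporal-overlap threshold θ (the θ = 0.2 in Figure 2 is one such value). The greedy one-to-one matching and the names `Interval`, `temporal_iou` and `average_precision` are illustrative assumptions, not the paper's code.

```python
from dataclasses import dataclass


@dataclass
class Interval:
    start: float  # first frame of the action (inclusive)
    end: float    # frame after the last one (exclusive)


def temporal_iou(a: Interval, b: Interval) -> float:
    """Intersection-over-union of two 1D time intervals."""
    inter = max(0.0, min(a.end, b.end) - max(a.start, b.start))
    union = (a.end - a.start) + (b.end - b.start) - inter
    return inter / union if union > 0 else 0.0


def average_precision(detections, ground_truth, theta):
    """Non-interpolated AP for one class.

    detections: list of (Interval, confidence); ground_truth: list of Interval.
    A detection is a true positive if it overlaps a still-unmatched
    ground-truth interval with IoU >= theta (greedy, highest confidence first).
    """
    if not ground_truth:
        return 0.0
    ranked = sorted(detections, key=lambda d: d[1], reverse=True)
    matched = [False] * len(ground_truth)
    tp = 0
    precisions_at_hits = []
    for rank, (det, _score) in enumerate(ranked, start=1):
        # Find the best-overlapping ground truth that is not yet matched.
        best_iou, best_j = 0.0, -1
        for j, gt in enumerate(ground_truth):
            if matched[j]:
                continue
            iou = temporal_iou(det, gt)
            if iou > best_iou:
                best_iou, best_j = iou, j
        if best_j >= 0 and best_iou >= theta:
            matched[best_j] = True
            tp += 1
            precisions_at_hits.append(tp / rank)  # precision at this recall step
    # Missed ground-truth instances contribute zero precision.
    return sum(precisions_at_hits) / len(ground_truth)


if __name__ == "__main__":
    gt = [Interval(0, 100), Interval(200, 300)]
    dets = [(Interval(10, 110), 0.9), (Interval(500, 600), 0.8), (Interval(190, 290), 0.7)]
    for theta in (0.2, 0.5, 0.9):
        print(theta, average_precision(dets, gt, theta))
```

Sweeping θ shows how sensitive a detector is to localization quality; how the paper aggregates over overlap and confidence to obtain the single 2D-AP score is not specified in this summary.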

Abstract

Although many 3D human activity benchmarks have been proposed, most existing action datasets focus on action recognition in segmented videos. There is a lack of standard large-scale benchmarks for continuous action detection, especially for currently popular data-hungry deep learning methods. In this paper, we introduce a new large-scale benchmark (PKU-MMD) for continuous multi-modality 3D human action understanding, covering a wide range of complex human activities with well-annotated information. PKU-MMD contains 1076 long video sequences in 51 action categories, performed by 66 subjects in three camera views. It contains almost 20,000 action instances and 5.4 million frames in total. Our dataset also provides multi-modality data sources, including RGB, depth, infrared radiation and skeleton. Using these modalities, we conduct extensive experiments on our dataset under two scenarios and evaluate different methods with various metrics, including a newly proposed evaluation protocol, 2D-AP. We believe this large-scale dataset will benefit future research on action detection in the community.
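As a quick sanity check on the totals in the abstract, dividing them by the number of videos gives the average sequence length and the average number of action instances per sequence. The totals are the rounded figures quoted above ("5.4 million" frames, "almost 20,000" instances), so the results are approximate.

```python
NUM_VIDEOS = 1076
TOTAL_FRAMES = 5_400_000  # "5.4 million frames" (rounded in the abstract)
NUM_INSTANCES = 20_000    # "almost 20,000 action instances" (an upper bound)

# Average sequence length in frames and average instance density per video.
frames_per_video = TOTAL_FRAMES / NUM_VIDEOS
instances_per_video = NUM_INSTANCES / NUM_VIDEOS

print(f"~{frames_per_video:.0f} frames per video")
print(f"~{instances_per_video:.1f} action instances per video")
```

This gives roughly 5,019 frames and at most about 18.6 action instances per video, in line with the Figure 3 caption's "about 20 action instances" per sequence.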

Paper Structure

This paper contains 19 sections, 6 equations, 3 figures, 4 tables.

Figures (3)

  • Figure 1: The PKU Multi-Modality Dataset is a large-scale multi-modality action detection dataset. It contains 51 action categories, performed by 66 distinct subjects in 3 camera views.
  • Figure 2: Precision-Recall curves (overlap ratio $\theta$ set to 0.2) for sliding-window settings with different window sizes and strides. $L$ denotes the sliding-window length.
  • Figure 3: Sample frames from PKU-MMD. The top figure shows an example of continuous multi-modality action detection; about 20 action instances can be found within one sequence. The bottom figure depicts the diversity in categories, subjects and camera viewpoints.
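Figure 2 compares sliding-window settings that differ in window length $L$ and stride. The following is a minimal sketch of how such windows could be enumerated over an untrimmed sequence, assuming frame-indexed half-open windows. The specific $L$ and stride values, the per-window classifier and any post-processing used in the paper are not given here, and the extra trailing window (so the last frames are always covered) is an assumption of this sketch; `sliding_windows` is a hypothetical helper.

```python
def sliding_windows(num_frames: int, length: int, stride: int):
    """Return (start, end) frame pairs, end exclusive, covering [0, num_frames)."""
    if length <= 0 or stride <= 0:
        raise ValueError("length and stride must be positive")
    windows = []
    start = 0
    # Regular windows of size `length`, advanced by `stride` frames.
    while start + length <= num_frames:
        windows.append((start, start + length))
        start += stride
    # Assumption: add one window flush with the sequence end so trailing
    # frames (or a sequence shorter than `length`) are not dropped.
    if num_frames > 0 and (not windows or windows[-1][1] < num_frames):
        windows.append((max(0, num_frames - length), num_frames))
    return windows


if __name__ == "__main__":
    print(sliding_windows(11, 4, 3))
```

A smaller stride yields more overlapping candidates (denser coverage, more computation), while $L$ controls which action durations a single window fits well; each window would then be scored by a classifier and compared against the ground truth at overlap $\theta$.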