Human Action Recognition without Human

Hirokatsu Kataoka, Kensho Hara, Yutaka Satoh

TL;DR

This paper examines whether a background sequence alone can classify human actions in current large-scale action datasets (e.g., UCF101) and concludes that some background features may be too strong.

Abstract

The objective of this paper is to evaluate "human action recognition without human". Motion representation is frequently discussed in human action recognition. We have examined several sophisticated options, such as dense trajectories (DT) and the two-stream convolutional neural network (CNN). However, some features from the background could be too strong, as shown in some recent studies on human action recognition. Therefore, we considered whether a background sequence alone can classify human actions in current large-scale action datasets (e.g., UCF101). In this paper, we propose a novel concept for human action analysis that is named "human action recognition without human". An experiment clearly shows the effect of a background sequence for understanding an action label.

Paper Structure

This paper contains 15 sections, 5 equations, 6 figures, and 4 tables.

Figures (6)

  • Figure 1: In our pre-experiment, we revealed that humanless human action recognition can be done with a motion representation (e.g., two-stream ConvNets [SimonyanNIPS2014]). An undesirable scenario is arising because recent ConvNets rely heavily on a context-based representation. The problem of human action recognition with a video sequence is replaced by simple scene recognition.
  • Figure 2: Four settings to evaluate background effects and pure human motions.
  • Figure 3: Flowchart of dataset creation: We created motion-only {UCF101,HMDB51} datasets for evaluating pure human motions. They were also used to confirm that the motion representation used in existing approaches is reliable. A trial was performed to clarify the strategy for developing a sophisticated motion representation.
  • Figure 4: Motion-only {UCF101, HMDB51} datasets with a semantic segmentation. Each dataset must be created from a good trial to represent pure motion.
  • Figure 5: Area statistics of the CWOH and MO datasets: For the CWOH setting, the value given is the percentage of all pixels that are in the rectangular area, and for the MO setting, it is the percentage of all pixels that are in the segmented human area.
  • ...and 1 more figure
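
The two area statistics described in the Figure 5 caption can be illustrated with a minimal sketch. This is not the authors' code: the function names, the end-exclusive box convention `(x0, y0, x1, y1)`, and the list-of-rows binary mask format are all assumptions made for illustration.

```python
def box_area_percent(height, width, box):
    """Percentage of frame pixels inside a rectangle (CWOH-style statistic).

    `box` is a hypothetical (x0, y0, x1, y1) tuple, end-exclusive.
    """
    x0, y0, x1, y1 = box
    # Clip the rectangle to the frame so out-of-bounds boxes are not overcounted.
    x0, x1 = max(0, x0), min(width, x1)
    y0, y1 = max(0, y0), min(height, y1)
    inside = max(0, x1 - x0) * max(0, y1 - y0)
    return 100.0 * inside / (height * width)


def mask_area_percent(mask):
    """Percentage of pixels labeled human in a binary mask (MO-style statistic).

    `mask` is a hypothetical list of rows, each a list of 0/1 values.
    """
    total = sum(len(row) for row in mask)
    human = sum(1 for row in mask for value in row if value)
    return 100.0 * human / total


# Toy 4x4 frame: a 2x2 rectangle and a 3-pixel segmented region.
toy_mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]
box_percent = box_area_percent(4, 4, (1, 1, 3, 3))
mask_percent = mask_area_percent(toy_mask)
```

In this toy case the rectangle covers 4 of 16 pixels and the segmented region covers 3 of 16. A segmentation mask lies within its bounding rectangle, so for a given frame the mask-based value will not exceed the value for a box enclosing it.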