Towards Continual Egocentric Activity Recognition: A Multi-modal Egocentric Activity Dataset for Continual Learning

Linfeng Xu, Qingbo Wu, Lili Pan, Fanman Meng, Hongliang Li, Chiyuan He, Hanxin Wang, Shaoxu Cheng, Yu Dai

TL;DR

UESTC-MMEA-CL, a multi-modal egocentric activity dataset for continual activity learning, is proposed, and egocentric activity recognition results are reported for three modalities, used separately and jointly, on a base network architecture.

Abstract

With the rapid development of wearable cameras, massive collections of egocentric video for first-person visual perception have become available. Using egocentric videos to predict first-person activity faces many challenges, including a limited field of view, occlusions, and unstable motion. Because sensor data from wearable devices facilitates human activity recognition, multi-modal activity recognition is attracting increasing attention. However, the lack of related datasets hinders the development of multi-modal deep learning for egocentric activity recognition. Meanwhile, deploying deep learning in the real world has led to a focus on continual learning, which often suffers from catastrophic forgetting. Yet catastrophic forgetting in egocentric activity recognition, especially with multiple modalities, remains unexplored because no suitable dataset is available. To support this research, we present UESTC-MMEA-CL, a multi-modal egocentric activity dataset for continual learning, collected with self-developed glasses that integrate a first-person camera and wearable sensors. It contains synchronized video, accelerometer, and gyroscope data for 32 types of daily activities performed by 10 participants. Its class types and scale are compared with those of other publicly available datasets. A statistical analysis of the sensor data shows its auxiliary effect for different behaviors. Egocentric activity recognition results are reported for three modalities (RGB, acceleration, and gyroscope), used separately and jointly, on a base network architecture. To explore catastrophic forgetting in continual learning tasks, four baseline methods are extensively evaluated with different multi-modal combinations. We hope UESTC-MMEA-CL can promote future studies on continual learning for first-person activity recognition in wearable applications.

Paper Structure (22 sections, 5 equations, 9 figures, 4 tables)

Figures (9)

  • Figure 1: Continual egocentric activity recognition with multiple modalities: video stream, acceleration data (green), and gyroscope data (purple).
  • Figure 2: The device for data collection. (a) Our developed Kuaiyan Vision Smart Glasses. (b) The mainboard of the glasses.
  • Figure 3: A sample of the activity “drinking”, consisting of the synchronized video stream, acceleration, and gyroscope sensor data.
  • Figure 4: Statistics of sensor data. (a) STD distributions of acceleration for all activity classes. The relative motion intensity of the activities increases from the leftmost column to the rightmost, and the activities are divided into four levels according to the median STD. (b) STD distributions of gyroscope for each activity. (c) Scatter plot of the STD distributions of acceleration and gyroscope (correlation coefficient $r = 0.78$ on all samples).
  • Figure 5: Base architecture of multi-modal egocentric activity recognition. The number of TBWs $T$ is set to 8.
  • ...and 4 more figures
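
The statistics summarized in the Figure 4 caption (a per-sample STD for each sensor and a correlation coefficient between the two) can be illustrated with a minimal sketch. It rests on assumptions not confirmed by the text above: the STD is taken here over the per-timestep vector magnitude of a 3-axis signal, the correlation is assumed to be Pearson's $r$, and the function names `motion_std` and `pearson_r` as well as the synthetic data are hypothetical. The result on synthetic data is not expected to match the reported $r = 0.78$.

```python
import numpy as np


def motion_std(signal):
    """STD of the per-timestep magnitude of a (n_timesteps, 3) sensor signal.

    Assumption: the paper's exact STD definition is not given here;
    the magnitude-based STD is one plausible choice.
    """
    signal = np.asarray(signal, dtype=float)
    magnitude = np.linalg.norm(signal, axis=1)  # one value per timestep
    return float(magnitude.std())


def pearson_r(x, y):
    """Pearson correlation coefficient between two 1-D sequences."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.corrcoef(x, y)[0, 1])


# Synthetic stand-in for the dataset: 20 samples with increasing motion intensity.
rng = np.random.default_rng(0)
intensities = np.linspace(0.1, 2.0, 20)
acc_std = [motion_std(rng.normal(scale=s, size=(500, 3))) for s in intensities]
gyro_std = [motion_std(rng.normal(scale=0.5 * s, size=(500, 3))) for s in intensities]

# Correlation between the acceleration and gyroscope STDs across samples.
r = pearson_r(acc_std, gyro_std)
print(f"r = {r:.2f}")
```

With real recordings, each sample's accelerometer and gyroscope arrays would replace the synthetic draws, and the per-class STD distributions in panels (a) and (b) would follow from grouping the `motion_std` values by activity label.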