Sensor Data Augmentation from Skeleton Pose Sequences for Improving Human Activity Recognition

Parham Zolfaghari, Vitor Fortes Rey, Lala Ray, Hyun Kim, Sungho Suh, Paul Lukowicz

TL;DR

This paper proposes a novel approach to improve wearable sensor-based HAR by introducing a pose-to-sensor network model that generates sensor data directly from 3D skeleton pose sequences, and simultaneously trains the pose-to-sensor network and a human activity classifier.

Abstract

The proliferation of deep learning has significantly advanced various fields, yet Human Activity Recognition (HAR) has not fully capitalized on these developments, primarily due to the scarcity of labeled datasets. Despite the integration of advanced Inertial Measurement Units (IMUs) in ubiquitous wearable devices like smartwatches and fitness trackers, which offer self-labeled activity data from users, the volume of labeled data remains insufficient compared to domains where deep learning has achieved remarkable success. To address this gap, we propose a novel approach to improve wearable sensor-based HAR by introducing a pose-to-sensor network model that generates sensor data directly from 3D skeleton pose sequences. Our method simultaneously trains the pose-to-sensor network and a human activity classifier, optimizing both data reconstruction and activity recognition. Our contributions include the integration of simultaneous training, direct pose-to-sensor generation, and a comprehensive evaluation on the MM-Fit dataset. Experimental results demonstrate the superiority of our framework with significant performance improvements over baseline methods.

Paper Structure (11 sections, 7 equations, 3 figures, 6 tables)

Figures (3)

  • Figure 1: Schematic Overview of the Proposed Method. This diagram illustrates the training architecture in which the feature extraction and classification modules are trained concurrently on both real and synthetic IMU accelerometer data. Integral to the pipeline is a regression model that generates the synthetic data and thereby enhances the training of the feature extraction and classification modules. The overall training process is governed by a compound weighted-sum loss, optimizing the synergy between the modules for improved performance.
  • Figure 2: Overview of Baseline Approaches for Activity Classification. The first baseline utilizes only real sensor data ($\mathbf{x}_{\text{sensor}}$) for classifier training. The second baseline employs a two-step approach where a regression model ($R$) first predicts synthetic sensor data ($\mathbf{\tilde{x}}_{\text{sensor}}$) from 3D joint pose sequences ($\mathbf{x}_{\text{pose}}$), which is then combined with real data to train the classifier. Both methods converge at the Activity Classifier ($C$), which outputs the activity predictions ($\mathbf{\tilde{y}}_{\text{activity}}$).
  • Figure 3: Qualitative Comparison of Regression Models. This figure contrasts the performance of a regression model trained within the end-to-end pipeline (right) against one trained independently (left). The real IMU accelerometer data is represented in blue, while the predictions from the regression model trained in the end-to-end pipeline are depicted in red. Predictions from the independently trained regression model are shown in orange. Both visualizations are based on identical time windows for a direct comparison.
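
The compound weighted-sum loss mentioned in the Figure 1 caption can be illustrated with a small sketch. The captions do not give the exact terms or weights, so everything below is an assumption made for illustration: a mean-squared-error reconstruction term between real and synthetic accelerometer windows, one cross-entropy term for the classifier on each data source, and the hypothetical weights `w_rec`, `w_real`, and `w_synth`. It is not the authors' implementation.

```python
import math


def mse(pred, target):
    """Mean squared error between two equally sized sequences of floats."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def cross_entropy(logits, label):
    """Cross-entropy for one sample, computed from raw class scores."""
    m = max(logits)  # subtract the max for numerical stability
    log_norm = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_norm - logits[label]


def compound_loss(real_imu, synth_imu, logits_real, logits_synth, label,
                  w_rec=1.0, w_real=1.0, w_synth=1.0):
    """Weighted sum of one reconstruction term and two classification terms.

    The choice of terms and the weight names are assumptions, not taken
    from the paper.
    """
    rec = mse(synth_imu, real_imu)                   # regression model quality
    cls_real = cross_entropy(logits_real, label)     # classifier on real data
    cls_synth = cross_entropy(logits_synth, label)   # classifier on synthetic data
    return w_rec * rec + w_real * cls_real + w_synth * cls_synth


# Toy values standing in for one accelerometer window and classifier outputs.
real = [0.1, 0.4, -0.2]
synth = [0.0, 0.5, -0.1]
loss = compound_loss(real, synth, [2.0, 0.5, 0.1], [1.5, 0.7, 0.2], label=0)
print(loss)
```

Because all three terms feed a single scalar, a gradient step on it would update the regression model and the classifier together, which is what distinguishes the end-to-end pipeline from the two-step baseline described for Figure 2.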