FLOW: Fusing and Shuffling Global and Local Views for Cross-User Human Activity Recognition with IMUs

Qi Qiu, Tao Zhu, Furong Duan, Kevin I-Kai Wang, Liming Chen, Mingxing Nie

TL;DR

This work proposes a novel approach that extracts a global view representation from the characteristics of IMU data, effectively alleviating the data distribution discrepancies induced by wearing styles, and demonstrates that the proposed algorithm outperforms current state-of-the-art methods in cross-user HAR.

Abstract

Inertial Measurement Unit (IMU) sensors are widely employed for Human Activity Recognition (HAR) due to their portability and energy efficiency, and they attract growing research interest. However, a significant challenge for IMU-HAR models is achieving robust generalization performance across diverse users. This limitation stems from substantial variations in data distribution among individual users. One primary reason for this distribution disparity lies in the representation of IMU sensor data in the local coordinate system, which is susceptible to subtle user variations during IMU wearing. To address this issue, we propose a novel approach that extracts a global view representation based on the characteristics of IMU data, effectively alleviating the data distribution discrepancies induced by wearing styles. To validate the efficacy of the global view representation, we fed both global and local view data into the model for experiments. The results demonstrate that global view data significantly outperforms local view data in cross-user experiments. Furthermore, we propose a Multi-view Supervised Network (MVFNet) based on shuffling to effectively fuse local view and global view data. It supervises the feature extraction of each view through view division and view shuffling, preventing the model from ignoring important features as much as possible. Extensive experiments conducted on the OPPORTUNITY and PAMAP2 datasets demonstrate that the proposed algorithm outperforms current state-of-the-art methods in cross-user HAR.

Paper Structure

This paper contains 20 sections, 13 equations, 6 figures, 3 tables.

Figures (6)

  • Figure 1: The NED coordinate representation can relieve the distribution differences caused by the wearing style. The left part of the image shows diagrams of two different IMU wearing styles. The line graph in the middle part shows the data of two IMU sensors in the local coordinate system, and the right part shows the data representation in the NED coordinate system. It can be seen that the data of the two IMUs is more closely represented in the latter.
  • Figure 2: Two coordinate schemata and pose extraction schemes. The figure on the left shows the schematic diagrams of NED coordinate representation and local coordinate representation. The figure on the right shows the specific M&C process. The original data is composed of a time series $S$ that is input into the Mahony algorithm to obtain the attitude representation $\boldsymbol{q^i}$ of the IMU for each moment. Then, the transformation matrix $M^i$ is obtained based on $\boldsymbol{q^i}$, and then the accelerometer, gyroscope, and magnetometer data in $\boldsymbol{s^i}$ are transformed from the local coordinate system to the NED coordinate system using $M^i$. However, due to the inherent characteristics of the Mahony algorithm, the attitudes in the first part of the obtained attitude sequence Q are inaccurate, and we will not use them in the experiment, which will result in the loss of about 1 second of data.
  • Figure 3: MVFNet and its training process. MVFNet consists of three parts: Backbone, MVF-Layer, and Voting Net. For the same batch of data, the training of MVFNet includes two different processes. In the first process, the data in the current batch is shuffled based on perspective and then used to train the Backbone and MVF-Layer. In the second process, we use the unshuffled data to train the Voting-Net with the Backbone and MVF-Layer frozen.
  • Figure 4: FLOW: the M&C method and MVFNet are combined to obtain a method that can fully utilize the advantages of both local and global views.
  • Figure 5: Confusion matrix obtained by training with different views
  • ...and 1 more figure
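The M&C step described for Figure 2 can be sketched in code. This is a minimal, hypothetical illustration, not the paper's implementation: it assumes the Mahony filter has already produced a unit attitude quaternion `q` (w, x, y, z) for the current time step, converts it to the transformation matrix M, and rotates the accelerometer, gyroscope, and magnetometer vectors from the local sensor frame into the global (NED) frame. The function names are invented for this sketch.

```python
import numpy as np

def quat_to_rotmat(q):
    """Convert a unit quaternion (w, x, y, z) into the 3x3 rotation
    matrix M used to map sensor-frame vectors to the global frame."""
    w, x, y, z = q
    return np.array([
        [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
        [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
        [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)],
    ])

def local_to_global(acc, gyro, mag, q):
    """Rotate one IMU sample (three 3-vectors) from the local coordinate
    system into the NED-style global frame using attitude q."""
    M = quat_to_rotmat(q)
    return M @ acc, M @ gyro, M @ mag

# With the identity attitude, the reading is unchanged;
# a 90-degree yaw rotates the x-axis reading onto the y-axis.
q_id = np.array([1.0, 0.0, 0.0, 0.0])
acc_g, _, _ = local_to_global(np.array([0.0, 0.0, 9.81]),
                              np.zeros(3), np.zeros(3), q_id)
```

In practice this transform would be applied per time step over the whole sequence, and, as the Figure 2 caption notes, the first second or so of attitude estimates from the Mahony filter is discarded because it has not yet converged.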