Standardizing Your Training Process for Human Activity Recognition Models: A Comprehensive Review in the Tunable Factors

Yiran Huang, Haibin Zhao, Yexu Zhou, Till Riedel, Michael Beigl

TL;DR

This work addresses reproducibility issues in wearables-based human activity recognition by systematically auditing how training procedures are described and by quantifying the impact of tunable training factors. Using a PRISMA-P guided search, it reveals widespread underreporting of optimization, scheduling, early stopping, and model-selection details, and it demonstrates the value of a control-variates analysis on the HAPT dataset. The authors then propose an integrated training procedure and validate it across five HAR benchmarks and three architectures, showing consistent macro F1 gains with LOSO cross-validation. The study provides practical guidelines and an open-source package to standardize training narratives, improving reproducibility and cross-subject generalization in WHAR research.

Abstract

In recent years, deep learning has emerged as a potent tool across a multitude of domains, leading to a surge in research on its application to wearable human activity recognition (WHAR). Despite this rapid development, concerns have been raised about the lack of standardization and consistency in the procedures used for experimental model training, which may affect the reproducibility and reliability of research results. In this paper, we provide an exhaustive review of contemporary deep learning research in the field of WHAR and collate information on the training procedures employed in various studies. Our findings suggest that a major trend is the lack of detail in the reported model training protocols. In addition, to gain a clearer understanding of the impact of these missing descriptions, we use a control variables approach to assess the impact of key tunable components (e.g., optimization techniques and early stopping criteria) on the inter-subject generalization capabilities of HAR models. With insights from these analyses, we define a novel integrated training procedure tailored to WHAR models. Empirical results on five well-known WHAR benchmark datasets and three classical HAR model architectures demonstrate the effectiveness of our proposed methodology: in particular, there is a significant improvement in macro F1 under leave-one-subject-out (LOSO) cross-validation.
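The evaluation protocol named in the abstract, macro F1 under leave-one-subject-out cross-validation, can be illustrated with a minimal sketch. This is not the authors' package: the function names (`macro_f1`, `loso_splits`, `loso_macro_f1`, `majority_baseline`) are hypothetical, the model is replaced by a trivial majority-class baseline, and the choice to average F1 over the classes that appear in each fold is an assumption, since the abstract does not specify how absent classes are handled.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over labels seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def loso_splits(subjects):
    """Yield (held_out_subject, train_indices, test_indices), one fold per subject."""
    for s in sorted(set(subjects)):
        train = [i for i, x in enumerate(subjects) if x != s]
        test = [i for i, x in enumerate(subjects) if x == s]
        yield s, train, test


def loso_macro_f1(X, y, subjects, fit_predict):
    """Return (mean macro F1 across folds, per-subject scores).

    fit_predict(X_train, y_train, X_test) is a placeholder for any
    training routine that returns predictions for X_test.
    """
    fold_scores = {}
    for s, tr, te in loso_splits(subjects):
        preds = fit_predict([X[i] for i in tr], [y[i] for i in tr], [X[i] for i in te])
        fold_scores[s] = macro_f1([y[i] for i in te], preds)
    return sum(fold_scores.values()) / len(fold_scores), fold_scores


def majority_baseline(X_train, y_train, X_test):
    """Stand-in model: always predict the most frequent training label."""
    counts = Counter(y_train)
    top = max(sorted(counts), key=counts.get)  # ties resolve to the smallest label
    return [top] * len(X_test)
```

Because every window from the held-out subject is excluded from training, each fold score reflects inter-subject generalization rather than memorization of a wearer's movement patterns. A real training routine would be passed in place of `majority_baseline`.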

Paper Structure (7 sections, 4 figures, 3 tables)

This paper contains 7 sections, 4 figures, 3 tables.

Figures (4)

  • Figure 1: Distribution of the four hyperparameters (Optimizer, Learning Rate, Batch Size, and Epoch) across all reviewed papers. 'none' indicates that the paper gives no corresponding description.
  • Figure 2: Distribution of choices for scheduler hyperparameters across all reviewed papers.
  • Figure 3: Validation loss during model training in the control-variates experiment.
  • Figure 4: Mean and standard deviation of the validation loss (left) and macro F1 score (right) of the models trained on the HAPT dataset.