Enhancing Fitness Movement Recognition with Attention Mechanism and Pre-Trained Feature Extractors

Shanjid Hasan Nishat, Srabonti Deb, Mohiuddin Ahmed

TL;DR

A lightweight framework that pairs pre-trained 2D feature extractors (ResNet50, EfficientNet, and Vision Transformers) with a Long Short-Term Memory (LSTM) network enhanced by spatial attention. It offers a scalable, real-time-capable solution for fitness activity recognition, with broader applications in vision-based health and activity monitoring.

Abstract

Fitness movement recognition, a focused subdomain of human activity recognition (HAR), plays a vital role in health monitoring, rehabilitation, and personalized fitness training by enabling automated exercise classification from video data. However, many existing deep learning approaches rely on computationally intensive 3D models, which limits their feasibility in real-time or resource-constrained settings. In this paper, we present a lightweight and effective framework that integrates pre-trained 2D feature extractors, namely Convolutional Neural Networks (CNNs) such as ResNet50 and EfficientNet as well as Vision Transformers (ViT), with a Long Short-Term Memory (LSTM) network enhanced by spatial attention. The pre-trained models efficiently extract spatial features from individual frames, the LSTM captures temporal dependencies across frames, and the attention mechanism emphasizes informative segments. We evaluate the framework on a curated subset of the UCF101 dataset, achieving a peak accuracy of 93.34% with the ResNet50-based configuration. In our comparisons, the approach outperforms several state-of-the-art HAR systems. The proposed method offers a scalable and real-time-capable solution for fitness activity recognition, with broader applications in vision-based health and activity monitoring.
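
To make the idea of emphasizing informative segments concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not specify the exact form of the attention, so this assumes a simple softmax-weighted pooling over per-frame vectors (for example, LSTM hidden states). The names `attention_pool`, `frame_states`, and `query` are hypothetical, and the toy values are illustrative only.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(frame_states, query):
    """Pool per-frame vectors into one clip-level vector.

    Assumption: each frame is scored by its dot product with a
    `query` vector (learned in a real model), and the scores are
    normalized with softmax to give per-frame weights.
    """
    scores = [sum(h * q for h, q in zip(state, query)) for state in frame_states]
    weights = softmax(scores)
    dim = len(frame_states[0])
    pooled = [
        sum(w * state[i] for w, state in zip(weights, frame_states))
        for i in range(dim)
    ]
    return pooled, weights


# Toy stand-ins for per-frame features (hypothetical values).
frame_states = [[0.1, 0.0], [2.0, 1.0], [0.2, 0.1]]
query = [1.0, 1.0]
pooled, weights = attention_pool(frame_states, query)
```

In this toy input the second frame has the largest score, so it receives the largest weight and dominates the pooled vector. In a full model, a classifier would take the pooled vector as input.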

Paper Structure

This paper contains 17 sections, 5 equations, 5 figures, 3 tables.

Figures (5)

  • Figure 1: Architecture of Proposed Methodology
  • Figure 2: Training Performance Curves of LRCN + Attention Model
  • Figure 3: Training Performance Curves of ViT + LSTM Model
  • Figure 4: Training Performance Curves of EfficientNet + LSTM + Attention Model
  • Figure 5: Training Performance Curves of ResNet50 + LSTM + Attention Model