Integrating Features for Recognizing Human Activities through Optimized Parameters in Graph Convolutional Networks and Transformer Architectures

Mohammad Belal, Taimur Hassan, Abdelfatah Hassan, Nael Alsheikh, Noureldin Elhendawi, Irfan Hussain

TL;DR

This work tackles human activity recognition by integrating a Parameter-Optimized Graph Convolutional Network (PO-GCN) with a Transformer to capture both spatial-temporal skeletal structure and long-range temporal patterns. Features from the two models’ last layers are concatenated and passed to a fully connected classifier, with a combined loss $L_{\text{Total}}$ that blends cross-entropy and mean-squared error to reduce misclassifications. Evaluations on four public datasets (HuGaDB, PKU-MMD, LARa, TUG) show PO-GCN generally outperforms the Transformer in accuracy, while the Transformer offers advantages in certain F1-score scenarios, and the feature-fusion approach yields notable gains on PKU-MMD and TUG, with mixed results on HuGaDB and LARa. The findings demonstrate that leveraging complementary strengths of graph-based skeletal modeling and sequence-transformer architectures can enhance activity recognition across diverse data sources, with potential implications for real-time human-robot interaction and assistive devices.
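The combined loss described above can be illustrated with a small sketch. The summary only says that the loss blends cross-entropy and mean-squared error, so the weighted-sum form, the weights `alpha` and `beta`, and the choice to compute the MSE against a one-hot label are all assumptions made here for illustration, not the authors' definition.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, target):
    """Negative log-probability assigned to the true class index."""
    return -math.log(probs[target])

def mse_to_one_hot(probs, target):
    """Mean-squared error between predicted probabilities and a one-hot label."""
    return sum(
        (p - (1.0 if i == target else 0.0)) ** 2 for i, p in enumerate(probs)
    ) / len(probs)

def total_loss(logits, target, alpha=1.0, beta=1.0):
    """Hypothetical blend: alpha * CE + beta * MSE (the weights are assumptions)."""
    probs = softmax(logits)
    return alpha * cross_entropy(probs, target) + beta * mse_to_one_hot(probs, target)
```

With equal weights, a confident correct prediction is penalized less than a confident wrong one by both terms, which is the behaviour a blended loss of this kind is meant to reinforce.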

Abstract

Human activity recognition is a major field of study that uses computer vision, machine vision, and deep learning techniques to categorize human actions. Deep learning has made significant progress, with architectures that are highly effective at capturing human dynamics. This study examines the influence of feature fusion on the accuracy of activity recognition. The technique addresses a limitation of conventional models, which struggle to identify activities because of their limited capacity to capture spatial and temporal features. It uses sensory data from four publicly available datasets: HuGaDB, PKU-MMD, LARa, and TUG. The accuracy and F1-score of two deep learning models, a Transformer model and a Parameter-Optimized Graph Convolutional Network (PO-GCN), were evaluated on these datasets. The feature fusion technique combined the final-layer features from both models and fed them into a classifier. Empirical evidence demonstrates that PO-GCN outperforms standard models in activity recognition. HuGaDB showed a 2.3% improvement in accuracy and a 2.2% increase in F1-score. TUG showed a 5% increase in accuracy and a 0.5% rise in F1-score. On the other hand, LARa and PKU-MMD achieved lower accuracies of 64% and 69%, respectively. Overall, the results indicate that the integration of features enhanced the performance of both the Transformer model and PO-GCN.
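For readers less familiar with the two reported metrics, the following is a minimal sketch of how accuracy and F1-score can be computed from predicted labels. The abstract does not state which F1 averaging was used, so the macro average (unweighted mean of per-class F1) is an assumption here, and the function names are illustrative.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (macro averaging is an assumption)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

Because macro F1 weights every class equally, it can diverge from accuracy on imbalanced activity datasets, which is one way a model can lead on accuracy while trailing on F1.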

Paper Structure (16 sections, 6 equations, 1 figure, 3 tables)

Figures (1)

  • Figure 1: Features are extracted from the GCN and the Transformer, then flattened and concatenated. The classifier receives the combined features as input.
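The flatten-and-concatenate step in Figure 1 can be sketched as follows. This is a toy illustration under stated assumptions: the feature shapes, the random weights, and the helper names (`flatten`, `fuse`, `linear`) are hypothetical and do not come from the paper, and a single fully connected layer stands in for the classifier.

```python
import random

def flatten(feature_map):
    """Flatten an arbitrarily nested list of numbers into one flat vector."""
    flat = []
    for item in feature_map:
        if isinstance(item, (list, tuple)):
            flat.extend(flatten(item))
        else:
            flat.append(float(item))
    return flat

def fuse(gcn_features, transformer_features):
    """Flatten each model's last-layer features, then concatenate them."""
    return flatten(gcn_features) + flatten(transformer_features)

def linear(x, weights, bias):
    """Fully connected layer: one weight row and one bias per output class."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]

# Toy last-layer outputs (shapes are assumptions chosen for illustration).
gcn_out = [[0.1, 0.2], [0.3, 0.4]]   # e.g. 2 channels x 2 positions
transformer_out = [0.5, 0.6, 0.7]    # e.g. a 3-dimensional sequence embedding

fused = fuse(gcn_out, transformer_out)

# Randomly initialised classifier weights; a real model would learn these.
rng = random.Random(0)
num_classes = 3
weights = [[rng.uniform(-0.1, 0.1) for _ in fused] for _ in range(num_classes)]
bias = [0.0] * num_classes
logits = linear(fused, weights, bias)
```

The classifier's input width equals the sum of the two flattened feature sizes, so changing either backbone's output shape requires resizing the first classifier layer.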