
Beyond Isolated Frames: Enhancing Sensor-Based Human Activity Recognition through Intra- and Inter-Frame Attention

Shuai Shao, Yu Guan, Victor Sanchez

TL;DR

This work addresses the challenge of capturing long-range temporal dynamics in sensor-based HAR beyond frame-by-frame analysis. It introduces an intra- and inter-frame attention model augmented with frame-level positional encoding, a time-sequential batch learning strategy, and a combined loss to improve robustness. Empirical results across four public HAR datasets show clear gains over CNN, ConvLSTM, Transformer, and attention-based baselines, especially on datasets with complex temporal patterns. The approach demonstrates the value of leveraging intra- and inter-frame relationships to achieve more accurate and context-aware activity recognition, with practical implications for healthcare, sports, and ambient intelligence.

Abstract

Human Activity Recognition (HAR) has grown in importance alongside ubiquitous computing, driven by the widespread adoption of wearable sensors in fields such as healthcare and sports. While Convolutional Neural Networks (ConvNets) have contributed significantly to HAR, they often analyse data frame by frame, concentrating on individual frames and potentially overlooking the broader temporal dynamics inherent in human activities. To address this, we propose the intra- and inter-frame attention model. This model captures both the nuances within individual frames and the contextual relationships across multiple frames, offering a more complete view of sequential data. We further enrich temporal understanding by proposing a novel time-sequential batch learning strategy, which preserves the chronological order of time-series data within each batch and thereby maintains the continuity and integrity of temporal patterns in sensor-based HAR.
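To make the contrast between conventional shuffled batching and time-sequential batching concrete, the following is a minimal sketch, not the authors' implementation. It assumes frames are indexed in chronological order. The function names, the `seed` parameter, and the choice to shuffle the order of whole batches (while keeping frames consecutive inside each batch) are illustrative assumptions that the abstract does not specify.

```python
import random
from typing import List


def shuffled_batches(n_frames: int, batch_size: int, seed: int = 0) -> List[List[int]]:
    """Conventional shuffle learning: frame indices are permuted before batching,
    so a batch mixes frames from unrelated points in time."""
    order = list(range(n_frames))
    random.Random(seed).shuffle(order)
    return [order[i:i + batch_size] for i in range(0, n_frames, batch_size)]


def time_sequential_batches(
    n_frames: int,
    batch_size: int,
    seed: int = 0,
    shuffle_batch_order: bool = True,  # assumption: only whole batches are reordered
) -> List[List[int]]:
    """Time-sequential batching (sketch): every batch holds consecutive frame
    indices, so chronological order is preserved within each batch."""
    batches = [
        list(range(start, min(start + batch_size, n_frames)))
        for start in range(0, n_frames, batch_size)
    ]
    if shuffle_batch_order:
        random.Random(seed).shuffle(batches)
    return batches


if __name__ == "__main__":
    print("shuffled:       ", shuffled_batches(10, 4, seed=1))
    print("time-sequential:", time_sequential_batches(10, 4, seed=1))
```

In this sketch, a model that attends across the frames in a batch would see a contiguous stretch of the recording under `time_sequential_batches`, whereas `shuffled_batches` would present it with temporally unrelated frames.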

Paper Structure

This paper contains 19 sections, 7 equations, 6 figures, and 2 tables.

Figures (6)

  • Figure 1: Comparative overview of the traditional method vs. our proposed method.
  • Figure 2: An overview of our proposed Intra- and Inter-Frame Attention Model.
  • Figure 3: Comparison of Shuffle Learning vs. Time-Sequential Batch Learning (varying shades of colour indicate the progression of time; best viewed in colour).
  • Figure 4: Overview of the mean duration of each activity in the OPP and PAMAP2 datasets, with standard deviations, showing the central tendency and variability of activity duration. OPP durations are given in seconds; PAMAP2 durations are given in minutes.
  • Figure 5: Overview of the activity patterns in the OPP and PAMAP2 datasets, emphasizing temporal details and data characteristics; distinct colours denote different data dimensions.
  • ...and 1 more figures