Beyond Isolated Frames: Enhancing Sensor-Based Human Activity Recognition through Intra- and Inter-Frame Attention
Shuai Shao, Yu Guan, Victor Sanchez
TL;DR
This work addresses the challenge of capturing long-range temporal dynamics in sensor-based HAR beyond frame-by-frame analysis. It introduces an intra- and inter-frame attention model augmented with frame-level positional encoding, a time-sequential batch learning strategy, and a combined loss to improve robustness. Empirical results across four public HAR datasets show clear gains over CNN, ConvLSTM, Transformer, and attention-based baselines, especially on datasets with complex temporal patterns. The approach demonstrates the value of leveraging intra- and inter-frame relationships to achieve more accurate and context-aware activity recognition, with practical implications for healthcare, sports, and ambient intelligence.
Abstract
Human Activity Recognition (HAR) has gained prominence alongside ubiquitous computing, driven by the widespread adoption of wearable sensors in fields such as healthcare and sports. While Convolutional Neural Networks (ConvNets) have contributed significantly to HAR, they typically analyze data frame by frame, concentrating on individual frames and potentially overlooking the broader temporal dynamics inherent in human activities. To address this, we propose an intra- and inter-frame attention model. This model captures both the nuances within individual frames and the broader contextual relationships across multiple frames, offering a comprehensive perspective on sequential data. We further enrich temporal understanding with a novel time-sequential batch learning strategy, which preserves the chronological order of time-series data within each batch and thereby maintains the continuity and integrity of temporal patterns in sensor-based HAR.
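
The idea of keeping frames in chronological order within each batch can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes frames are cut from a recording with a fixed-length sliding window and that each batch is simply a contiguous run of consecutive frames. The function names `sliding_frames` and `time_sequential_batches`, and the window, stride, and batch sizes, are hypothetical choices made for illustration.

```python
from typing import Iterator, List, Sequence, Tuple


def sliding_frames(num_samples: int, frame_len: int, stride: int) -> List[Tuple[int, int]]:
    """Return (start, end) sample indices of fixed-length frames, in time order.

    Assumption: frames come from a plain sliding window over one recording.
    """
    return [(s, s + frame_len) for s in range(0, num_samples - frame_len + 1, stride)]


def time_sequential_batches(frames: Sequence, batch_size: int) -> Iterator[list]:
    """Yield batches of consecutive frames without shuffling.

    Each batch is a contiguous slice, so the chronological order of frames
    inside a batch matches their order in the original recording.
    """
    for i in range(0, len(frames), batch_size):
        yield list(frames[i:i + batch_size])


# Hypothetical recording of 100 samples, 20-sample frames, 50% overlap.
frames = sliding_frames(num_samples=100, frame_len=20, stride=10)
batches = list(time_sequential_batches(frames, batch_size=4))

for batch in batches:
    print(batch)
```

In contrast to a conventional shuffled loader, neighbouring frames stay adjacent in a batch here, which is the property an inter-frame attention layer would need in order to relate a frame to its temporal context.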
