Decomposing and Fusing Intra- and Inter-Sensor Spatio-Temporal Signal for Multi-Sensor Wearable Human Activity Recognition

Haoyu Xie; Haoxuan Li; Chunyuan Zheng; Haonan Yuan; Guorui Liao; Jun Liao; Li Liu

TL;DR

The paper tackles wearable human activity recognition (WHAR) by disentangling intra-sensor and inter-sensor spatio-temporal relationships. It introduces DecomposeWHAR, a two-phase framework. Modality-Aware Signal Decomposition preserves variable-specific temporal features through Modality-Specific Embedding and Local Temporal Extraction. Hierarchical Interaction Fusion then fuses features through Cross-Channel Fusion, Cross-Variable Fusion, Mamba-based Global Temporal Aggregation, and self-attention-based Cross-Sensor Interaction. The approach achieves state-of-the-art Macro-F1 and accuracy on the Opportunity, Realdisp, and Skoda datasets while remaining efficient, owing to depth-wise and point-wise convolutions and a selective SSM-based temporal model. The results demonstrate the value of sensor-aware decomposition and dynamic inter-sensor fusion for robust WHAR, with practical implications for deployment on wearable devices. The framework may also generalize to other multi-sensor time-series classification tasks, and the work highlights the importance of directional inter-sensor relationships in recognition systems.
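The TL;DR attributes Global Temporal Aggregation to a Mamba-based selective SSM. To illustrate what "selective" means, here is a minimal NumPy sketch of a generic Mamba-style scan in which the step size and the B and C matrices depend on the current input. It is not the authors' GTA module: the function name `selective_scan`, the projections `W_delta`, `W_B`, `W_C`, the simplified discretisation, and all shapes are assumptions for illustration.

```python
import numpy as np

def selective_scan(x, A, W_delta, W_B, W_C):
    """Sequential selective-SSM scan (Mamba-style, simplified; illustrative only).

    x:       (T, D) input sequence with D channels
    A:       (D, N) negative entries -> a stable diagonal state matrix per channel
    W_delta: (D, D) projection giving an input-dependent step size per channel
    W_B:     (D, N) projection giving the input-dependent input matrix B_t
    W_C:     (D, N) projection giving the input-dependent readout C_t
    returns  (T, D) output sequence
    """
    T, D = x.shape
    h = np.zeros((D, A.shape[1]))                # hidden state: one N-dim state per channel
    y = np.zeros((T, D))
    for t in range(T):
        xt = x[t]
        delta = np.logaddexp(0.0, xt @ W_delta)  # softplus keeps the step size positive, (D,)
        B_t = xt @ W_B                           # (N,), depends on the current input
        C_t = xt @ W_C                           # (N,), depends on the current input
        A_bar = np.exp(delta[:, None] * A)       # zero-order-hold discretisation of A, (D, N)
        B_bar = delta[:, None] * B_t[None, :]    # first-order approximation of the discrete B, (D, N)
        h = A_bar * h + B_bar * xt[:, None]      # linear recurrence carries long-range memory
        y[t] = h @ C_t                           # readout, (D,)
    return y

rng = np.random.default_rng(0)
T, D, N = 16, 4, 8
x = rng.standard_normal((T, D))
A = -np.exp(rng.standard_normal((D, N)))
W_delta, W_B, W_C = (0.1 * rng.standard_normal(s) for s in [(D, D), (D, N), (D, N)])
y = selective_scan(x, A, W_delta, W_B, W_C)
print(y.shape)  # (16, 4)
```

The explicit loop keeps the recurrence readable; practical implementations use a parallel, hardware-aware scan instead, which this sketch does not attempt.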

Abstract

Wearable Human Activity Recognition (WHAR) is a prominent research area within ubiquitous computing. Multi-sensor synchronous measurement has proven more effective for WHAR than using a single sensor. However, existing WHAR methods apply shared convolutional kernels indiscriminately to extract temporal features from every sensor variable, which fails to capture the spatio-temporal relationships among intra-sensor and inter-sensor variables. We propose DecomposeWHAR, a model consisting of a decomposition phase and a fusion phase that better models the relationships between modality variables. The decomposition phase creates a high-dimensional representation of each intra-sensor variable using an improved depthwise separable convolution, capturing local temporal features while preserving each variable's unique characteristics. The fusion phase first captures relationships between intra-sensor variables and fuses their features at both the channel and variable levels. Long-range temporal dependencies are then modeled with a State Space Model (SSM), after which cross-sensor interactions are captured dynamically through a self-attention mechanism, highlighting inter-sensor spatial correlations. Our model demonstrates superior performance on three widely used WHAR datasets, significantly outperforming state-of-the-art models while maintaining acceptable computational efficiency.
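The abstract contrasts shared convolutional kernels with a decomposition that keeps each intra-sensor variable separate. A minimal NumPy sketch of that idea follows, assuming a per-variable linear lift, then a depth-wise temporal convolution, then a per-variable point-wise channel mix. The function `decompose_sensor`, the shapes, and the per-variable point-wise matrix are hypothetical; the paper's improved depthwise separable convolution may differ.

```python
import numpy as np

def decompose_sensor(x, W_embed, K, P):
    """Hypothetical per-variable embedding + depth-wise/point-wise temporal conv.

    x:       (V, T) raw signals of the V variables of one sensor
    W_embed: (V, D) per-variable lift from 1 to D channels (nothing shared across variables)
    K:       (V, D, k) one temporal kernel per (variable, channel) pair (depth-wise)
    P:       (V, D, D) per-variable 1x1 channel mixing (point-wise)
    returns  (V, D, T)
    """
    V, T = x.shape
    k = K.shape[-1]
    e = W_embed[:, :, None] * x[:, None, :]                      # embedding, (V, D, T)
    left = k // 2
    e_pad = np.pad(e, ((0, 0), (0, 0), (left, k - 1 - left)))   # "same" padding along time
    local = np.zeros_like(e)
    for t in range(T):
        # depth-wise: each (variable, channel) slice is filtered by its own kernel
        local[:, :, t] = np.sum(e_pad[:, :, t:t + k] * K, axis=-1)
    # point-wise: mixes channels within each variable, never across variables
    return np.einsum("vod,vdt->vot", P, local)

rng = np.random.default_rng(0)
V, D, T, k = 3, 8, 32, 5
x = rng.standard_normal((V, T))
W_embed = rng.standard_normal((V, D))
K = rng.standard_normal((V, D, k))
P = rng.standard_normal((V, D, D))
h = decompose_sensor(x, W_embed, K, P)
print(h.shape)  # (3, 8, 32)
```

Because no weights are shared across variables, perturbing one variable's raw signal leaves the other variables' representations unchanged, in line with the abstract's goal of preserving each variable's characteristics. A shared-kernel baseline would instead reuse a single kernel for all `V` variables.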

Paper Structure (31 sections, 11 equations, 5 figures, 2 tables)

Introduction
Related Works
Temporal Feature Extraction for WHAR
Modeling Intra- and Inter-Sensor Interaction
Preliminaries
Problem Formalization
Depth-Wise and Point-Wise Convolution
Our Model
Modality-Aware Signal Decomposition
Modality-Specific Embedding (MSE)
Local Temporal Extraction (LTE)
Hierarchical Interaction Fusion
Cross-Channel Fusion (CCF)
Cross-Variable Fusion (CVF)
Global Temporal Aggregation (GTA)
...and 16 more sections
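The Preliminaries introduce depth-wise and point-wise convolution, which the paper credits for part of its efficiency. A back-of-the-envelope weight count for a 1-D convolution shows why. The helper names and the sizes (64 channels, kernel length 5) are arbitrary illustrations, not the paper's configuration, and biases are ignored.

```python
def conv1d_weights(c_in, c_out, k):
    """Weight count of a standard 1-D convolution (biases ignored)."""
    return c_in * c_out * k

def separable_conv1d_weights(c_in, c_out, k):
    """Depth-wise (one length-k kernel per input channel) plus point-wise (1x1) weights."""
    depthwise = c_in * k
    pointwise = c_in * c_out
    return depthwise + pointwise

standard = conv1d_weights(64, 64, 5)
separable = separable_conv1d_weights(64, 64, 5)
print(standard, separable, separable / standard)  # 20480 4416 0.215625
```

In general the ratio is 1/C_out + 1/k, so the savings grow with the number of output channels and with the kernel length.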

Figures (5)

Figure 1: Intra- and Inter-Sensor Variables in WHAR.
Figure 2: Architecture of Our Model. GAP represents Global Average Pooling, and FC represents Fully Connected layers.
Figure 3: Parameter size and computational efficiency on the Opportunity dataset. The PyTorch model is deployed to a Xiaomi Watch XMWT01 running Wear OS 2.41, and inference time and energy consumption are measured. FLOPs are not shown in the figure due to significant discrepancies.
Figure 4: (a) Nodes labelled "RUA", "RLA", "BK", "LUA", and "LLA" represent sensors placed on the right upper arm, right lower arm, back, left upper arm, and left lower arm. (b) Attention scores of CSI. (c) Inter-sensor correlations of DynamicWHAR. (d) Inter-sensor correlations of our model. Line color intensity represents correlation strength; only prominent lines are drawn as directed.
Figure 5: Parameter Analysis.
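Figure 4(b) visualises attention scores of the Cross-Sensor Interaction (CSI) module. The minimal sketch below shows why such scores can be read as directed: a row-wise softmax of QK^T is not symmetric in general, so the weight from sensor i to sensor j differs from the weight from j to i. It assumes one pooled feature vector per sensor, a single attention head, and random projection weights; it is not the authors' CSI implementation. Only the sensor names come from Figure 4(a).

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)   # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def cross_sensor_attention(tokens, W_q, W_k, W_v):
    """Single-head scaled dot-product self-attention over sensor tokens (illustrative).

    tokens: (S, D) one feature vector per sensor
    returns (out, scores): out is (S, D); scores is (S, S) and
    scores[i, j] is how strongly sensor i attends to sensor j.
    """
    q, k, v = tokens @ W_q, tokens @ W_k, tokens @ W_v
    scores = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)  # each row sums to 1
    return scores @ v, scores

sensors = ["RUA", "RLA", "BK", "LUA", "LLA"]
rng = np.random.default_rng(0)
S, D = len(sensors), 16
tokens = rng.standard_normal((S, D))
W_q, W_k, W_v = (rng.standard_normal((D, D)) / np.sqrt(D) for _ in range(3))
out, scores = cross_sensor_attention(tokens, W_q, W_k, W_v)
i, j = sensors.index("RLA"), sensors.index("LLA")
print(scores[i, j], scores[j, i])  # generally different values: the relation is directed
```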
