L-SFAN: Lightweight Spatially-focused Attention Network for Pain Behavior Detection

Jorge Ortigoso-Narro, Fernando Diaz-de-Maria, Mohammad Mahdi Dehshibi, Ana Tajadura-Jiménez

TL;DR

This work tackles protective behavior detection in chronic low back pain by introducing L-SFAN, a lightweight two-dimensional CNN with a temporal average pooling mechanism and a multi-head self-attention module designed to capture spatial-temporal patterns from motion capture and sEMG data. On the EmoPain dataset, L-SFAN achieves competitive performance with a small parameter footprint, outperforming several state-of-the-art architectures in key metrics such as MCC and F1, while offering better interpretability via Grad-CAM. The approach emphasizes spatial pattern extraction and efficiency, demonstrating potential for real-world, resource-constrained clinical and at-home monitoring scenarios. Overall, L-SFAN advances AI-assisted pain behavior analysis by providing a scalable, interpretable framework capable of handling multivariate biosignals with limited data.

Abstract

Chronic Low Back Pain (CLBP) afflicts millions globally, significantly impacting individuals' well-being and imposing economic burdens on healthcare systems. While artificial intelligence (AI) and deep learning offer promising avenues for analyzing pain-related behaviors to improve rehabilitation strategies, current models, including convolutional neural networks (CNNs), recurrent neural networks, and graph-based neural networks, have limitations. These approaches often focus singularly on the temporal dimension or require complex architectures to exploit spatial interrelationships within multivariate time series data. To address these limitations, we introduce L-SFAN, a lightweight CNN architecture incorporating 2D filters designed to meticulously capture the spatial-temporal interplay of data from motion capture and surface electromyography sensors. Our proposed model, enhanced with an oriented global pooling layer and multi-head self-attention mechanism, prioritizes critical features to better understand CLBP and achieves competitive classification accuracy. Experimental results on the EmoPain database demonstrate that our approach not only enhances performance metrics with significantly fewer parameters but also promotes model interpretability, offering valuable insights for clinicians in managing CLBP. This advancement underscores the potential of AI in transforming healthcare practices for chronic conditions like CLBP, providing a sophisticated framework for the nuanced analysis of complex biomedical data.
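
To make the pipeline concrete, below is a minimal PyTorch sketch of an L-SFAN-style model as described in the abstract and in Figure 1: 2D convolutions over the 180×30 input window, temporal average pooling, multi-head self-attention over the sensor channels, and a linear layer with softmax over the protective (P) and non-protective (nP) classes. The class name LSFANSketch, the channel counts, kernel sizes, and number of attention heads are illustrative assumptions, not the authors' exact hyperparameters.

```python
# Minimal sketch of an L-SFAN-style model (assumed layer sizes, not the
# authors' exact configuration).
import torch
import torch.nn as nn

class LSFANSketch(nn.Module):
    def __init__(self, n_sensors=30, emb=32, n_heads=4):
        super().__init__()
        # CNN backbone: 2D filters over the (time x sensor) input matrix.
        self.backbone = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),
            nn.BatchNorm2d(16),
            nn.ReLU(),
            nn.Conv2d(16, emb, kernel_size=3, padding=1),
            nn.BatchNorm2d(emb),
            nn.ReLU(),
        )
        # Multi-head self-attention over the sensor (spatial) axis.
        self.attn = nn.MultiheadAttention(emb, n_heads, batch_first=True)
        self.classifier = nn.Linear(emb * n_sensors, 2)  # P vs nP

    def forward(self, x):                  # x: (batch, 1, 180, 30)
        feats = self.backbone(x)           # (batch, emb, 180, 30)
        # Temporal Average Pooling (TAP): collapse the time axis.
        feats = feats.mean(dim=2)          # (batch, emb, 30)
        tokens = feats.transpose(1, 2)     # (batch, 30, emb): one token per sensor channel
        attended, _ = self.attn(tokens, tokens, tokens)
        logits = self.classifier(attended.flatten(1))
        return logits.softmax(dim=-1)      # probabilities for (P, nP)

# Usage: a batch of four 3-second windows (180 time steps x 30 signals).
model = LSFANSketch()
probs = model(torch.randn(4, 1, 180, 30))
print(probs.shape)  # torch.Size([4, 2])
```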

Paper Structure

This paper contains 15 sections, 6 equations, 7 figures, and 4 tables.

Figures (7)

  • Figure 1: The schematic of the proposed L-SFAN architecture for protective behavior detection. The $180\times30$ input matrix (13 Joint Angles + 13 Joint Energies from MoCAP IMUs + 4 sEMG outputs) is processed by a CNN-TAP backbone for feature extraction. A multi-head self-attention module further refines the features, which feed into a linear layer with softmax, providing probabilities for protective behavior (P) and its complement (nP).
  • Figure 2: The convolutional block used in the feature extractor module.
  • Figure 3: Block diagram of the proposed multi-head attention module.
  • Figure 4: The schematic of the sliding window segmentation technique for data preprocessing. Each window encapsulates a 3-second segment ($\Delta t=3$ s) with a 75% overlap (hop size of 0.75 s). Within each window, 30 parameters are extracted per time step, encompassing 13 Joint Angles, 13 Joint Energies, and 4 sEMG values. Given the 60 Hz sampling rate, this translates to 180 distinct 30-dimensional vectors extracted per window. Zero-padding is applied if the 3-second window partially overlaps with a transition segment to ensure homogeneity within each window and avoid capturing transitions between activities. In this figure, we denote the transition activities as "Transition" segments. (A preprocessing sketch based on this caption follows the figure list.)
  • Figure 5: The heatmap depicts the activation levels across the input data ($180 \times 30$), with varying intensities indicating the significance of each input element towards the model's output. The graph on the left shows the activations for 2D-CNN-TAP, while the one on the right corresponds to 2D-CNN-SAP.
  • ...and 2 more figures
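
For reference, here is a minimal NumPy sketch of the sliding-window segmentation summarized in Figure 4: 3-second windows (180 samples at 60 Hz), a 0.75 s hop (75% overlap), 30 signals per time step, and zeroing of samples that fall inside a "Transition" segment. The function name segment and the boolean transition mask are illustrative assumptions about how the preprocessing could be organized, not the authors' released code.

```python
# Minimal sketch of the Figure 4 windowing scheme (assumed interface).
import numpy as np

FS = 60          # sampling rate (Hz)
WIN = 3 * FS     # 180 samples per 3-second window
HOP = WIN // 4   # 75% overlap -> 45-sample (0.75 s) hop

def segment(signals: np.ndarray, is_transition: np.ndarray) -> np.ndarray:
    """signals: (T, 30) multivariate recording; is_transition: (T,) bool mask.
    Returns an array of shape (n_windows, 180, 30)."""
    windows = []
    for start in range(0, signals.shape[0] - WIN + 1, HOP):
        win = signals[start:start + WIN].copy()
        mask = is_transition[start:start + WIN]
        # Zero out samples belonging to a transition between activities,
        # so each window stays homogeneous.
        win[mask] = 0.0
        windows.append(win)
    return np.stack(windows) if windows else np.empty((0, WIN, signals.shape[1]))

# Usage: a 60-second recording with a 3-second transition in the middle.
T = 60 * FS
x = np.random.randn(T, 30)
trans = np.zeros(T, dtype=bool)
trans[30 * FS:33 * FS] = True
print(segment(x, trans).shape)  # (n_windows, 180, 30)
```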