Jointly Modeling Spatio-Temporal Features of Tactile Signals for Action Classification

Jimmy Lin, Junkai Li, Jiasi Gao, Weizhi Ma, Yang Liu

TL;DR

This work addresses action classification from tactile signals by introducing STAT, a transformer architecture that jointly models spatial and temporal features through dedicated embeddings and a temporal pretraining task. The model converts tactile data into tubelets, enriches them with spatial and temporal embeddings, and processes them with a multi-layer transformer, using a CLS token for classification. Pretraining combines masked tubelet reconstruction with a temporal order discrimination objective; together, these improve feature learning for spatio-temporal tactile signals. On a public tactile dataset, STAT outperforms state-of-the-art baselines on ACC@1, ACC@3, and Macro-F1, demonstrating the practical value of explicitly incorporating spatial translation variance in tactile sensing for robust action recognition.
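Results are reported as ACC@1, ACC@3, and Macro-F1. As a sketch of the standard definitions (not the authors' evaluation code), ACC@k counts a sample as correct when its true label is among the k highest-scoring classes, and Macro-F1 is the unweighted mean of per-class F1 scores. The function names are hypothetical, and the tie-breaking rule and the set of classes averaged over (here, classes appearing in either the labels or the predictions) are assumptions.

```python
from typing import List, Sequence


def top_k_accuracy(scores: Sequence[Sequence[float]], labels: Sequence[int], k: int) -> float:
    """ACC@k: fraction of samples whose true label is among the k top-scoring classes."""
    hits = 0
    for row, y in zip(scores, labels):
        # Rank classes by descending score; ties go to the lower class index (assumption).
        ranked = sorted(range(len(row)), key=lambda c: (-row[c], c))
        hits += y in ranked[:k]
    return hits / len(labels)


def argmax_predictions(scores: Sequence[Sequence[float]]) -> List[int]:
    """Top-1 predicted class per sample (the first index wins on ties)."""
    return [max(range(len(row)), key=lambda c: row[c]) for row in scores]


def macro_f1(preds: Sequence[int], labels: Sequence[int]) -> float:
    """Unweighted mean of per-class F1 over classes seen in labels or predictions."""
    classes = sorted(set(preds) | set(labels))
    f1_scores = []
    for c in classes:
        tp = sum(1 for p, y in zip(preds, labels) if p == c and y == c)
        fp = sum(1 for p, y in zip(preds, labels) if p == c and y != c)
        fn = sum(1 for p, y in zip(preds, labels) if p != c and y == c)
        # Every listed class occurs at least once, so the denominator is positive.
        f1_scores.append(2 * tp / (2 * tp + fp + fn))
    return sum(f1_scores) / len(classes)


if __name__ == "__main__":
    # Toy scores for 4 samples over 3 classes (illustrative data only).
    scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.3, 0.5], [0.6, 0.1, 0.3]]
    labels = [1, 1, 2, 2]
    print(top_k_accuracy(scores, labels, k=1))  # 0.5
    print(top_k_accuracy(scores, labels, k=2))  # 1.0
    print(macro_f1(argmax_predictions(scores), labels))  # about 0.444 (= 4/9)
```

Because Macro-F1 weights every class equally, it penalizes a classifier that does well only on frequent actions, which ACC@1 alone would not reveal.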

Abstract

Tactile signals collected by wearable electronics are essential for modeling and understanding human behavior. One of the main applications of tactile signals is action classification, especially in healthcare and robotics. However, existing tactile classification methods fail to capture the spatial and temporal features of tactile signals simultaneously, which results in sub-optimal performance. In this paper, we design the Spatio-Temporal Aware tactility Transformer (STAT) to utilize continuous tactile signals for action classification. We propose spatial and temporal embeddings along with a new temporal pretraining task, which aim to enhance the transformer's ability to model the spatio-temporal features of tactile signals. Specifically, the temporal pretraining task asks the model to differentiate the time order of tubelet inputs, so that temporal properties are modeled explicitly. Experimental results on a public action classification dataset demonstrate that our model outperforms state-of-the-art methods on all metrics.
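The temporal pretraining task is described only as differentiating the time order of tubelet inputs. One way such training samples could be built, sketched here under stated assumptions, is to keep a clip's time steps in their original order with some probability (label 1) and otherwise present them in a random non-identity permutation (label 0). The full-shuffle perturbation, the 0.5 probability, and the name make_order_example are hypothetical; the paper's exact perturbation and labeling are not given in this summary.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

Step = TypeVar("Step")  # e.g. the tubelets that share one temporal index


def make_order_example(
    steps: Sequence[Step], rng: random.Random, p_shuffle: float = 0.5
) -> Tuple[List[Step], int]:
    """Build one temporal-order-discrimination sample (hypothetical scheme).

    Returns (sequence, label): label 1 means the original time order was kept,
    label 0 means the time steps were permuted.
    """
    seq = list(steps)
    if len(seq) < 2 or rng.random() >= p_shuffle:
        return seq, 1
    identity = list(range(len(seq)))
    order = identity[:]
    # Re-draw until the permutation differs from the identity, so a
    # label-0 sample is never secretly in the original order.
    while order == identity:
        rng.shuffle(order)
    return [seq[i] for i in order], 0


if __name__ == "__main__":
    rng = random.Random(0)
    clip = ["t0", "t1", "t2", "t3"]  # stand-ins for per-time-step tubelet groups
    for _ in range(3):
        print(make_order_example(clip, rng))
```

In a STAT-like setup, a binary head on the CLS representation would presumably be trained on these labels; that head is not shown here. Permuting indices rather than comparing element values keeps the loop terminating even when several time steps happen to hold identical readings.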

Paper Structure (27 sections, 6 equations, 8 figures, 4 tables)

Figures (8)

  • Figure 1: An overview of action classification based on tactile signals collected by wearable electronic socks.
  • Figure 2: Empirical study of actions in a tactile dataset. Heatmaps show the average over all samples collected by the left-foot sensors; the tactile sensors in Figure 2(a) and 2(b) are located at positions (5, 20) and (28, 19) of the left foot, respectively.
  • Figure 3: An overview of the STAT model. Spatial and temporal embeddings are designed to jointly capture both properties.
  • Figure 4: (a) Visualization of tactile signal $\mathcal{X} \in \mathbb{R}^{T \times H \times W }$. (b) Tubelet inputs, where each tubelet $\mathcal{Q} \in \mathbb{R}^{L \times P \times P}$.
  • Figure 5: Illustrations of the two adopted pretraining tasks.
  • ...and 3 more figures
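Figure 4 describes splitting a tactile signal $\mathcal{X} \in \mathbb{R}^{T \times H \times W}$ into tubelets $\mathcal{Q} \in \mathbb{R}^{L \times P \times P}$. Below is a minimal sketch of that step, assuming non-overlapping tubelets, T divisible by L, and H and W divisible by P. Each tubelet is tagged with a temporal index and a row-major spatial index, one plausible way to look up temporal and spatial embeddings. It also sketches the masked tubelet reconstruction task as random masking plus a mean squared error over the masked tubelets only. The function names, index scheme, masking ratio, and the choice of MSE are assumptions, not the authors' implementation.

```python
import random
from typing import List, Tuple

# (temporal index, spatial index, flattened L*P*P sensor values)
Tubelet = Tuple[int, int, List[float]]


def to_tubelets(x: List[List[List[float]]], L: int, P: int) -> List[Tubelet]:
    """Split a T x H x W tactile clip into non-overlapping L x P x P tubelets."""
    T, H, W = len(x), len(x[0]), len(x[0][0])
    if T % L or H % P or W % P:
        raise ValueError("expected T divisible by L and H, W divisible by P")
    n_cols = W // P  # tubelets per row of the sensor grid
    tubelets: List[Tubelet] = []
    for t0 in range(0, T, L):
        for h0 in range(0, H, P):
            for w0 in range(0, W, P):
                values = [
                    x[t][h][w]
                    for t in range(t0, t0 + L)
                    for h in range(h0, h0 + P)
                    for w in range(w0, w0 + P)
                ]
                # The temporal index selects a temporal embedding; the row-major
                # grid position selects a spatial embedding (assumed scheme).
                spatial_idx = (h0 // P) * n_cols + (w0 // P)
                tubelets.append((t0 // L, spatial_idx, values))
    return tubelets


def sample_mask(n_tubelets: int, ratio: float, rng: random.Random) -> List[int]:
    """Choose which tubelets to hide for masked reconstruction (ratio is assumed)."""
    k = max(1, round(n_tubelets * ratio))
    return sorted(rng.sample(range(n_tubelets), k))


def masked_mse(pred: List[List[float]], target: List[List[float]], masked: List[int]) -> float:
    """Mean squared reconstruction error over masked tubelets only."""
    errors = [(p - t) ** 2 for i in masked for p, t in zip(pred[i], target[i])]
    if not errors:
        raise ValueError("no masked values to score")
    return sum(errors) / len(errors)


if __name__ == "__main__":
    # Synthetic clip with T=4, H=4, W=6, where value = 100*t + 10*h + w.
    clip = [[[100 * t + 10 * h + w for w in range(6)] for h in range(4)] for t in range(4)]
    tubes = to_tubelets(clip, L=2, P=2)
    print(len(tubes))  # 12 tubelets = (4/2) * (4/2) * (6/2)
    mask = sample_mask(len(tubes), ratio=0.5, rng=random.Random(0))
    targets = [values for _, _, values in tubes]
    preds = [[v + 1 for v in values] for values in targets]  # toy "reconstruction"
    print(masked_mse(preds, targets, mask))  # 1.0, since every error is exactly 1
```

Giving each grid position its own spatial index, rather than relying on translation-invariant features, is in the spirit of the spatial translation variance the paper emphasizes: the same pressure pattern at the heel and at the toes maps to different embeddings in this sketch.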