Self-supervised Spatio-Temporal Graph Mask-Passing Attention Network for Perceptual Importance Prediction of Multi-point Tactility

Dazhong He, Qian Liu

TL;DR

The paper addresses the challenge of compressing multi-point tactile data by predicting per-point perceptual importance. It introduces SSTGMPAN, a self-supervised spatio-temporal graph network with a novel temporal-spectral mask-passing (TSMP) attention and wavelet-packet preprocessing, trained via a dual-branch objective that combines cross-entropy classification with reconstruction and sparsity constraints. On a newly collected 24-sensor tactile dataset, SSTGMPAN achieves an accuracy of 92.82%, an AUC of 0.983, and an F1 score of 0.939, outperforming multiple baselines; ablations confirm the value of TSMP, self-supervised learning (SSL), and global spatial attention. The work advances tactile data compression and perception-aware resource management for large-area haptic interfaces, with potential impact on perceptual enhancement and efficient transmission of tactile signals.
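As a rough illustration of what a dual-branch objective of this kind can look like, the sketch below adds a cross-entropy term to a reconstruction term and a sparsity term. The summary does not give the exact formulation, so the mean-squared-error reconstruction, the L1 sparsity penalty, the function name `dual_branch_loss`, and the weights `w_rec` and `w_sparse` are all assumptions made for illustration, not the authors' definitions.

```python
import numpy as np

def dual_branch_loss(logits, labels, recon, target, latent,
                     w_rec=1.0, w_sparse=0.01):
    """Hypothetical combined loss: cross-entropy + reconstruction + sparsity.

    logits: (batch, classes) classification scores
    labels: (batch,) integer class indices
    recon, target: arrays of equal shape (reconstructed vs. original signal)
    latent: array whose magnitude is penalized (assumed L1 sparsity term)
    """
    logits = np.asarray(logits, dtype=float)
    labels = np.asarray(labels, dtype=int)

    # Numerically stable log-softmax for the classification branch.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    cross_entropy = -log_probs[np.arange(len(labels)), labels].mean()

    # Assumed MSE reconstruction term for the self-supervised branch.
    reconstruction = np.mean((np.asarray(recon, dtype=float)
                              - np.asarray(target, dtype=float)) ** 2)

    # Assumed L1 penalty encouraging sparse activations.
    sparsity = np.mean(np.abs(np.asarray(latent, dtype=float)))

    return cross_entropy + w_rec * reconstruction + w_sparse * sparsity
```

With uniform logits over two classes, a perfect reconstruction, and an all-zero latent array, only the cross-entropy term remains and the value reduces to ln 2.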

Abstract

While visual and auditory information are prevalent in modern multimedia systems, haptic interaction, e.g., tactile and kinesthetic interaction, provides a unique form of human perception. However, multimedia technology for contact interaction is less mature than non-contact multimedia technologies and requires further development. Specialized haptic media technologies, requiring low latency and bitrates, are essential to enable haptic interaction, necessitating haptic information compression. Existing vibrotactile signal compression methods, based on the perceptual model, do not consider the characteristics of fused tactile perception at multiple spatially distributed interaction points. In fact, differences in tactile perceptual importance are not limited to conventional frequency and time domains, but also encompass differences in the spatial locations on the skin unique to tactile perception. For the most frequently used tactile information, vibrotactile texture perception, we have developed a model to predict its perceptual importance at multiple points, based on self-supervised learning and Spatio-Temporal Graph Neural Network. Current experimental results indicate that this model can effectively predict the perceptual importance of various points in multi-point tactile perception scenarios.

Paper Structure (30 sections, 10 equations, 7 figures, 3 tables)

This paper contains 30 sections, 10 equations, 7 figures, 3 tables.

Figures (7)

  • Figure 1: The overall architecture of SSTGMPAN.
  • Figure 2: The process of wavelet packet decomposition.
  • Figure 3: Temporal-spectral mask-passing. (a) Calculating the effect of node $j$ on node $i$. (b) Aggregating node neighbors' effects and updating node $i$'s representation.
  • Figure 4: (a) Dilated causal convolution. (b) The building block for decoder.
  • Figure 5: The devices used for data collection. (a) The glove acquiring multi-point vibrotactile signals. (b) The device used for human perceptual importance annotation.
  • ...and 2 more figures
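
Figure 4(a) names dilated causal convolution. For readers unfamiliar with the operation, here is a minimal one-dimensional sketch of the generic technique. It is not the paper's decoder: the function name `dilated_causal_conv1d`, the single-channel setting, and the zero left-padding are assumptions made for illustration.

```python
import numpy as np

def dilated_causal_conv1d(x, kernel, dilation=1):
    """Generic 1-D dilated causal convolution (single channel).

    Output y[t] depends only on x[t], x[t - d], x[t - 2d], ...
    where d is the dilation, so no future samples are used.
    """
    x = np.asarray(x, dtype=float)
    kernel = np.asarray(kernel, dtype=float)
    k = len(kernel)

    # Left-pad with zeros so the output has the same length as the input.
    pad = (k - 1) * dilation
    padded = np.concatenate([np.zeros(pad), x])

    y = np.zeros_like(x)
    for t in range(len(x)):
        for i in range(k):
            # kernel[i] weights the sample i * dilation steps in the past.
            y[t] += kernel[i] * padded[pad + t - i * dilation]
    return y
```

For example, with `kernel=[1, 1]` and `dilation=2`, each output is the current sample plus the sample two steps earlier, so a larger dilation widens the receptive field without adding kernel weights.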
