
Shared Representation for 3D Pose Estimation, Action Classification, and Progress Prediction from Tactile Signals

Isaac Han, Seoyoung Lee, Sangyeon Park, Ecehan Akan, Yiyue Luo, Joseph DelPreto, Kyung-Joong Kim

Abstract

Estimating human pose, classifying actions, and predicting movement progress are essential for human-robot interaction. While vision-based methods suffer from occlusion and privacy concerns in realistic environments, tactile sensing avoids these issues. However, prior tactile-based approaches handle each task separately, leading to suboptimal performance. In this study, we propose a Shared COnvolutional Transformer for Tactile Inference (SCOTTI) that learns a shared representation to simultaneously address three separate prediction tasks: 3D human pose estimation, action class categorization, and action completion progress estimation. To the best of our knowledge, this is the first work to explore action progress prediction using foot tactile signals from custom wireless insole sensors. This unified approach leverages the mutual benefits of multi-task learning, enabling the model to achieve improved performance across all three tasks compared to learning them independently. Experimental results demonstrate that SCOTTI outperforms existing approaches across all three tasks. Additionally, we introduce a novel dataset collected from 15 participants performing eight different activities and exercises, totaling 7 hours of recordings.
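The core idea in the abstract, that one shared representation feeds three task heads (3D pose regression, action classification, and progress estimation), can be illustrated with a minimal NumPy forward pass. This is a hedged sketch only: the layer sizes (256 taxels, 128-dim shared feature, 21 joints, 8 actions) are hypothetical, and a single linear layer with ReLU stands in for SCOTTI's convolutional transformer encoder, which this sketch does not reproduce.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes (not taken from the paper): a tactile frame flattened
# to 256 pressure taxels, a 128-dim shared feature, 21 joints, 8 actions.
N_TAXELS, D_SHARED, N_JOINTS, N_ACTIONS = 256, 128, 21, 8

# Shared encoder stand-in (one linear layer) plus three task-specific heads.
W_enc = rng.normal(scale=0.05, size=(N_TAXELS, D_SHARED))
W_pose = rng.normal(scale=0.05, size=(D_SHARED, N_JOINTS * 3))  # 3D pose head
W_act = rng.normal(scale=0.05, size=(D_SHARED, N_ACTIONS))      # action head
W_prog = rng.normal(scale=0.05, size=(D_SHARED, 1))             # progress head

def forward(tactile_frame):
    """One shared feature vector feeds all three task heads."""
    z = np.maximum(tactile_frame @ W_enc, 0.0)            # shared representation
    pose = (z @ W_pose).reshape(N_JOINTS, 3)              # per-joint (x, y, z)
    logits = z @ W_act
    action_probs = np.exp(logits - logits.max())
    action_probs /= action_probs.sum()                    # softmax over actions
    progress = 1.0 / (1.0 + np.exp(-(z @ W_prog)[0]))     # progress in [0, 1]
    return pose, action_probs, progress

pose, probs, prog = forward(rng.random(N_TAXELS))
print(pose.shape, round(float(probs.sum()), 6), 0.0 <= prog <= 1.0)
```

In multi-task training, the three head losses (e.g., pose regression error, classification cross-entropy, progress error) would be summed so gradients from all tasks shape the shared encoder, which is the mutual-benefit mechanism the abstract describes.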

Paper Structure

This paper contains 19 sections, 3 equations, 5 figures, and 3 tables.

Figures (5)

  • Figure 1: Overview of the proposed method. The proposed model simultaneously performs 3D pose estimation, action progress prediction, and action classification by learning a shared representation from foot tactile signals collected with wearable insole sensors.
  • Figure 2: Qualitative pose estimation and progress prediction results for lunges (left) and squats (right). The results demonstrate SCOTTI's ability to accurately predict both pose and progress for different activities.
  • Figure 3: Progress Precision-Margin (PM) curve. PM-curve with a random-prediction baseline (left). PM-curve with multi-task versions of the baselines (right).
  • Figure 4: Confusion matrix for action classification. While SCOTTI achieves high accuracy overall, misclassifications are observed between actions with similar tactile signal patterns. In particular, step-up exercises are sometimes misclassified as SideWalking or Lunge, and conversely, SideWalking samples are often confused with step-up exercises.
  • Figure 5: Analysis of SCOTTI. (a) Importance of different foot regions for SCOTTI across tasks, highlighting the central foot regions' significance. (b) t-SNE visualization of shared features based on progress values, showing structured patterns for different actions. (c) t-SNE visualization of shared features based on action labels, demonstrating well-clustered features for individual actions.