
ForcePose: A Deep Learning Approach for Force Calculation Based on Action Recognition Using MediaPipe Pose Estimation Combined with Object Detection

Nandakishor M, Vrinda Govind, Anuradha Puthalath, Anzy L, Swathi P S, Aswathi R, Devaprabha A R, Varsha Raj, Midhuna Krishnan K, Akhila Anilkumar T, Yamuna P

TL;DR

ForcePose presents a sensor-free pipeline that estimates forces in human–object interactions by integrating MediaPipe pose estimation with SSD MobileNet object detection, followed by temporal modeling to predict force magnitude and direction. The approach achieves a mean absolute error of 5.83 N for force magnitude and 7.4° for direction, outperforming baselines by about 27.5% and operating in real time on standard hardware. A dedicated dataset of 850 synchronized videos with force measurements supports training and evaluation, with extensive ablations showing the importance of temporal dynamics, object velocity, and joint features. The work demonstrates practical applicability across rehabilitation, ergonomics, and sports, and outlines future directions such as multi-person interactions, cross-modal sensing, and edge-optimized deployment.
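The TL;DR credits temporal dynamics, object velocity, and joint features as important in the ablations. The paper's exact feature definitions are not given here, so the following is only a minimal Python sketch under stated assumptions. It computes one joint angle from three 2-D pose landmarks, computes object velocity as the finite difference of a detection box center between frames, and stacks per-frame vectors into sliding windows for a temporal model. The function names, the `(xmin, ymin, xmax, ymax)` box layout, and the choice of a single elbow angle are hypothetical. Real detectors differ in box ordering, so check the one you use.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # normalized (x, y) image coordinates of a pose landmark
Box = Tuple[float, float, float, float]  # hypothetical (xmin, ymin, xmax, ymax) detection box


def joint_angle(a: Point, b: Point, c: Point) -> float:
    """Angle at joint b (degrees) formed by segments b->a and b->c."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    norm = math.hypot(*v1) * math.hypot(*v2)
    if norm == 0.0:
        return 0.0  # degenerate case: coincident landmarks
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))


def box_center(box: Box) -> Point:
    xmin, ymin, xmax, ymax = box
    return ((xmin + xmax) / 2.0, (ymin + ymax) / 2.0)


def object_velocity(prev: Box, curr: Box, dt: float) -> Point:
    """Finite-difference velocity of the box center (normalized units per second)."""
    (x0, y0), (x1, y1) = box_center(prev), box_center(curr)
    return ((x1 - x0) / dt, (y1 - y0) / dt)


def frame_features(shoulder: Point, elbow: Point, wrist: Point,
                   prev_box: Box, box: Box, dt: float) -> List[float]:
    """Hypothetical per-frame vector: elbow angle followed by object velocity."""
    vx, vy = object_velocity(prev_box, box, dt)
    return [joint_angle(shoulder, elbow, wrist), vx, vy]


def sliding_windows(frames: Sequence[List[float]], size: int) -> List[List[List[float]]]:
    """Stack consecutive per-frame vectors into fixed-length windows."""
    return [list(frames[i:i + size]) for i in range(len(frames) - size + 1)]
```

Each window is a `size x feature_dim` sequence that a recurrent or convolutional temporal model could consume. Windowing keeps the temporal context explicit, which is the property the ablations attribute part of the accuracy to.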

Abstract

Force estimation in human-object interactions is crucial in fields such as ergonomics, physical therapy, and sports science. Traditional methods depend on specialized equipment such as force plates and sensors, which makes accurate assessment expensive and largely restricted to laboratory settings. In this paper, we introduce ForcePose, a novel deep learning framework that estimates applied forces by combining human pose estimation with object detection. Our approach uses MediaPipe for skeletal tracking and SSD MobileNet for object recognition to build a unified representation of the human-object interaction. We develop a specialized neural network that processes both spatial and temporal features to predict force magnitude and direction without any physical sensors. After training on our dataset of 850 annotated videos with corresponding force measurements, our model achieves a mean absolute error of 5.83 N in force magnitude and 7.4 degrees in force direction. Compared with existing computer vision approaches, our method performs 27.5% better while still running in real time on standard computing hardware. ForcePose enables force analysis in real-world scenarios where traditional measurement tools are impractical or intrusive. This paper describes our methodology, the dataset creation process, evaluation metrics, and potential applications in rehabilitation, ergonomics assessment, and athletic performance analysis.
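The abstract reports error in newtons for force magnitude and in degrees for force direction. The following is a minimal sketch of how such metrics are commonly computed. It assumes 2-D direction vectors and treats the 27.5% figure as a relative reduction in error, though the abstract does not state which metric that percentage refers to. The function names are hypothetical and not taken from the paper.

```python
import math
from typing import Sequence, Tuple

Vec2 = Tuple[float, float]  # assumed 2-D force direction (the paper may use 3-D)


def magnitude_mae(pred: Sequence[float], true: Sequence[float]) -> float:
    """Mean absolute error of force magnitude, in the units of the inputs (N)."""
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(true)


def angular_error_deg(u: Vec2, v: Vec2) -> float:
    """Unsigned angle between two 2-D direction vectors, in degrees (0 to 180)."""
    # atan2(cross, dot) is numerically stable and needs no clamping, unlike acos.
    cross = u[0] * v[1] - u[1] * v[0]
    dot = u[0] * v[0] + u[1] * v[1]
    return abs(math.degrees(math.atan2(cross, dot)))


def direction_mae(pred: Sequence[Vec2], true: Sequence[Vec2]) -> float:
    """Mean angular error over paired predicted and measured directions."""
    return sum(angular_error_deg(p, t) for p, t in zip(pred, true)) / len(true)


def relative_improvement(baseline_err: float, model_err: float) -> float:
    """Percentage reduction in error relative to a baseline."""
    return 100.0 * (baseline_err - model_err) / baseline_err
```

With illustrative numbers that are not from the paper, a baseline error of 10.0 N and a model error of 7.25 N give `relative_improvement(10.0, 7.25) == 27.5`.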

Paper Structure

This paper contains 31 sections, 1 equation, 3 figures, 2 tables.

Figures (3)

  • Figure 1: ForcePose system architecture showing the integration of MediaPipe pose estimation and SSD MobileNet object detection, followed by feature extraction and force prediction networks.
  • Figure 2: Architecture of the force calculation model showing the parallel paths for magnitude and direction prediction.
  • Figure 3: Performance comparison across different interaction types, showing mean absolute error (blue bars) and direction error (orange line).
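Figure 2 describes parallel paths for magnitude and direction prediction. The caption does not specify the output layers, so the following is a hypothetical sketch of one common way to build such a two-headed output. A magnitude head uses a softplus activation to keep the predicted force non-negative, and a direction head produces a 2-D vector normalized to unit length. The weights are plain lists standing in for a trained model, and the layer choices are assumptions, not the paper's architecture.

```python
import math
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]  # one row of weights per output unit


def linear(x: Sequence[float], w: Matrix, b: Sequence[float]) -> List[float]:
    """Dense layer y = W x + b."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


def softplus(z: float) -> float:
    """log(1 + e^z), rewritten as max(z, 0) + log1p(e^-|z|) to avoid overflow."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def two_head_output(h: Sequence[float],
                    w_mag: Matrix, b_mag: Sequence[float],
                    w_dir: Matrix, b_dir: Sequence[float]) -> Tuple[float, Tuple[float, float]]:
    """Map a shared embedding h to (magnitude in N, unit direction vector)."""
    magnitude = softplus(linear(h, w_mag, b_mag)[0])  # non-negative by construction
    dx, dy = linear(h, w_dir, b_dir)
    norm = math.hypot(dx, dy) or 1.0  # avoid division by zero for a degenerate output
    return magnitude, (dx / norm, dy / norm)
```

One reason to separate the heads is that each can be supervised with its own loss, such as an absolute error in newtons for magnitude and an angular error for direction. The force vector is then recovered as the magnitude times the unit direction.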