Unlabeled Action Quality Assessment Based on Multi-dimensional Adaptive Constrained Dynamic Time Warping

Renguang Chen, Guolong Zheng, Xu Yang, Zhide Chen, Jiwu Shu, Wencheng Yang, Kexin Zhu, Chen Feng

TL;DR

This paper presents an unlabeled Multi-Dimensional Exercise Distance Adaptive Constrained Dynamic Time Warping (MED-ACDTW) method for action quality assessment, and introduces a new dataset called BGym to address the absence of a standardized perspective in sports class evaluations.

Abstract

The growing popularity of online sports and exercise calls for effective methods of evaluating how well exercises are performed online. Previous action quality assessment methods relied on labeled scores from motion videos and exhibited slightly lower accuracy and discriminability. Their dependence on labels also hindered rapid application to newly added exercises. To address this problem, this paper presents an unlabeled Multi-Dimensional Exercise Distance Adaptive Constrained Dynamic Time Warping (MED-ACDTW) method for action quality assessment. Our approach uses an athletic version of DTW to compare features from template and test videos, eliminating the need for score labels during training. The results show that using both 2D and 3D spatial dimensions, along with multiple human body features, improves accuracy by 2-3% compared to using either 2D or 3D pose estimation alone. Additionally, employing MED for score calculation makes frame distance matching more precise, which significantly boosts overall discriminability. The adaptive constraint scheme improves the discriminability of action quality assessment by approximately 30%. Furthermore, to address the absence of a standardized perspective in sports class evaluations, we introduce a new dataset called BGym.
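To make the template-versus-test comparison concrete, the following is a minimal sketch of classic DTW with a fixed band constraint and path backtracking. It is not the MED-ACDTW algorithm: the Euclidean `frame_distance` stands in for MED, and the fixed `band` parameter stands in for the adaptive constraint, whose definitions are not given in this excerpt. The function names and the toy feature vectors are hypothetical.

```python
import math


def frame_distance(a, b):
    """Euclidean distance between two per-frame feature vectors.

    Assumption: a plain stand-in for the paper's MED, which is not
    defined in this excerpt.
    """
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def banded_dtw(template, test, band=None):
    """Classic DTW restricted to cells with |i - j| <= band.

    Assumption: a fixed band replaces the adaptive constraint.
    Returns (accumulated distance, warping path of 0-based index pairs).
    """
    n, m = len(template), len(test)
    if band is None:
        band = max(n, m)          # no effective constraint
    band = max(band, abs(n - m))  # keep the end cell reachable

    inf = math.inf
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - band), min(m, i + band) + 1):
            d = frame_distance(template[i - 1], test[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # template advances
                                 cost[i][j - 1],      # test advances
                                 cost[i - 1][j - 1])  # both advance

    # Backtrack from the end cell along the cheapest predecessors.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min((cost[i - 1][j - 1], i - 1, j - 1),
                      (cost[i - 1][j], i - 1, j),
                      (cost[i][j - 1], i, j - 1))
    path.reverse()
    return cost[n][m], path


# Toy one-dimensional "features": the test holds the middle pose one frame longer.
template = [[0.0], [1.0], [2.0]]
test = [[0.0], [1.0], [1.0], [2.0]]
distance, path = banded_dtw(template, test)
```

A smaller accumulated distance means the test sequence follows the template more closely. Tightening `band` forbids large temporal shifts, so the distance under a narrow band can only stay the same or grow.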

Paper Structure

This paper contains 22 sections, 15 equations, 7 figures, 6 tables, 2 algorithms.

Figures (7)

  • Figure 1: Action matching of two videos. The upper and lower parts respectively represent the extracted 2D and 3D joint points of the template video frame and the test video frame. Action matching is performed by comparing the differences between two videos.
  • Figure 2: The overall structure and workflow of MED-ACDTW. Module 1 extracts 2D and 3D keypoints from consecutive frames using MediaPipe. The extracted keypoints then enter the Human Feature Construction module, which builds four types of features. The resulting frame-based time series features are input into the third module for score and distance calculation. The computed distances are then fed into the ACDTW module with adaptive constraints to obtain the constraint matrix and score matrix. Backtracking over this matrix yields the overall score.
  • Figure 3: Feature presentation. (a) represents the joint points of humans identified by MediaPipe. (b) represents the Pelvic Horizontal Angle. (c) represents the Pelvic Rotation Angle.
  • Figure 4: Action matching and score matrices. In rows (1), (2), and (3), the template videos and test videos are matched with the same action. In rows (4) and (5), the template videos and test videos are matched with different actions. Column (a) represents the exercises performed by the individual. Column (b) represents the score path obtained by the MED-Greedy method. Column (c) represents the score path obtained by the MED-DTW method. Column (d) represents the score path obtained by the MED-ACDTW method. Column (e) represents the path in the MED-ACDTW method with the $T_Q$ matrix as the background. Column (f) represents the path in the MED-ACDTW method with the $T_P$ matrix as the background.
  • Figure 5: Comparison of popular methods. (1) represents lateral exercises. (2) represents jumping exercises. (3) represents abdominal and back exercises.
  • ...and 2 more figures