BST: Badminton Stroke-type Transformer for Skeleton-based Action Recognition in Racket Sports

Jing-Yuan Chang

Abstract

Badminton, known for having the fastest ball speeds among all sports, presents significant challenges to the field of computer vision, including player identification, court line detection, shuttlecock trajectory tracking, and player stroke-type classification. In this paper, we introduce a novel video clipping strategy to extract the frames of each player's racket swing from a badminton broadcast match. These clipped frames are then processed by three existing models: one for Human Pose Estimation to obtain human skeletal joints, another for shuttlecock trajectory tracking, and a third for court line detection to determine player positions on the court. Using these data as inputs, we propose the Badminton Stroke-type Transformer (BST) to classify player stroke-types in singles. Experimental results show that, to the best of our knowledge, our method outperforms the previous state of the art on the largest publicly available badminton video dataset (ShuttleSet), another badminton dataset (BadmintonDB), and a tennis dataset (TenniSet). These results suggest that effectively leveraging ball trajectory is a promising direction for action recognition in racket sports.

Paper Structure

This paper contains 26 sections, 7 equations, 5 figures, 14 tables.

Figures (5)

  • Figure 1: The idea of video clipping strategies. The black player is the target player, and the red player is his/her opponent.
  • Figure 2: Architecture of BST. (In practice: Top player is blue, and Bottom player is green.)
  • Figure A: Loss Curves of BST-CG-AP, TemPose-TF and TemPose-TF*.
  • Figure B: Normalized confusion matrices of BST-0 on ShuttleSet (35 classes). Each column of the left matrix sums to 1, and each row of the right matrix sums to 1.
  • Figure C: Example of a 2D pose vs. a 3D pose in a badminton video frame. The ninth joint represents the human nose. The red arrows indicate the directions the bottom player was facing. The bottom player was in fact facing her opponent, but the 3D pose exhibits a clear error in the facing direction. We suspect that when both forearms point roughly forward, this 3D model often assumes that the face points in that direction as well. (The visualizations in (b), (c), and (d) were generated with the Mayavi Python package.)
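
The two normalizations named in the Figure B caption can be illustrated with a minimal sketch. It is not taken from the paper: the function name `normalize_confusion`, the 3-class counts, and the convention that rows are true classes and columns are predicted classes are all assumptions made for illustration. Under that assumed convention, the diagonal of the column-normalized matrix reads as per-class precision and the diagonal of the row-normalized matrix as per-class recall.

```python
def normalize_confusion(counts, axis):
    """Normalize a square confusion matrix of raw counts.

    axis="column": divide each entry by its column total (columns sum to 1).
    axis="row":    divide each entry by its row total (rows sum to 1).
    Rows or columns whose total is zero are left as zeros.
    """
    n = len(counts)
    if axis == "row":
        totals = [sum(row) for row in counts]
        return [[counts[i][j] / totals[i] if totals[i] else 0.0
                 for j in range(n)] for i in range(n)]
    if axis == "column":
        totals = [sum(counts[i][j] for i in range(n)) for j in range(n)]
        return [[counts[i][j] / totals[j] if totals[j] else 0.0
                 for j in range(n)] for i in range(n)]
    raise ValueError("axis must be 'row' or 'column'")


# Hypothetical counts for 3 classes (not data from the paper).
# Assumed convention: rows = true class, columns = predicted class.
counts = [
    [8, 2, 0],
    [1, 6, 3],
    [0, 1, 9],
]

by_column = normalize_confusion(counts, "column")  # like the left matrix
by_row = normalize_confusion(counts, "row")        # like the right matrix
```

If the figure uses the opposite axis convention, the precision and recall readings swap, but the normalization itself is unchanged.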