Benchmarking Badminton Action Recognition with a New Fine-Grained Dataset

Qi Li; Tzu-Chen Chiu; Hsiang-Wei Huang; Min-Te Sun; Wei-Shinn Ku

TL;DR

This paper introduces VideoBadminton, a dataset derived from high-quality badminton footage. It is intended to support badminton action recognition and, more broadly, to serve as a dataset for recognizing fine-grained actions.

Abstract

In the dynamic and evolving field of computer vision, action recognition has become a key focus, especially with the advent of sophisticated methodologies such as Convolutional Neural Networks (CNNs), Convolutional 3D, Transformers, and spatial-temporal feature fusion. These technologies have shown promising results on well-established benchmarks but face unique challenges in real-world applications, particularly in sports analysis, where the precise decomposition of activities and the distinction of subtly different actions are crucial. Existing datasets such as UCF101, HMDB51, and Kinetics offer a diverse range of video data for various scenarios. However, there is an increasing need for fine-grained video datasets that capture detailed categorizations and nuances within broader action categories. In this paper, we introduce the VideoBadminton dataset, derived from high-quality badminton footage. Through an exhaustive evaluation of leading methodologies on this dataset, this study aims to advance the field of action recognition, particularly in badminton. VideoBadminton is intended not only to support badminton action recognition but also to provide a dataset for recognizing fine-grained actions. The insights gained from these evaluations are expected to catalyze further research in action comprehension, especially within sports contexts.

Paper Structure (56 sections, 11 equations, 10 figures, 11 tables)

Introduction
Related works
Video dataset for action recognition
Fine-grained video dataset for action recognition
Action recognition models
The VideoBadminton dataset
Raw data collection
Camera settings
Data preprocessing
Human/Expert labeling
Labeling process
Data post-processing
Dataset statistics and property
Entropy of Video Frames
Mean Difference of Frame-level Features
...and 41 more sections

Figures (10)

Figure 1: The workflow of creating the VideoBadminton dataset.
Figure 2: The camera setting for recording badminton actions. The camera was placed 2 meters behind the court's baseline and elevated to 4.5 meters, tilted at 30 degrees to capture the actions with minimal distortion.
Figure 3: The common types of radial distortion.
Figure 4: Video distortion correction process.
Figure 5: The user interface of the $S^2$ Labeling tool.
...and 5 more figures
