Fine-grained Action Analysis: A Multi-modality and Multi-task Dataset of Figure Skating

Sheng-Lan Liu, Yu-Ning Ding, Gang Yan, Si-Fan Zhang, Jin-Rong Zhang, Wen-Yue Chen, Xue-Hai Xu

TL;DR

MMFS addresses the gap in fine-grained action analysis by introducing a large-scale, multi-modality, multi-task dataset for figure skating. It combines RGB and skeleton data with independently defined spatial and temporal labels, and supports both action recognition and action quality assessment, backed by a strong, expert-informed annotation workflow. The dataset contains 11671 clips across 256 fine-grained categories (pared down to MMFS-63 for balanced evaluation), with long, variable durations and a hierarchical labeling scheme that challenges current models, especially in temporal analysis. Overall, MMFS indicates that skeleton-based representations better capture fine-grained motion cues and that temporal semantics pose substantial challenges, providing a rigorous benchmark to spur advances in multi-modality, fine-grained action analysis and quality assessment in sports.

Abstract

Fine-grained action analysis on existing action datasets is limited by insufficient action categories, coarse granularity, and a narrow range of modalities and tasks. In this paper, we propose a Multi-modality and Multi-task dataset of Figure Skating (MMFS), collected from the World Figure Skating Championships. MMFS supports both action recognition and action quality assessment; it provides RGB and skeleton data together with action scores for 11671 clips in 256 categories, annotated with spatial and temporal labels. The key contributions of our dataset are threefold. (1) Independent spatial and temporal categories are proposed for the first time to further explore fine-grained action recognition and quality assessment. (2) MMFS is the first to introduce the skeleton modality for complex fine-grained action quality assessment. (3) Our multi-modality and multi-task dataset encourages the development of more action analysis models. To benchmark our dataset, we adopt RGB-based and skeleton-based baseline methods for action recognition and action quality assessment.

Paper Structure (12 sections, 8 figures, 6 tables)

Figures (8)

  • Figure 1: Examples of spatio-temporal fine-grained action categories. Spatially, Lutz and Flip can be classified by $P(cl|pv)\rightarrow1$. Raising a hand in 2Flip will not change the label, which indicates $P(cl|pv)\rightarrow0$. Temporally, $P(cl|tv)\rightarrow1$ denotes different turns that will change the action label.
  • Figure 2: The process of strong annotation.
  • Figure 3: (a) Sample distribution. (b) Mean duration distribution.
  • Figure 4: The hierarchical label structure of the MMFS dataset. The actions of each element are fine-grained.
  • Figure 5: Fine-grained semantics. (a) Misclassification caused by subtle spatial variation. (b) Misclassification caused by partial spatio-temporal variation. MMFS provides an information board, including BV, GOE, and the ground truth of classification.
  • ...and 3 more figures