FineSkiing: A Fine-grained Benchmark for Skiing Action Quality Assessment

Yongji Zhang, Siqi Li, Yue Gao, Yu Jiang

TL;DR

This work introduces FineSkiing, the first fine-grained AQA dataset for aerial skiing with stage-wise scores and deduction annotations, enabling more interpretable and reliable action scoring. It also proposes JudgeMind, a stage-decoupled framework that segments each video into air, form, and landing, applies stage-specific feature extraction, and leverages a knowledge-based decoder to fuse deduction knowledge with action codes to predict stage scores. Experiments show state-of-the-art performance on FineSkiing and competitive results on FineDiving, with ablations highlighting the contributions of temporal segmentation, foreground/global features, and deduction knowledge. The combination of a rich, standards-aligned dataset and a stage-aware, knowledge-guided model advances robust, interpretable AQA suitable for professional judging contexts and broader temporal action assessment tasks.

Abstract

Action Quality Assessment (AQA), which aims to evaluate and score sports actions, has attracted widespread interest in recent years. Existing AQA methods primarily predict scores from features extracted from the entire video, resulting in limited interpretability and reliability. Meanwhile, existing AQA datasets lack fine-grained annotations for action scores, especially deduction items and sub-scores. In this paper, we construct the first AQA dataset containing fine-grained sub-score and deduction annotations for aerial skiing, which will be released as a new benchmark. To address the technical challenges, we propose a novel AQA method, named JudgeMind, which significantly enhances performance and reliability by simulating the judgment and scoring mindset of professional referees. Our method segments the input action video into different stages and scores each stage to enhance accuracy. We then propose a stage-aware feature enhancement and fusion module to boost the perception of stage-specific key regions and improve robustness to the visual changes caused by frequent camera viewpoint switching. In addition, we propose a knowledge-based grade-aware decoder that incorporates possible deduction items as prior knowledge to predict more accurate and reliable scores. Experimental results demonstrate that our method achieves state-of-the-art performance.

Paper Structure

This paper contains 26 sections, 2 equations, 12 figures, and 6 tables.

Figures (12)

  • Figure 1: Comparison of existing AQA methods with our proposed method. Our proposed method simulates the scoring mindset of professional judges for more accurate and reliable scoring.
  • Figure 2: List of action/sub-action types and stage-specific deduction items contained in our FineSkiing dataset.
  • Figure 3: Examples from our FineSkiing dataset. Each row shows the video frames of an entire aerial skiing maneuver. The action type and overall score are annotated above the snapshot. Sub-scores of each stage (air, form, and landing) and the detailed deductions are shown below. Note that deductions carry detailed temporal location labels.
  • Figure 4: Overview of our proposed method JudgeMind. The input action video is first segmented into three stages by a temporal segmentation model. Then, a stage-aware feature extraction module extracts stage-specific key region features, which are fed into the context fusion encoder, together with the athlete's action code as a prior, to obtain enhanced features. Subsequently, a knowledge-based grade-aware decoder assesses action grades by letting motion features interact with deduction knowledge. Finally, the action score is calculated by a Likert scoring module. (Best viewed in color.)
  • Figure 5: Feature heat maps at different stages. For the "air" and "landing" stages, the model mainly focuses on the interaction between the athlete and the course, while in the "form" stage, it focuses on the maneuvers of the athlete.
  • ...and 7 more figures
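
The final step named in the Figure 4 caption, turning per-stage grade predictions into a score, can be illustrated with a minimal sketch. The source says only that a "Likert scoring module" computes the action score. Everything else below is an assumption made for illustration: the expected-value formulation, the evenly spaced grade values, the function names `expected_grade_score` and `total_score`, and the per-stage maximum scores are all hypothetical and are not taken from the paper.

```python
from typing import Dict, Sequence


def expected_grade_score(probs: Sequence[float], max_score: float) -> float:
    """Map a distribution over K ordered grades to a score in [0, max_score].

    Assumption: grade i (i = 0..K-1) is worth i / (K - 1) * max_score,
    and the stage score is the probability-weighted mean of those values.
    """
    k = len(probs)
    if k < 2:
        raise ValueError("need at least two grades")
    total = sum(probs)
    if total <= 0:
        raise ValueError("grade probabilities must sum to a positive value")
    # Normalise so the function also accepts unnormalised weights.
    return sum((p / total) * (i / (k - 1)) * max_score
               for i, p in enumerate(probs))


def total_score(stage_probs: Dict[str, Sequence[float]],
                stage_max: Dict[str, float]) -> float:
    """Sum the per-stage expected scores (air, form, landing)."""
    return sum(expected_grade_score(stage_probs[s], stage_max[s])
               for s in stage_probs)


# Hypothetical grade distributions and placeholder stage maxima.
example_probs = {
    "air": [0.0, 0.0, 1.0],
    "form": [1.0, 0.0, 0.0],
    "landing": [0.0, 1.0, 0.0],
}
example_max = {"air": 2.0, "form": 5.0, "landing": 3.0}
example_total = total_score(example_probs, example_max)
```

Scoring each stage separately keeps every sub-score inspectable, which matches the interpretability motivation stated in the abstract; how JudgeMind actually weights grades and combines stages should be taken from the paper itself.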