The SkatingVerse Workshop & Challenge: Methods and Results

Jian Zhao; Lei Jin; Jianshu Li; Zheng Zhu; Yinglei Teng; Jiaojiao Zhao; Sadaf Gulshad; Zheng Wang; Bo Zhao; Xiangbo Shu; Yunchao Wei; Xuecheng Nie; Xiaojie Jin; Xiaodan Liang; Shin'ichi Satoh; Yandong Guo; Cewu Lu; Junliang Xing; Jane Shen Shengmei

TL;DR

The paper addresses fine-grained human action understanding (HAU) in continuous figure skating videos. It introduces the publicly released SkatingVerse dataset, which comprises a training set of 19,993 RGB sequences and a testing set of 8,586 RGB sequences, with hierarchical labels spanning 11 sets and 28 elements. It surveys the top three submissions, which combine region-of-interest (ROI) cropping (via DINO), skeleton-informed features (ViTPose+InfoGCN), temporal modeling (Temporal Pyramid Network), and strong model ensembling (including Unmasked Teacher, UniformerV2, and VideoMAE adaptations), and reports substantial gains in Top1 Acc and Mean Acc. The study highlights the value of ROI focusing, skeleton cues, and temporal dynamics for recognizing actions in complex, real-world skating footage, and calls for broader participation to advance practical HAU applications. Submissions are evaluated with Top-1 accuracy, Top1 Acc = $\frac{M}{N}$, and mean (class-averaged) accuracy, Mean Acc = $\frac{1}{K}\sum_{i=1}^{K}\frac{M_i}{N_i}$, where $M$ of the $N$ test sequences are classified correctly, $K$ is the number of action classes, and $M_i$ of the $N_i$ test sequences in class $i$ are classified correctly.
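To make the two metrics concrete, here is a minimal Python sketch that computes them from per-sequence top-1 predictions. The function names `top1_accuracy` and `mean_class_accuracy` are hypothetical and not part of any official SkatingVerse evaluation toolkit; the sketch also assumes that $K$ counts only the classes that actually appear in the test labels.

```python
from collections import defaultdict
from typing import Hashable, Sequence


def top1_accuracy(preds: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """Top1 Acc = M / N, where preds[j] is the top-1 predicted class of sequence j."""
    if len(preds) != len(labels) or not labels:
        raise ValueError("preds and labels must be non-empty and of equal length")
    m = sum(p == y for p, y in zip(preds, labels))  # M: correctly classified sequences
    return m / len(labels)                          # divide by N: all test sequences


def mean_class_accuracy(preds: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """Mean Acc = (1/K) * sum_i M_i / N_i over the K classes present in labels."""
    if len(preds) != len(labels) or not labels:
        raise ValueError("preds and labels must be non-empty and of equal length")
    correct = defaultdict(int)  # M_i: correct predictions per ground-truth class
    total = defaultdict(int)    # N_i: test sequences per ground-truth class
    for p, y in zip(preds, labels):
        total[y] += 1
        correct[y] += int(p == y)
    # Assumption: K is the number of distinct classes in the test labels.
    return sum(correct[c] / total[c] for c in total) / len(total)


if __name__ == "__main__":
    labels = [0, 0, 0, 1]  # three sequences of class 0, one of class 1
    preds = [0, 0, 0, 0]   # a model that always predicts class 0
    print(top1_accuracy(preds, labels))        # 0.75
    print(mean_class_accuracy(preds, labels))  # 0.5
```

In this toy example, predicting class 0 for every sequence gives Top1 Acc = 3/4 = 0.75 but Mean Acc = (3/3 + 0/1)/2 = 0.5, illustrating how the class-averaged metric penalizes a model that ignores rare classes.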

Abstract

The SkatingVerse Workshop & Challenge aims to encourage research on novel and accurate methods for human action understanding. The SkatingVerse dataset used for the SkatingVerse Challenge has been publicly released. It consists of two subsets, a training subset and a testing subset: the training subset contains 19,993 RGB video sequences, and the testing subset contains 8,586 RGB video sequences. Around 10 teams from around the world competed in the SkatingVerse Challenge. In this paper, we provide a brief summary of the SkatingVerse Workshop & Challenge, including short introductions to the top three methods. The submission leaderboard will be reopened for researchers who are interested in the human action understanding challenge. The benchmark dataset and other information can be found at: https://skatingverse.github.io/.