LOGO: A Long-Form Video Dataset for Group Action Quality Assessment
Shiyi Zhang, Wenxun Dai, Sujia Wang, Xiangwei Shen, Jiwen Lu, Jie Zhou, Yansong Tang
TL;DR
LOGO tackles action quality assessment (AQA) in multi-person, long-form videos. It addresses a limitation of prior datasets, which focus on single-person, short-duration actions. The authors introduce LOGO, which contains 200 long-form artistic-swimming videos with 8 athletes per sample and rich annotations of actions and formations, enabling group-aware modeling. They propose GOAT, a plug-and-play group-aware attention module built from a group-aware graph convolutional network (GCN) and a temporal fusion component, to capture inter-actor relations and long-term temporal structure. Experimental results show that GOAT achieves state-of-the-art performance on LOGO and generalizes to other AQA and action-segmentation tasks, highlighting the value of explicit group information in AQA.
Abstract
Action quality assessment (AQA) has become an emerging research topic because it can be applied in a wide range of scenarios. However, most existing methods and datasets focus on single-person, short-sequence scenes, which hinders the application of AQA in more complex situations. To address this issue, we construct LOGO, a new multi-person, long-form video dataset for action quality assessment. In terms of scenario complexity, our dataset contains 200 videos from 26 artistic swimming events, with 8 athletes in each sample and an average duration of 204.2 seconds. In terms of annotation richness, LOGO includes formation labels that depict the group information of multiple athletes, as well as detailed annotations of action procedures. Furthermore, we propose a simple yet effective method to model the relations among athletes and to reason about the underlying temporal logic of long-form videos. Specifically, we design a group-aware attention module, which can be easily plugged into existing AQA methods, to enrich clip-wise representations with contextual group information. To benchmark LOGO, we systematically investigate the performance of several popular AQA and action segmentation methods. The results reveal the challenges that our dataset poses. Extensive experiments also show that our approach achieves state-of-the-art performance on the LOGO dataset. The dataset and code will be released at \url{https://github.com/shiyi-zh0408/LOGO}.
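To make the data flow of the group-aware attention module concrete, here is a minimal NumPy sketch under stated assumptions. It is not the authors' GOAT implementation. The sketch follows the TL;DR's outline: a graph convolution runs over the athletes in each clip, the athlete features are pooled into one group feature per clip, and self-attention over clips serves as the temporal fusion step. The result is added back to the backbone's clip-wise features. Several details are assumptions made for illustration: the similarity-based adjacency, mean pooling over athletes, the residual connection, the toy linear score regressor, and the helper names `group_gcn`, `temporal_fusion` and `group_aware_attention`. The weights are random rather than learned, and the per-athlete features are assumed to be available, for example from RoI pooling.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along `axis`."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def group_gcn(actor_feats: np.ndarray, w_g: np.ndarray) -> np.ndarray:
    """One graph-convolution step over the N actors of each of T clips.

    actor_feats: (T, N, D). ASSUMPTION: the adjacency is a row-wise softmax
    over scaled dot-product similarity between actors (fully connected graph).
    """
    d = actor_feats.shape[-1]
    sim = actor_feats @ actor_feats.transpose(0, 2, 1) / np.sqrt(d)  # (T, N, N)
    adj = softmax(sim, axis=-1)
    return np.maximum(adj @ actor_feats @ w_g, 0.0)  # ReLU -> (T, N, D)


def temporal_fusion(group_feats, w_q, w_k, w_v):
    """Self-attention over the T clips so each clip sees long-range context."""
    q, k, v = group_feats @ w_q, group_feats @ w_k, group_feats @ w_v  # (T, D)
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)  # (T, T)
    return attn @ v  # (T, D)


def group_aware_attention(clip_feats, actor_feats, params):
    """Hypothetical plug-in: enrich clip features (T, D) with group context."""
    actors = group_gcn(actor_feats, params["w_g"])
    group = actors.mean(axis=1)  # ASSUMPTION: mean-pool actors -> (T, D)
    context = temporal_fusion(group, params["w_q"], params["w_k"], params["w_v"])
    return clip_feats + context  # residual keeps the backbone's features


rng = np.random.default_rng(0)
T, N, D = 16, 8, 32  # 16 clips, 8 athletes (as in LOGO), 32-dim features
params = {name: rng.normal(scale=D ** -0.5, size=(D, D))
          for name in ("w_g", "w_q", "w_k", "w_v")}
clip_feats = rng.normal(size=(T, D))      # stand-in for an AQA backbone's clips
actor_feats = rng.normal(size=(T, N, D))  # stand-in for per-athlete features
enriched = group_aware_attention(clip_feats, actor_feats, params)
w_score = rng.normal(size=D)
score = float(enriched.mean(axis=0) @ w_score)  # toy linear score regressor
print(enriched.shape)  # (16, 32)
```

The graph step is permutation-equivariant, and mean pooling is permutation-invariant. As a result, the enriched clip features in this sketch do not depend on the order in which athletes are listed. This is a natural property for a group of interchangeable team members. The output keeps the same `(T, D)` shape as the input, so an existing AQA head could consume it unchanged, which mirrors the "plug-and-play" claim.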
