SportSkills: Physical Skill Learning from Sports Instructional Videos

Kumar Ashutosh, Chi Hsuan Wu, Kristen Grauman

Abstract

Current large-scale video datasets focus on general human activity, but lack the depth of coverage of fine-grained activities needed to address physical skill learning. We introduce SportSkills, the first large-scale sports dataset geared towards physical skill learning with in-the-wild video. SportSkills has more than 360k instructional videos containing more than 630k visual demonstrations paired with instructional narrations explaining the know-how behind the actions, spanning 55 varied sports. Through a suite of experiments, we show that SportSkills unlocks the ability to understand fine-grained differences between physical actions. Our representation achieves gains of up to 4x over the same model trained on traditional activity-centric datasets. Crucially, building on SportSkills, we introduce the first large-scale task formulation of mistake-conditioned instructional video retrieval, bridging representation learning and actionable feedback generation (e.g., "here's my execution of a skill; which video clip should I watch to improve it?"). Formal evaluations by professional coaches show our retrieval approach significantly advances the ability of video models to personalize visual instructions for a user query.

Paper Structure

This paper contains 21 sections, 2 equations, 10 figures, 3 tables.

Figures (10)

  • Figure 1: Our proposed SportSkills, sourced from YouTube [youtube], contains paired videos and instructional narrations describing the correct execution of a skill (top), totaling $638,399$ video clips. All the visuals are intended for skill improvement. We propose this dataset to enable video models to learn physical skills. SportSkills contains $55$ popular sports (bottom), ranging from judo and sprinting to tennis and handball.
  • Figure 2: SportSkills collection overview. We develop a curation pipeline to create a solid skill-learning video-language dataset sourced from in-the-wild coaching/how-to videos. We prompt an LLM [gpt4] to generate a list of physical sports and sport-specific skills. We use connecting words like 'tutorial' and 'drills' to create search terms (left); a minimal sketch of this query-construction step appears after the figure list. Next, we query YouTube [youtube] with these queries and obtain narrations. We use an LLM [gpt4] to filter out non-actionable instances, e.g., "click the link to get discounts..." (middle). Finally, we use a VLM [qwen25vl] to obtain pairs of $(v, t)$ with correct or incorrect demonstrations, filtering out cases without visual demonstrations (right).
  • Figure 3: Overview of our method to retrieve visual feedback given a suboptimal execution by a learner (top left). We finetune a VLM [qwen25vl], in a LoRA [lora] setting, to predict whether a given feedback candidate is relevant or not; a minimal sketch of the LoRA setup appears after the figure list. Top right: overview of the expert annotation task, where experts see the learner video, actionable feedback in text form, and the candidate YouTube video as visual feedback, and rate the relevance of the pair to create CoachGT. Bottom: our weakly supervised training data construction.
  • Figure 4: Average recall$@10$ for 20 sports. We show the performance of image- and video-based visual encoders with and without SportSkills as the training dataset (orange and blue shades, respectively); a sketch of the recall@$k$ computation appears after the figure list. We see a clear gain for the encoders trained with our proposed dataset. See Appendix for the performance on the remaining sports.
  • Figure 5: Retrieval-based skill corrections. We show representative outputs from our proposed model, which provides visual feedback on a suboptimal learner performance given only their video (left column). We do not input the expert commentary (second column) during either training or testing; it is shown here only for illustration. We see that the retrievals by our method provide personalized corrective feedback, whereas retrievals from the zero-shot retrieval baseline are not helpful for skill improvement. (Last row) We also include a few failure cases that showcase the difficulty of the task. Recall that the learner videos are sourced from [egoexo4d], and the feedback videos are from our proposed SportSkills.
  • ...and 5 more figures
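
As a rough illustration of the query-construction stage described in the Figure 2 caption, the sketch below combines an LLM-generated sport-to-skills mapping with connecting words such as 'tutorial' and 'drills' to form YouTube search terms. The `SPORT_SKILLS` dictionary, the `CONNECTORS` list, and the `build_search_queries` helper are hypothetical names for illustration only; the paper's actual prompts and query templates are not specified here.

```python
from itertools import product

# Hypothetical excerpt of the LLM-generated sport -> skills mapping (Figure 2, left).
SPORT_SKILLS = {
    "tennis": ["forehand", "one-handed backhand", "kick serve"],
    "judo": ["osoto gari", "uchi mata"],
}

# Connecting words mentioned in the caption; the full list is an assumption.
CONNECTORS = ["tutorial", "drills", "how to", "technique"]

def build_search_queries(sport_skills, connectors):
    """Form 'sport skill connector' search terms, e.g. 'tennis kick serve drills'."""
    queries = []
    for sport, skills in sport_skills.items():
        for skill, connector in product(skills, connectors):
            queries.append(f"{sport} {skill} {connector}")
    return queries

if __name__ == "__main__":
    for query in build_search_queries(SPORT_SKILLS, CONNECTORS)[:5]:
        print(query)
```

Each resulting string would then be issued as a YouTube search query, with the returned videos and narrations passed to the filtering stages shown in the middle and right of Figure 2.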
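The Figure 3 caption describes finetuning a VLM with LoRA adapters to decide whether a candidate feedback clip is relevant to a learner's suboptimal execution. Below is a minimal sketch of attaching LoRA adapters with the `peft` library; the base checkpoint name, target modules, and hyperparameters are assumptions rather than the paper's actual configuration, and the data loading and loss loop are omitted.

```python
import torch
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration
from peft import LoraConfig, get_peft_model

# Assumed base checkpoint; the paper uses a Qwen2.5-VL model, exact variant unspecified here.
BASE_MODEL = "Qwen/Qwen2.5-VL-7B-Instruct"

processor = AutoProcessor.from_pretrained(BASE_MODEL)
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    BASE_MODEL, torch_dtype=torch.bfloat16
)

# LoRA adapters on the attention projections; rank/alpha/target modules are illustrative guesses.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()

# Training would then present (learner clip, candidate feedback clip/narration) pairs and
# supervise the model to emit a relevant / not-relevant answer using the weak labels from
# Figure 3 (bottom); that loop is standard causal-LM finetuning and is not shown here.
```

Only the adapter weights are updated in this setting, which keeps finetuning lightweight relative to updating the full VLM.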
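Figure 4 reports average recall$@10$ for encoders trained with and without SportSkills. For reference, the sketch below computes recall@$k$ from a query-by-gallery similarity matrix; this is the standard retrieval metric, not the paper's evaluation code, and the variable names are illustrative.

```python
import numpy as np

def recall_at_k(similarity: np.ndarray, gt_index: np.ndarray, k: int = 10) -> float:
    """Fraction of queries whose ground-truth gallery item ranks in the top-k.

    similarity: (num_queries, num_gallery) similarity scores.
    gt_index:   (num_queries,) index of the correct gallery item for each query.
    """
    # Indices of the k highest-scoring gallery items per query.
    topk = np.argsort(-similarity, axis=1)[:, :k]
    hits = (topk == gt_index[:, None]).any(axis=1)
    return float(hits.mean())

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sim = rng.standard_normal((100, 500))   # random scores, for demonstration only
    gt = rng.integers(0, 500, size=100)
    print(f"recall@10 = {recall_at_k(sim, gt, k=10):.3f}")
```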