PATS: Proficiency-Aware Temporal Sampling for Multi-View Sports Skill Assessment

Edoardo Bianchi, Antonio Liotta

TL;DR

The paper tackles the challenge of automated sports skill assessment by preserving temporal continuity through Proficiency-Aware Temporal Sampling (PATS), which extracts $N_s$ contiguous segments of duration $d_s$ totaling $N_{target}$ frames to maintain complete fundamental movements across multi-view data. Integrated with SkillFormer on the EgoExo4D benchmark, PATS achieves state-of-the-art accuracy across egocentric, exocentric, and combined views and delivers domain-specific gains (e.g., +26.22% in bouldering, +2.39% in music, +1.13% in basketball) while remaining architecture-agnostic and computationally efficient. The method introduces explicit adaptive segment positioning, robust edge-case handling, and a principled frame-allocation strategy, offering clear design principles for temporal sampling in skill assessment. Collectively, PATS advances real-world automated proficiency estimation by aligning sampling strategies with the temporal structure of skilled movements and enabling robust, cross-view analysis.

Abstract

Automated sports skill assessment requires capturing fundamental movement patterns that distinguish expert from novice performance, yet current video sampling methods disrupt the temporal continuity essential for proficiency evaluation. To this end, we introduce Proficiency-Aware Temporal Sampling (PATS), a novel sampling strategy that preserves complete fundamental movements within continuous temporal segments for multi-view skill assessment. PATS adaptively segments videos so that each analyzed portion contains the full execution of critical performance components, repeating this process across multiple segments to maximize information coverage while maintaining temporal coherence. Evaluated on the EgoExo4D benchmark with SkillFormer, PATS surpasses state-of-the-art accuracy across all viewing configurations (+0.65% to +3.05%) and delivers substantial gains in challenging domains (+26.22% bouldering, +2.39% music, +1.13% basketball). Systematic analysis reveals that PATS adapts to diverse activity characteristics, from high-frequency sampling for dynamic sports to fine-grained segmentation for sequential skills, demonstrating its effectiveness as an adaptive approach to temporal sampling that advances automated skill assessment for real-world applications. Visit our project page at https://edowhite.github.io/PATS

Paper Structure

This paper contains 21 sections, 6 equations, 1 figure, 5 tables.

Figures (1)

  • Figure 1: In this configuration, PATS extracts $N_{target} = 32$ frames from $N_s = 2$ continuous temporal segments of duration $d_s = 3$ s each, taken from a 10 s video. Within each segment, $\lfloor N_{target}/N_s \rfloor = 16$ frames are sampled uniformly (red vertical lines), preserving temporal continuity within segments. Segment positioning with automatic spacing prevents overlap and ensures comprehensive temporal coverage. This configuration is used in the basketball and bouldering domains.
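
The configuration in Figure 1 can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `pats_sample_indices`, the 30 fps frame rate, the rule that spreads segment starts evenly from the first to the last possible position, and the rule that shrinks segments when they would not fit are all assumptions made for illustration. Only the quantities $N_{target}$, $N_s$, $d_s$ and the per-segment allocation $\lfloor N_{target}/N_s \rfloor$ come from the caption.

```python
import numpy as np


def pats_sample_indices(num_frames, fps, n_target=32, n_s=2, d_s=3.0):
    """Pick frame indices from n_s contiguous segments of d_s seconds each.

    Hypothetical helper: segment placement and edge-case handling are
    illustrative assumptions, not the rules defined in the paper.
    """
    if num_frames < 1 or fps <= 0:
        raise ValueError("num_frames must be >= 1 and fps must be > 0")
    if n_s < 1 or n_target < n_s:
        raise ValueError("need n_s >= 1 and n_target >= n_s")

    # Frame allocation from the caption: floor(N_target / N_s) per segment.
    frames_per_segment = n_target // n_s

    # Segment length in frames; shrink it if n_s segments would not fit
    # without overlapping (assumed edge-case rule).
    seg_len = max(1, min(int(round(d_s * fps)), num_frames // n_s))

    # Assumed positioning: starts spread evenly between the first and the
    # last valid start, which avoids overlap when n_s * seg_len <= num_frames.
    if n_s == 1:
        starts = [(num_frames - seg_len) // 2]
    else:
        gap = (num_frames - seg_len) / (n_s - 1)
        starts = [int(k * gap) for k in range(n_s)]

    indices = []
    for start in starts:
        # Uniform sampling inside the segment keeps local temporal order.
        offsets = np.floor(
            np.linspace(0, seg_len, frames_per_segment, endpoint=False)
        ).astype(int)
        frame_ids = np.clip(start + offsets, 0, num_frames - 1)
        indices.extend(frame_ids.tolist())
    return indices


# Figure 1 setting: a 10 s clip, assumed here to run at 30 fps (300 frames).
idx = pats_sample_indices(num_frames=300, fps=30.0, n_target=32, n_s=2, d_s=3.0)
print(len(idx), idx[:4], idx[16:20])
```

Under these assumptions the 32 indices fall into two blocks of 16, one covering the first three seconds of the clip and one covering the last three, with an unsampled gap between them. A different positioning rule would move the blocks but leave the per-segment allocation unchanged.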