TechCoach: Towards Technical-Point-Aware Descriptive Action Coaching
Yuan-Ming Li, An-Lan Wang, Kun-Yu Lin, Yu-Ming Tang, Ling-An Zeng, Jian-Fang Hu, Wei-Shi Zheng
TL;DR
This work tackles the gap between score-based action quality assessment and practical coaching by introducing Descriptive Action Coaching (DescCoach). It presents EE4D-DescCoach, a dataset derived from EgoExo4D that provides hierarchical TechPoint-level and instance-level coaching commentary, enabling explicit reasoning about technical points. The proposed TechCoach framework integrates a Context-aware TechPoint Reasoner with a Unified TechPoint-aware Action Assessor to produce both a final quality score and detailed coaching feedback, guided by progressive attention and TechPoint-level alignment losses. Experiments show state-of-the-art performance on score regression and coaching commentary generation, validating the necessity of TechPoint-aware reasoning for explainable, actionable action coaching with potential real-world impact in sports and physical skills training.
Abstract
To guide a learner in mastering action skills, it is crucial for a coach to 1) reason through the learner's action execution and technical points (TechPoints), and 2) provide detailed, comprehensible feedback on what is done well and what can be improved. However, existing score-based action assessment methods are still far from supporting this practical scenario. To bridge this gap, we investigate a new task termed Descriptive Action Coaching (DescCoach), which requires the model to provide, in addition to a simple quality score for action execution, detailed commentary on what is done well and what can be improved. To this end, we first build a new dataset named EE4D-DescCoach. Built through an automatic annotation pipeline, our dataset goes beyond existing action assessment datasets by providing detailed TechPoint-level commentary. Furthermore, we propose TechCoach, a new framework that explicitly incorporates TechPoint-level reasoning into the DescCoach process. Central to our method is the Context-aware TechPoint Reasoner, which enables TechCoach to learn TechPoint-related quality representations by querying visual context under the supervision of TechPoint-level coaching commentary. By leveraging the visual context and the TechPoint-related quality representations, a unified TechPoint-aware Action Assessor is then employed to provide the overall coaching commentary together with the quality score. Building on the dataset and framework, we establish a new benchmark for DescCoach and evaluate the effectiveness of our method through extensive experiments. The data and code will be made publicly available.
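The two-stage flow described in the abstract (TechPoint queries reading from visual context, followed by an assessor that uses both the context and the TechPoint representations) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the abstract does not specify the attention mechanism, dimensions, pooling, or score head, so scaled dot-product cross-attention, mean pooling, and a linear head are stand-ins. The names `cross_attend`, `assess`, `techpoint_queries`, and `score_head` are hypothetical, the inputs are random, and the commentary-generation branch is omitted.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cross_attend(queries, context):
    """Assumed reasoner step: each TechPoint query attends over all
    visual-context vectors with scaled dot-product attention.

    Returns one attended vector per query and the attention weights.
    """
    dim = len(context[0])
    scale = math.sqrt(dim)
    outputs, weights = [], []
    for q in queries:
        w = softmax([dot(q, c) / scale for c in context])
        # Weighted sum of context vectors, one output dimension at a time.
        out = [sum(wi * c[d] for wi, c in zip(w, context)) for d in range(dim)]
        outputs.append(out)
        weights.append(w)
    return outputs, weights


def assess(techpoint_reprs, context, score_weights):
    """Assumed assessor step: mean-pool the context and the TechPoint
    representations, concatenate them, and apply a linear score head."""
    dim = len(context[0])
    pooled_ctx = [sum(c[d] for c in context) / len(context) for d in range(dim)]
    pooled_tp = [
        sum(t[d] for t in techpoint_reprs) / len(techpoint_reprs)
        for d in range(dim)
    ]
    fused = pooled_ctx + pooled_tp  # list concatenation, length 2 * dim
    return dot(fused, score_weights), fused


# Toy sizes and random inputs, chosen only for illustration.
rng = random.Random(0)
DIM, NUM_FRAMES, NUM_TECHPOINTS = 8, 16, 4
visual_context = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(NUM_FRAMES)]
techpoint_queries = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(NUM_TECHPOINTS)]
score_head = [rng.gauss(0, 1) for _ in range(2 * DIM)]

techpoint_reprs, attn = cross_attend(techpoint_queries, visual_context)
score, fused = assess(techpoint_reprs, visual_context, score_head)
print(f"{len(techpoint_reprs)} TechPoint representations, score = {score:.3f}")
```

In a trained model the queries and score head would be learned, and, per the abstract, the TechPoint representations would be supervised by TechPoint-level coaching commentary; the sketch only shows how information could flow between the two components.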
