FormCoach: Lift Smarter, Not Harder

Xiaoye Zuo, Nikos Athanasiou, Ginger Delmas, Yiming Huang, Xingyu Fu, Lingjie Liu

TL;DR

FormCoach tackles the lack of personalized coaching for at-home workouts by leveraging vision-language models to compare real-time user movements against expert references and deliver concise, actionable feedback. The work introduces a curated evaluation dataset of 1,700 user–reference video pairs across 22 exercises and a rubric-based pipeline for standardized VLM benchmarking, enabling reproducible comparisons to human coaching. Benchmark results show that state-of-the-art VLMs, while offering strong actionability, still fall short in accurately identifying subtle form discrepancies and can produce hallucinations, indicating substantial room for improvement. The study highlights future directions, including multi-modal sensing with 3D pose estimation and wearables, two-way conversational coaching, and AR-enabled embodiment, to move toward trustworthy, embodied AI coaching for a broad at-home audience.

Abstract

Good form is the difference between strength and strain, yet for the fast-growing community of at-home fitness enthusiasts, expert feedback is often out of reach. FormCoach uses vision-language models (VLMs) to transform a simple camera into an always-on, interactive AI training partner capable of spotting subtle form errors and delivering tailored corrections in real time. We showcase this capability through a web interface and benchmark state-of-the-art VLMs on a dataset of 1,700 expert-annotated user-reference video pairs spanning 22 strength and mobility exercises. To accelerate research in AI-driven coaching, we release both the dataset and an automated, rubric-based evaluation pipeline, enabling standardized comparison across models. Our benchmarks reveal substantial gaps compared to human-level coaching, underscoring both the challenges and opportunities in integrating nuanced, context-aware movement analysis into interactive AI systems. By framing form correction as a collaborative and creative process between humans and machines, FormCoach opens a new frontier in embodied AI.

Paper Structure

This paper contains 15 sections, 9 figures, 3 tables.

Figures (9)

  • Figure 1: FormCoach pipeline
  • Figure 2: Prototype web interface
  • Figure 3: Example from the FormCoach evaluation dataset showing the reference (top) and user (bottom) performing the same squatting exercise. Below, we present the expert feedback and sample VLM-generated feedback. We highlight correct feedback in green, hallucinated feedback in yellow, and incorrect or unclear feedback in red.
  • Figure 4: Accuracy of VLM-generated Feedback Across Exercises: Top 3 vs. Bottom 3
  • Figure 5: Instruction Annotation Interface on Gradio
  • ...and 4 more figures