
GestureCoach: Rehearsing for Engaging Talks with LLM-Driven Gesture Recommendations

Ashwin Ram, Varsha Suresh, Artin Saberpour Abadian, Vera Demberg, Jürgen Steimle

TL;DR

GestureCoach pairs an LLM-driven gesture recommendation system with a proactive rehearsal interface to improve semantic gesturing during talks. The system learns when to gesture by fine-tuning a language model on gesture data from expert TED speakers, and selects which gesture to perform from a curated gesture database using Retrieval-Augmented Generation (RAG). In evaluation, the emphasis proposal module outperforms off-the-shelf LLM baselines in identifying gesture regions, and user studies show increased gesture diversity and more engaging talks when speakers rehearse with GestureCoach. The work offers design implications for personalized, hybrid human-AI rehearsal tools and points to future work on richer gesture modeling and generalization beyond TED-style talks. Overall, GestureCoach demonstrates the practical potential of fine-grained, AI-assisted gestural coaching to improve public speaking.
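Learning "when to gesture" from expert speakers can be framed as a text-to-text fine-tuning task. The sketch below shows one plausible way to build such training pairs: the input is the plain transcript, and the target wraps every run of words that overlaps a gesture in marker tags. The `<g>`/`</g>` tags, the word-level timestamps, the overlap rule, and all function names are assumptions for illustration, not the paper's documented data format.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float  # seconds from talk start (assumed word-level ASR timestamps)
    end: float


def overlaps(word, interval):
    """True if the word's time span intersects a (start, end) gesture interval."""
    g_start, g_end = interval
    return word.start < g_end and word.end > g_start


def mark_gesture_regions(words, gestures, open_tag="<g>", close_tag="</g>"):
    """Wrap each run of gesture-aligned words in hypothetical marker tags."""
    flags = [any(overlaps(w, g) for g in gestures) for w in words]
    tokens = []
    for i, (w, flagged) in enumerate(zip(words, flags)):
        tok = w.text
        if flagged and (i == 0 or not flags[i - 1]):
            tok = open_tag + tok  # a gesture region starts at this word
        if flagged and (i == len(words) - 1 or not flags[i + 1]):
            tok = tok + close_tag  # the gesture region ends at this word
        tokens.append(tok)
    return " ".join(tokens)


def make_example(words, gestures):
    """Build one hypothetical (input, target) pair for supervised fine-tuning."""
    return {
        "input": " ".join(w.text for w in words),
        "target": mark_gesture_regions(words, gestures),
    }


words = [
    Word("This", 0.0, 0.3), Word("is", 0.3, 0.45), Word("a", 0.45, 0.5),
    Word("huge", 0.5, 0.9), Word("problem", 0.9, 1.4),
]
example = make_example(words, gestures=[(0.55, 1.2)])
print(example["target"])  # This is a <g>huge problem</g>
```

Keeping the input text unchanged means any tags a model predicts on new presenter notes can be mapped straight back to positions in those notes.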

Abstract

This paper introduces GestureCoach, a system designed to help speakers deliver more engaging talks by guiding them to gesture effectively during rehearsal. GestureCoach combines an LLM-driven gesture recommendation model with a rehearsal interface that proactively cues speakers to gesture appropriately. Trained on experts' gesturing patterns from TED talks, the model consists of two modules: an emphasis proposal module, which predicts when to gesture by identifying gesture-worthy text segments in the presenter notes, and a gesture identification module, which determines what gesture to use by retrieving semantically appropriate gestures from a curated gesture database. Results of a model performance evaluation and user study (N=30) show that the emphasis proposal module outperforms off-the-shelf LLMs in identifying suitable gesture regions, and that participants rated the majority of these predicted regions and their corresponding gestures as highly appropriate. A subsequent user study (N=10) showed that rehearsing with GestureCoach encouraged speakers to gesture and significantly increased gesture diversity, resulting in more engaging talks. We conclude with design implications for future AI-driven rehearsal systems.
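The gesture identification step can be illustrated with a toy retrieval loop. This sketch assumes the emphasis proposal module has already returned gesture-worthy text regions, and it swaps in bag-of-words cosine similarity for the learned retrieval used in RAG. In the described system an LLM chooses among the retrieved candidates; here the top-scoring entry is simply taken. `GESTURE_DB`, its descriptions, and the `min_score` cut-off are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical gesture database: gesture name -> short semantic description.
GESTURE_DB = {
    "expand": "hands move apart to show something large growing big huge increase",
    "shrink": "hands move together to show something small tiny decrease reduce",
    "count": "raise fingers one by one to enumerate first second third steps list",
    "point_self": "point to own chest to refer to me myself I personal",
}


def bag_of_words(text):
    """Lowercased word counts; a stand-in for a learned text embedding."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(region, k=2):
    """Return the top-k (score, gesture) candidates for one text region."""
    query = bag_of_words(region)
    scored = [(cosine(query, bag_of_words(desc)), name) for name, desc in GESTURE_DB.items()]
    return sorted(scored, reverse=True)[:k]


def recommend(regions, min_score=0.1):
    """Pick one gesture per region, skipping regions with no good match."""
    recs = []
    for region in regions:
        best_score, best_name = retrieve(region)[0]
        if best_score >= min_score:  # hypothetical relevance cut-off
            recs.append((region, best_name))
    return recs


regions = [  # assumed output of the emphasis proposal module
    "the problem keeps growing and is now huge",
    "first we collect data second we train",
    "weather report today",
]
for region, gesture in recommend(regions):
    print(f"{gesture:>10}: {region}")
```

One appeal of retrieval in this position is that gestures live in an editable database rather than in model weights, so entries can be added or revised without retraining the emphasis proposal model.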

Paper Structure

This paper contains 54 sections, 1 equation, 6 figures, 2 tables.

Figures (6)

  • Figure 1: Overview of the GestureCoach system. Users' presenter notes are processed by a gesture recommendation model in the backend. A frontend rehearsal interface displays the recommended gesture regions and gestures. During rehearsal, users' speech is tracked and gesture cues are proactively delivered to provide real-time gesture guidance.
  • Figure 2: Gesture recommendation model architecture. It consists of two LLM-based modules: 1) The emphasis proposal module is fine-tuned on data from expert speakers to predict gesture regions in the presenter notes. The predicted regions are filtered and passed to 2) the gesture identification module which selects the most suitable semantic gesture from the gesture database using Retrieval-Augmented Generation.
  • Figure 3: Early Wizard-of-Oz prototype that simulates a real-time gesture cueing interface for rehearsing talks.
  • Figure 4: Distribution of users' ratings for the appropriateness of false positives and the suitability of gestures selected by the model. Over half of the false positives were rated as valid (avg. rating > 5) and 40% as neutral (avg. rating 3.5–5). None were clearly inappropriate (avg. rating < 3), indicating that the model's recommendations are generally sensible.
  • Figure 5: Comparison of gesture usage and user preference after rehearsing with GestureCoach vs. Notes. GestureCoach led to significantly more unique semantic gesture use and was preferred by 8 of 10 participants, who reported increased confidence after practice and a more engaging final talk.
  • (1 additional figure not listed)
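Figure 1 describes tracking the user's speech during rehearsal and proactively delivering gesture cues. A minimal sketch of that loop, assuming a speech recognizer that emits one word at a time, greedily aligns each recognized word to the presenter notes within a short lookahead window and fires a cue once the speaker is within `lead` words of a recommended region. `CueTracker`, `Region`, `lead`, and `window` are hypothetical names, and the greedy word matching is a stand-in for whatever alignment GestureCoach actually uses.

```python
from dataclasses import dataclass


@dataclass
class Region:
    start: int    # index of the region's first word in the presenter notes
    gesture: str  # gesture recommended for this region


class CueTracker:
    """Hypothetical tracker that follows speech through the notes and cues gestures."""

    def __init__(self, notes_words, regions, lead=2, window=5):
        self.words = [w.lower() for w in notes_words]
        self.regions = sorted(regions, key=lambda r: r.start)
        self.lead = lead      # fire a cue this many words before a region starts
        self.window = window  # how many upcoming words to search for a match
        self.pos = 0          # index of the next expected word in the notes
        self.fired = set()    # indices of regions that were already cued

    def hear(self, word):
        """Consume one recognized word and return the gestures to cue now."""
        word = word.lower()
        end = min(self.pos + self.window, len(self.words))
        for i in range(self.pos, end):
            if self.words[i] == word:
                self.pos = i + 1  # greedy: take the first match in the window
                break
        cues = []
        for idx, r in enumerate(self.regions):
            if idx not in self.fired and r.start - self.lead <= self.pos <= r.start:
                self.fired.add(idx)
                cues.append(r.gesture)
        return cues


notes = "so the problem keeps growing and is now huge".split()
tracker = CueTracker(notes, [Region(start=3, gesture="expand")], lead=1)
for heard in ["so", "um", "the", "problem"]:
    print(heard, tracker.hear(heard))
```

Searching only a small window ahead keeps a misrecognized or off-script word from jumping the position far forward; the trade-off is that if the speaker skips past a region in a single step, its cue is never fired.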