The Art of Audience Engagement: LLM-Based Thin-Slicing of Scientific Talks

Ralf Schmälzle, Sue Lim, Yuetong Du, Gary Bente

TL;DR

We show that brief excerpts (thin slices) of transcripts from real presentations reliably predict overall quality evaluations, suggesting that the first moments of a presentation convey relevant information that is used in quality evaluations and can shape lasting impressions.

Abstract

This paper examines thin slicing, the ability to make accurate judgments based on minimal information, in the context of scientific presentations. Drawing on research from nonverbal communication and personality psychology, we show that brief excerpts (thin slices) reliably predict overall presentation quality. Using a novel corpus of over one hundred real-life science talks, we employ Large Language Models (LLMs) to evaluate transcripts of full presentations and their thin slices. By correlating LLM-based evaluations of short excerpts with full-talk assessments, we determine how much information is needed for accurate predictions. Our results demonstrate that LLM-based evaluations align closely with human ratings, supporting their validity, reliability, and efficiency. Critically, even very short excerpts (less than 10 percent of a talk) strongly predict overall evaluations. This suggests that the first moments of a presentation convey relevant information that is used in quality evaluations and can shape lasting impressions. The findings are robust across different LLMs and prompting strategies. This work extends thin-slicing research to public speaking and connects theories of impression formation to LLMs and current research on AI communication. We discuss implications for communication and social cognition research on message reception. Lastly, we propose an LLM-based thin-slicing framework as a scalable feedback tool to enhance human communication.

Paper Structure

This paper contains 22 sections, 3 figures, 1 table.

Figures (3)

  • Figure 1: Logic of Thin-Slices Evaluation of Public Speech Performance in the Context of Science Communication. Recordings of real-life talks about science topics are transcribed to text. Next, each transcript is thin-sliced into excerpts containing either the full speech text or slices corresponding to 1%, 5%, 10%, etc. These slices are then submitted to an LLM for quality assessment, leading to a table with ratings for all speeches and across all slices. Ratings are collected independently (no memory in the LLM) and evaluated via different prompts and multiple LLMs. Finally, we compare evaluations across slices to examine how much of the speech needs to be processed until stable quality predictions can be made.
  • Figure 2: Thin-Slice to Full-Speech (part-to-whole) Correlations for both LLMs. Shaded corridors illustrate the variability across the five different prompts. Bottom panels: Left: Individual-prompt results for OpenAI’s GPT (blue) and Google’s Gemini (red) models. As can be seen, the same general pattern is present regardless of model family or prompt wording. Right: Scatter plots for all 128 speeches. As slice thickness increases, the predictions become progressively more aligned with the evaluation for the entire speech.
  • Figure S1: Convergence between Human and LLM Ratings. Top row: Group-averaged scores converge (around r = 0.7) for both the 20% slices and the slice-to-full-speech correlation. Bottom row: Correlations among human raters, as well as among the different LLMs and prompts (last 10 rows), reveal strong positive correlations among all rating sources. LLM ratings are slightly more consistent (less variable).
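The pipeline in Figure 1 (slice each transcript, rate every slice in a separate request, collect a speech-by-slice table) can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation. The sketch makes four assumptions. First, slices are taken from the start of the transcript and measured in words. Second, the slice grid beyond 1%, 5% and 10% is hypothetical. Third, `rate_fn` is a hypothetical placeholder for one stateless LLM call that returns a quality score. Fourth, the names `thin_slice`, `rate_all` and `SLICE_FRACTIONS` are illustrative.

```python
from typing import Callable

# Slice sizes as fractions of the transcript; 1.0 is the full speech.
# Values beyond 1%, 5%, 10% are an assumption for illustration.
SLICE_FRACTIONS = (0.01, 0.05, 0.10, 0.20, 0.50, 1.0)


def thin_slice(transcript: str, fraction: float) -> str:
    """Return the opening `fraction` of a transcript, counted in words.

    Word-based slicing is an assumption (characters or tokens would also
    work). The word count is rounded, and at least one word is kept.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    words = transcript.split()
    n = max(1, round(len(words) * fraction))
    return " ".join(words[:n])


def rate_all(transcripts: dict[str, str],
             rate_fn: Callable[[str], float],
             fractions: tuple[float, ...] = SLICE_FRACTIONS,
             ) -> dict[str, dict[float, float]]:
    """Build a speech x slice table of ratings.

    `rate_fn` is a hypothetical stand-in for a single stateless LLM
    request. It is called once per slice, so no rating can see the text
    of another slice (the "no memory" condition).
    """
    table: dict[str, dict[float, float]] = {}
    for speech_id, text in transcripts.items():
        table[speech_id] = {f: rate_fn(thin_slice(text, f)) for f in fractions}
    return table


if __name__ == "__main__":
    # Dry run with a placeholder rater that scores a slice by its length;
    # a real run would replace the lambda with an LLM call.
    demo = {"talk_01": "Good morning everyone today I want to tell you about sleep"}
    print(rate_all(demo, lambda text: float(len(text.split()))))
```

Passing the rater in as a function keeps the slicing logic independent of any particular LLM client. Comparing models or prompt wordings then only requires a different `rate_fn`.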