
Computational Analysis of Speech Clarity Predicts Audience Engagement in TED Talks

Roni Segal, Matan Lary, Ralf Schmaelzle, Yossi Ben-Zion

Abstract

What makes a public talk resonate with large audiences? While prior research has emphasized speaker delivery or topic novelty, we reasoned that a core driver of engagement is linguistic clarity. This aligns with theories of processing fluency and cognitive load, which posit that audiences reward speakers who present complex ideas accessibly. We leveraged artificial intelligence to analyze 1,239 TED Talk transcripts (2006--2013), supplemented by a later-phase longitudinal sample. Each transcript was evaluated across 50 independent large language model runs on two dimensions, clarity of explanation and structural organization, and linked to YouTube engagement metrics (likes and views). Clarity emerged as the strongest predictor of audience responses ($\beta = .339$ for likes; $\beta = .314$ for views), contributing substantial incremental variance ($\Delta R^{2} \approx .095$) beyond duration, topic, and scientific status. The full model explained 29\% of variance in likes and 22.5\% in views. This effect was domain-general, remaining invariant across content categories and between scientific and non-scientific talks. Notably, clarity outperformed traditional readability metrics, indicating that discourse coherence predicts engagement more powerfully than surface-level linguistic simplicity. Longitudinal analyses further revealed standardization within TED, characterized by increasing clarity and reduced variability over time. Theoretically, these results support processing fluency accounts: clearer communication reduces cognitive friction and elicits more positive evaluative responses. Practically, transcript-based clarity represents a scalable and trainable strategy for improving public discourse. By demonstrating that language models can reliably capture latent communicative qualities, this study paves the way for feedback systems in education, science communication, and public speaking.

Paper Structure

This paper contains 42 sections, 1 equation, 6 figures, 17 tables.

Figures (6)

  • Figure 1: Overview of the AI-based transcript evaluation pipeline for TED Talks. A large corpus of curated transcripts is evaluated using repeated large language model (LLM) assessments, which are subsequently linked to large-scale audience engagement metrics on YouTube, enabling high-resolution inference of linguistic predictors of engagement.
  • Figure 2: Histograms of the log-transformed TED Talk engagement metrics.
  • Figure 3: Distribution of AI-derived clarity scores across all TED Talks in the dataset. (N = 1,280).
  • Figure 4: Ridgeline density plots of clarity scores by year. Each curve represents the distribution of clarity values for a given year, with sample size indicated in parentheses. Over time, the distributions shift rightward and become increasingly concentrated, reflecting a steady increase in mean clarity and a concurrent decrease in variability.
  • Figure 5: Global search interest for the topic “TED” based on monthly Google Trends data (2007–2013). Values are normalized on a 0–100 scale, with 100 indicating the peak level of search activity during the period.
  • ...and 1 more figure