A Comparative Analysis of Poetry Reading Audio: Singing, Narrating, or Somewhere In Between?

Kahyun Choi, Minje Kim

TL;DR

This work addresses the gap in understanding the acoustic characteristics of poetry reading by placing it on a spectrum between narrative speech and singing. It proposes a scalable signal-processing pipeline that analyzes silence patterns, local pitch variability, and beat stability across three large corpora, using WhisperX preprocessing, pYIN pitch estimation, and a dynamic-programming beat tracker with the objective $C(\{t_i\}) = \sum_{i=1}^{N} O(t_i) + \alpha \sum_{i=2}^{N} F(t_i - t_{i-1}, \tau_p)$ under regularization settings $\alpha \in \{1, 1000\}$. Analysis of the Poetry Foundation poetry readings, LibriSpeech narration, and Intonation singing shows that poetry reading exhibits intermediate characteristics, sharing musical traits with singing while retaining narrative-like pitch variation; beat patterns are present but not as rigid as in singing. The findings provide a quantitative bridge between speech and music domains and underscore the value of open-source tools for reproducible, large-scale poetry-audio research.
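The beat-tracking objective above can be illustrated with a small sketch. The summary does not define $O$ or $F$, so this example makes assumptions: $O(t)$ is taken to be an onset-strength value per frame, and $F(\Delta t, \tau_p) = -\left(\log(\Delta t / \tau_p)\right)^2$ is borrowed from the standard dynamic-programming beat-tracking formulation. The function names (`transition_cost`, `objective`, `track_beats`) and the predecessor search window are hypothetical choices for illustration, not the authors' implementation.

```python
import numpy as np

def transition_cost(delta, tau_p):
    """Assumed form of F: squared log-deviation of an inter-beat interval from tau_p."""
    return -(np.log(delta / tau_p) ** 2)

def objective(onset, beats, tau_p, alpha):
    """C({t_i}) = sum_i O(t_i) + alpha * sum_{i>=2} F(t_i - t_{i-1}, tau_p)."""
    beats = np.asarray(beats)
    onset_term = onset[beats].sum()
    interval_term = transition_cost(np.diff(beats), tau_p).sum()
    return onset_term + alpha * interval_term

def track_beats(onset, tau_p, alpha):
    """Search for a beat sequence that maximizes the objective by dynamic programming."""
    n = len(onset)
    score = np.array(onset, dtype=float)  # best score of a sequence ending at frame t
    backlink = np.full(n, -1)             # predecessor beat of frame t, -1 if none
    for t in range(n):
        # Assumed search window: predecessors between tau_p/2 and 2*tau_p frames back.
        lo, hi = max(0, t - 2 * tau_p), t - tau_p // 2
        if hi <= lo:
            continue
        prev = np.arange(lo, hi)
        cand = score[prev] + alpha * transition_cost(t - prev, tau_p)
        best = int(np.argmax(cand))
        if cand[best] > 0:  # link only when a predecessor improves the score
            score[t] += cand[best]
            backlink[t] = prev[best]
    # Trace back from the best-scoring final frame.
    beats = [int(np.argmax(score))]
    while backlink[beats[-1]] >= 0:
        beats.append(int(backlink[beats[-1]]))
    return beats[::-1]

# Toy onset envelope with a perfectly regular pulse every 10 frames.
onset = np.zeros(100)
onset[5::10] = 1.0
beats_loose = track_beats(onset, tau_p=10, alpha=1.0)
beats_tight = track_beats(onset, tau_p=10, alpha=1000.0)
```

In this sketch, a larger $\alpha$ weights tempo regularity more heavily relative to onset strength, so comparing the two settings on the same recording gives a rough sense of how strictly its beats follow a fixed period.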

Abstract

This paper provides a large-scale computational analysis of poetry reading audio signals to unveil the musicality within professionally read poems. Although the acoustic characteristics of other types of spoken language have been studied extensively, most of the literature is limited to narrative speech or singing voice and focuses on how the two differ from each other. In this work, we develop signal processing methods tailored to capture the unique acoustic characteristics of poetry reading based on its silence patterns, temporal variations of local pitch, and beat stability. Our large-scale statistical analyses of three large corpora, covering narration (LibriSpeech), singing voice (Intonation), and poetry reading (from The Poetry Foundation), show that poetry reading shares some musical characteristics with singing voice, although it also resembles narrative speech in other respects.
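Two of the features named above, silence patterns and local pitch variation, can be sketched as follows. This is a minimal illustration under stated assumptions: the per-frame silence mask and pitch contour are taken as given inputs, unvoiced frames are marked with NaN, and the window length and frame duration are hypothetical parameters. The function names `silence_durations` and `local_pitch_std` are invented for this example and do not come from the paper.

```python
import numpy as np

def silence_durations(is_silent, frame_sec):
    """Duration in seconds of each run of consecutive silent frames."""
    padded = np.concatenate(([False], np.asarray(is_silent, dtype=bool), [False]))
    edges = np.diff(padded.astype(int))
    starts = np.flatnonzero(edges == 1)   # first frame of each silent run
    ends = np.flatnonzero(edges == -1)    # one past the last frame of each run
    return (ends - starts) * frame_sec

def local_pitch_std(f0, win):
    """Standard deviation of pitch in each non-overlapping window of `win` frames.

    NaN entries (assumed to mark unvoiced frames) are ignored; windows with
    fewer than two voiced frames are skipped.
    """
    f0 = np.asarray(f0, dtype=float)
    stds = []
    for start in range(0, len(f0) - win + 1, win):
        voiced = f0[start:start + win]
        voiced = voiced[~np.isnan(voiced)]
        if len(voiced) >= 2:
            stds.append(voiced.std())
    return np.array(stds)

# Toy inputs: a 10 ms frame hop and a short pitch contour in Hz.
mask = [0, 1, 1, 0, 0, 1, 1, 1, 0, 1]
durations = silence_durations(mask, frame_sec=0.01)
contour = [100, 100, 100, 100, 100, 110, np.nan, np.nan, np.nan, np.nan, np.nan, 120]
stds = local_pitch_std(contour, win=4)
```

Histograms of such durations and standard deviations, pooled per corpus, are the kind of statistic the figure captions refer to; any thresholds separating short, medium, and long silences would need to come from the paper itself.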

Paper Structure (11 sections, 2 equations, 4 figures, 1 table)

Figures (4)

  • Figure 1: Histograms of the (a) short (b) medium (c) long silent segments.
  • Figure 2: Pitch contours (top) and local standard deviation (bottom).
  • Figure 3: Histograms of the std values of the local pitch contours.
  • Figure 4: Histograms of the beat tracking scores.