Creating an Aligned Corpus of Sound and Text: The Multimodal Corpus of Shakespeare and Milton

Manex Agirrezabal

TL;DR

The paper introduces a multimodal corpus aligning Shakespeare and Milton poems with public-domain audio at line, word, syllable, and phone levels, augmented by automated scansion. It describes a processing pipeline combining DTW-based line alignment, G2P, syllabification, forced phoneme alignment, and BiLSTM-CRF scansion, with data encoded in TEI 5.0 and made accessible through an interactive visualization platform. Descriptive statistics and correlation analyses quantify timing relations and model performance (syllabification and scansion), demonstrating meaningful, though imperfect, alignment and rhythmic extraction. The work provides a resource for studying text–audio correspondences in poetry, with potential to extend to additional poets, meters, and multi-modal cues, thereby bridging linguistics, literary analysis, and acoustics.

Abstract

In this work we present a corpus of poems by William Shakespeare and John Milton that have been enriched with readings from the public domain. We have aligned all the lines with their respective audio segments, at the line, word, syllable and phone level, and we have included their scansion. We make a basic visualization platform for these poems and we conclude by conjecturing possible future directions.
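The TL;DR names DTW (dynamic time warping) as the basis of the line-level alignment. As a rough illustration of the general technique only, the sketch below aligns two short numeric sequences with textbook DTW. The function name `dtw_align`, the absolute-difference local cost, and the toy inputs are assumptions made for this example; the features and settings actually used to build the corpus are not stated in this summary.

```python
def dtw_align(a, b):
    """Textbook DTW between two numeric sequences.

    Returns (total_cost, path), where path is a list of (i, j) index
    pairs matching a[i] with b[j]. Illustrative sketch only; the local
    cost |a[i] - b[j]| is an assumption, not the paper's setup.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = best cumulative cost aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j - 1],  # match both elements
                cost[i - 1][j],      # advance in a only
                cost[i][j - 1],      # advance in b only
            )

    # Backtrack from the end to recover the warping path.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        candidates = [
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        ]
        _, i, j = min(candidates)
    path.reverse()
    return cost[n][m], path


if __name__ == "__main__":
    total, path = dtw_align([1, 2, 3], [1, 2, 2, 3])
    print(total, path)
```

In an audio-to-text setting the two sequences would typically be feature vectors rather than single numbers, with a vector distance in place of the absolute difference.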

Paper Structure (9 sections, 7 figures, 2 tables)

Figures (7)

  • Figure 1: Excerpts from the 9th book of Paradise Lost by John Milton and its scansion (automatic)
  • Figure 2: Excerpt of the dataset from the second sonnet by William Shakespeare, showing only the line-level alignments.
  • Figure 3: Caption
  • Figure 4: Screenshot of the website, where the finger to the left of the poem shows which line of the poem we are currently listening to
  • Figure 5: The correlation between line duration in seconds and line length in characters. This shows a Pearson correlation of $0.2932$.
  • ...and 2 more figures
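
The caption of Figure 5 reports a Pearson correlation between line duration and line length. For readers who want to see how such a coefficient is obtained, here is a minimal sketch. The function name `pearson_r` and the sample numbers are hypothetical and are not taken from the corpus; they only demonstrate the formula.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Made-up values: line lengths in characters, durations in seconds.
    lengths = [38, 41, 35, 44, 40]
    durations = [3.1, 3.4, 3.3, 3.2, 3.9]
    print(round(pearson_r(lengths, durations), 4))
```

A coefficient near 0.29, as in the caption, indicates a weak positive linear relation: longer lines tend to take somewhat longer to read aloud, but character count alone explains little of the variation in duration.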