Poem Meter Classification of Recited Arabic Poetry: Integrating High-Resource Systems for a Low-Resource Task

Maged S. Al-Shaibani, Zaid Alyafeai, Irfan Ahmad

TL;DR

The paper tackles automatic poem-meter classification for recited Arabic poetry, a low-resource task, by proposing two architectures: an end-to-end Wav2Vec2-based system and a transcription-based pipeline that leverages high-resource ASR and textual-meter classifiers, complemented by a public benchmark. End-to-end results are limited by data size, while the transcription-based approach achieves strong, near-state-of-the-art performance on two datasets, aided by a language-model-enhanced transcription step. The work introduces a public benchmark and demonstrates that integrating high-resource components can effectively address low-resource poetry prosody tasks, with clear implications for Arabic NLP and computational linguistics. Future directions include collecting a larger, diverse dataset, exploring alternative ASR backbones (e.g., Whisper), and further data augmentation and normalization to improve robustness across domains.

Abstract

Arabic poetry is an essential and integral part of Arabic language and culture. The Arabs have used it to shed light on their major events, such as depicting brutal battles and conflicts. As in many other languages, they have also used it for purposes such as romance, pride, and lamentation. Arabic poetry has received major attention from linguists over the decades. One of its main characteristics is a special rhythmic structure that distinguishes it from prose. This structure is referred to as a meter. Meters, along with other poetic characteristics, are intensively studied in an Arabic linguistic field called "Aroud". Identifying the meter of a verse is a lengthy and complicated process that requires technical knowledge of Aroud. Recited poetry adds an extra layer of processing. Developing systems for automatic identification of the meters of recited poems needs large amounts of labelled data. In this study, we propose a state-of-the-art framework to identify the poem meters of recited Arabic poetry, where we integrate two separate high-resource systems to perform the low-resource task. To ensure generalization of our proposed architecture, we publish a benchmark for this task for future research.
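
The integration of two separate systems can be pictured as a simple composition: one component turns recited audio into text, and a second component assigns a meter to that text. The sketch below is only an illustration of that composition, not the authors' implementation. The class name, the type aliases, and both stub components are hypothetical; in practice the stubs would be replaced by a real ASR model and a real textual meter classifier.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# Hypothetical interfaces: neither component's real API is given in the text.
Transcriber = Callable[[bytes], str]      # recited audio -> transcript
MeterClassifier = Callable[[str], str]    # transcript -> meter label


@dataclass
class TranscriptionMeterPipeline:
    """Chains a speech-to-text component with a text-based meter classifier."""

    transcribe: Transcriber
    classify_meter: MeterClassifier

    def predict(self, audio: bytes) -> Tuple[str, str]:
        # Step 1: the high-resource ASR component produces a transcript.
        text = self.transcribe(audio)
        # Step 2: the high-resource text classifier labels the transcript.
        return text, self.classify_meter(text)


# Stub components, for illustration only (assumptions, not real models).
def stub_asr(audio: bytes) -> str:
    return "placeholder transcript"


def stub_meter_classifier(text: str) -> str:
    return "meter_0" if text.strip() else "unknown"


pipeline = TranscriptionMeterPipeline(stub_asr, stub_meter_classifier)
transcript, meter = pipeline.predict(b"\x00\x01")
```

Keeping the two stages behind plain callables means either one can be swapped, for example for a different ASR backbone, without touching the other.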

Paper Structure

This paper contains 19 sections, 8 figures, 7 tables.

Figures (8)

  • Figure 1: Meter distribution for the 16 poem meters in the baseline dataset.
  • Figure 2: Meter distribution for the 16 poem meters in the benchmark set.
  • Figure 3: A high-level overview of Wav2Vec2 architecture.
  • Figure 4: An illustration of the end-to-end system for poem-meter classification.
  • Figure 5: An integrated system for transcription-based poem meter classification.
  • ...and 3 more figures