Poem Meter Classification of Recited Arabic Poetry: Integrating High-Resource Systems for a Low-Resource Task
Maged S. Al-Shaibani, Zaid Alyafeai, Irfan Ahmad
TL;DR
The paper tackles automatic poem-meter classification for recited Arabic poetry, a low-resource task, by proposing two architectures: an end-to-end Wav2Vec2-based system and a transcription-based pipeline that combines a high-resource automatic speech recognition (ASR) system with a text-based meter classifier. End-to-end results are limited by the small amount of training data, while the transcription-based approach achieves strong, near-state-of-the-art performance on two datasets, aided by a language-model-enhanced transcription step. The work also releases a public benchmark and shows that integrating high-resource components can effectively address low-resource poetry prosody tasks, with clear implications for Arabic NLP and computational linguistics. Future directions include collecting a larger and more diverse dataset, exploring alternative ASR backbones (e.g., Whisper), and applying further data augmentation and normalization to improve robustness across domains.
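The transcription-based pipeline composes two independently trained systems: speech goes to an ASR model, and its transcript goes to a text-based meter classifier. The following is a minimal sketch of that composition, assuming simple callable interfaces. The names `TranscriptionMeterPipeline`, `toy_asr`, and `toy_meter_classifier` are hypothetical and not taken from the paper, and the toy stubs only make the example runnable; they perform no real recognition or classification.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical type aliases (assumptions, not the paper's API):
# an ASR system maps a mono waveform to a transcript, and a text-based
# meter classifier maps a transcript to a meter label.
Transcriber = Callable[[List[float]], str]
TextMeterClassifier = Callable[[str], str]


@dataclass
class TranscriptionMeterPipeline:
    """Hypothetical two-stage pipeline: recited audio -> transcript -> meter."""

    transcribe: Transcriber
    classify_meter: TextMeterClassifier

    def __call__(self, waveform: List[float]) -> str:
        # Stage 1: a high-resource ASR model turns speech into text.
        transcript = self.transcribe(waveform)
        # Stage 2: a high-resource text meter classifier labels the transcript.
        return self.classify_meter(transcript)


# Toy stand-ins so the sketch runs; real systems would replace both.
def toy_asr(waveform: List[float]) -> str:
    return "placeholder verse transcript"


def toy_meter_classifier(transcript: str) -> str:
    # Fixed dummy rule: any non-empty transcript gets the same label.
    return "Tawil" if transcript else "unknown"


pipeline = TranscriptionMeterPipeline(toy_asr, toy_meter_classifier)
print(pipeline([0.0, 0.25, -0.1]))  # prints: Tawil
```

The point of this structure is decoupling: either stage can be swapped, for example by trying a different ASR backbone such as Whisper, without retraining the other.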
Abstract
Arabic poetry is an essential and integral part of Arabic language and culture. Arabs have used it to shed light on their major events, such as depicting brutal battles and conflicts, and, as in many other languages, for purposes such as romance, pride, and lamentation. Arabic poetry has received major attention from linguists over the decades. One of its main characteristics is a special rhythmic structure that distinguishes it from prose. This structure is referred to as a meter. Meters, along with other poetic characteristics, are studied intensively in an Arabic linguistic field called "\textit{Aroud}". Identifying the meter of a verse is a lengthy and complicated process that requires technical knowledge of \textit{Aroud}, and recited poetry adds an extra layer of processing. Developing systems that automatically identify the meters of recited poems requires large amounts of labelled data, which are scarce for this task. In this study, we propose a state-of-the-art framework to identify the meters of recited Arabic poetry, in which we integrate two separate high-resource systems to perform the low-resource task. To support assessing the generalization of our proposed architecture and to enable future research, we publish a benchmark for this task.
