Computational lexical analysis of Flamenco genres
Pablo Rosillo-Rodes, Maxi San Miguel, David Sanchez
TL;DR
This study addresses the lack of quantitative analysis of Flamenco lyrics by applying NLP and machine learning to a large lyric corpus and demonstrating that eight main palos can be classified using only lexical content. It uses TF-IDF features with a Multinomial Naive Bayes classifier to identify characteristic words and to quantify inter-palo distances via cosine similarity, further visualized through a minimum spanning tree to reveal lexical clusters. The results yield high palo-discrimination accuracy, uncover essential lexical fields, and produce a lexical-distance network that aligns with established historical kinships among palos. The work provides a quantitative framework for analyzing intangible cultural heritage lyrics, offering new insights into the origin and development of Flamenco styles and guiding future data collection and methodological refinements.
Abstract
Flamenco, recognized by UNESCO as part of the Intangible Cultural Heritage of Humanity, is a profound expression of cultural identity rooted in Andalusia, Spain. However, quantitative studies that help identify characteristic patterns in this long-lived musical tradition are lacking. In this work, we present a computational analysis of Flamenco lyrics, employing natural language processing and machine learning to categorize over 2000 lyrics into their respective Flamenco genres, termed $\textit{palos}$. Using a Multinomial Naive Bayes classifier, we find that lexical variation across styles makes it possible to accurately identify distinct $\textit{palos}$. More importantly, using an automatic method based on word usage, we obtain the semantic fields that characterize each style. Further, by applying a metric that quantifies the inter-genre distance, we perform a network analysis that sheds light on the relationships between Flamenco styles. Remarkably, our results suggest historical connections and $\textit{palo}$ evolutions. Overall, our work illuminates the intricate relationships and cultural significance embedded within Flamenco lyrics, complementing previous qualitative discussions with quantitative analyses and sparking new discussions on the origin and development of traditional music genres.
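To make the pipeline summarized above more concrete (a Multinomial Naive Bayes classifier over word usage, TF-IDF vectors compared by cosine distance, and a minimum spanning tree over the resulting distances), the following is a minimal sketch using only the Python standard library. It is not the authors' implementation. The toy corpus, the three palo labels chosen for it, the function names, the pooling of all lyrics of a palo into one TF-IDF document, the smoothed IDF formula, and the use of raw counts with Laplace smoothing in the classifier are all assumptions made for illustration.

```python
import math
from collections import Counter

# Hypothetical toy corpus: the lyrics are invented placeholders and the three
# labels are illustrative; none of this is taken from the paper's data.
CORPUS = [
    ("solea", "pena negra llanto muerte pena"),
    ("solea", "llanto pena madre muerte"),
    ("seguiriya", "muerte pena llanto campanas muerte"),
    ("seguiriya", "campanas muerte madre llanto"),
    ("alegrias", "cadiz mar barco sal alegria"),
    ("alegrias", "mar cadiz barco alegria mar"),
]

def tokenize(text):
    return text.lower().split()

def class_counts(corpus):
    """Word counts per palo, pooling all lyrics that share a label."""
    counts = {}
    for label, text in corpus:
        counts.setdefault(label, Counter()).update(tokenize(text))
    return counts

def tfidf_vectors(counts):
    """One TF-IDF vector per palo (assumption: each palo is one document)."""
    n = len(counts)
    vocab = sorted({w for c in counts.values() for w in c})
    df = {w: sum(1 for c in counts.values() if w in c) for w in vocab}
    vectors = {}
    for label, c in counts.items():
        total = sum(c.values())
        # Smoothed IDF, so words shared by every palo keep a small weight.
        vectors[label] = [
            (c[w] / total) * (math.log((1 + n) / (1 + df[w])) + 1) for w in vocab
        ]
    return vectors

def cosine_distance(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm

def minimum_spanning_tree(labels, dist):
    """Prim's algorithm; each edge (a, b) attaches a new node b to the tree."""
    in_tree, edges = {labels[0]}, []
    while len(in_tree) < len(labels):
        a, b = min(
            ((a, b) for a in in_tree for b in labels if b not in in_tree),
            key=lambda e: dist(*e),
        )
        in_tree.add(b)
        edges.append((a, b))
    return edges

def train_nb(corpus, alpha=1.0):
    """Multinomial Naive Bayes on raw counts with Laplace smoothing alpha."""
    counts = class_counts(corpus)
    vocab = {w for c in counts.values() for w in c}
    docs = Counter(label for label, _ in corpus)
    model = {}
    for label, c in counts.items():
        total = sum(c.values()) + alpha * len(vocab)
        log_prior = math.log(docs[label] / len(corpus))
        log_like = {w: math.log((c[w] + alpha) / total) for w in vocab}
        model[label] = (log_prior, log_like)
    return model

def predict_nb(model, text):
    def score(label):
        log_prior, log_like = model[label]
        # Words never seen in training are ignored.
        return log_prior + sum(log_like[w] for w in tokenize(text) if w in log_like)
    return max(model, key=score)

model = train_nb(CORPUS)
vectors = tfidf_vectors(class_counts(CORPUS))
labels = sorted(vectors)
edges = minimum_spanning_tree(
    labels, lambda a, b: cosine_distance(vectors[a], vectors[b])
)
print(predict_nb(model, "barco mar sal"))
print(edges)
```

In this toy setting, the two labels that share vocabulary end up joined by a tree edge, while the label with a disjoint vocabulary sits at cosine distance 1 from both. With a real corpus, one would likely also hold out test lyrics to estimate classification accuracy before interpreting the distances.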
