Music for All: Representational Bias and Cross-Cultural Adaptability of Music Generation Models
Atharva Mehta, Shivam Chauhan, Amirbek Djanibekov, Atharva Kulkarni, Gus Xia, Monojit Choudhury
TL;DR
The paper systematically analyzes representational bias in music generation datasets and finds that non-Western genres account for only 5.7% of the total hours, which leads to disparate model performance across genres. It introduces a cross-cultural adaptation approach that applies parameter-efficient fine-tuning (PEFT) with Bottleneck Residual Adapters to two open-source models, MusicGen and Mustango, for Hindustani Classical and Turkish Makam music, and it evaluates the results with a novel arena-style protocol based on Bloom's taxonomy. Results show that PEFT can improve generation quality for underrepresented genres, but its effectiveness depends strongly on the model and the genre, which underscores the non-triviality of cross-cultural transfer. The work calls for more inclusive music datasets and for baseline models designed for cross-cultural transfer learning, to prevent Western-centric homogenization of AI-generated music.
Abstract
The advent of Music-Language Models has greatly enhanced the automatic music generation capability of AI systems, but these models remain limited in their coverage of the world's musical genres and cultures. We present a study of the datasets and research papers for music generation and quantify the bias and under-representation of genres. We find that only 5.7% of the total hours of existing music datasets come from non-Western genres, which naturally leads to disparate performance of the models across genres. We then investigate the efficacy of Parameter-Efficient Fine-Tuning (PEFT) techniques in mitigating this bias. Our experiments with two popular models, MusicGen and Mustango, on two underrepresented non-Western music traditions, Hindustani Classical and Turkish Makam music, highlight both the promise and the non-triviality of cross-genre adaptation of music through small datasets. This implies the need for more equitable baseline music-language models that are designed for cross-cultural transfer learning.
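To make the adapter-based PEFT idea mentioned above concrete, here is a minimal pure-Python sketch of a generic bottleneck residual adapter: a small down-projection, a nonlinearity, an up-projection, and a residual connection around the whole block. It is not the authors' implementation. The class name `BottleneckAdapter`, the dimensions, the ReLU nonlinearity, and the zero-initialized up-projection are all assumptions chosen for illustration.

```python
import random


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


class BottleneckAdapter:
    """Hypothetical adapter: h + W_up * relu(W_down * h)."""

    def __init__(self, d_model, d_bottleneck, seed=0):
        rng = random.Random(seed)
        # Down-projection: d_model -> d_bottleneck, small random init (assumption).
        self.W_down = [[rng.gauss(0.0, 0.02) for _ in range(d_model)]
                       for _ in range(d_bottleneck)]
        # Up-projection: d_bottleneck -> d_model, zero init (assumption) so the
        # adapter starts as an identity map and leaves the frozen model unchanged.
        self.W_up = [[0.0] * d_bottleneck for _ in range(d_model)]

    def num_params(self):
        """Number of trainable weights in the adapter (biases omitted)."""
        return (len(self.W_down) * len(self.W_down[0])
                + len(self.W_up) * len(self.W_up[0]))

    def __call__(self, h):
        z = [max(0.0, v) for v in matvec(self.W_down, h)]  # ReLU bottleneck
        delta = matvec(self.W_up, z)                       # project back up
        return [hi + di for hi, di in zip(h, delta)]       # residual add


adapter = BottleneckAdapter(d_model=8, d_bottleneck=2)
h = [0.5, -1.0, 2.0, 0.0, 1.5, -0.5, 0.25, 3.0]  # toy hidden state
out = adapter(h)
```

In this sketch only the two small projection matrices would be trained while the base model stays frozen, which is what makes the approach parameter-efficient: the adapter above holds 2 * 8 * 2 = 32 weights regardless of how large the surrounding model is.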
