Towards Cross-Lingual Audio Abuse Detection in Low-Resource Settings with Few-Shot Learning
Aditya Narayan Sankaran, Reza Farahbakhsh, Noel Crespi
TL;DR
The paper tackles the challenge of detecting abusive audio content in low-resource, multilingual settings, focusing on Indian languages. It employs Model-Agnostic Meta-Learning (MAML) to enable few-shot cross-lingual classification using strong pre-trained audio representations from Whisper and Wav2Vec CLSRIL-23, evaluated on the ADIMA dataset across 10 languages. A comparative study of two feature normalizations (Temporal Mean and L2-Norm) and a visual analysis via t-SNE demonstrate that Whisper with L2-Norm offers robust cross-lingual transfer, with Malayalam reaching 85.22% accuracy in the 100-shot setting. The work provides practical insights for multilingual audio moderation in data-scarce environments and points to future enhancements through additional meta-learning approaches and expanded language coverage.
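The two feature normalizations named above can be illustrated with a minimal sketch. It assumes that "Temporal Mean" means averaging frame-level features over the time axis and that "L2-Norm" means rescaling a vector to unit Euclidean length; the summary does not spell out the exact procedure, so both readings are assumptions. The function names and the toy input are hypothetical, and a real pipeline would operate on Whisper or Wav2Vec embeddings rather than hand-written lists.

```python
import math
from typing import List


def temporal_mean(frames: List[List[float]]) -> List[float]:
    """Average frame-level features over time: (frames x dims) -> (dims)."""
    if not frames:
        raise ValueError("need at least one frame")
    n_frames = len(frames)
    n_dims = len(frames[0])
    return [sum(frame[d] for frame in frames) / n_frames for d in range(n_dims)]


def l2_normalize(vec: List[float], eps: float = 1e-12) -> List[float]:
    """Rescale a vector to unit Euclidean length (eps guards against division by zero)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / max(norm, eps) for x in vec]


# Toy stand-in for an encoder output: 2 time frames, 2 feature dimensions.
frames = [[3.0, 0.0], [3.0, 8.0]]
pooled = temporal_mean(frames)   # one fixed-size vector per utterance
unit = l2_normalize(pooled)      # same direction, unit length
```

In this sketch the pooled vector is L2-normalized afterwards only to show both operations on the same input; whether the paper combines them or treats them as alternatives should be checked against the full text.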
Abstract
Online abusive content detection remains underexplored, particularly in low-resource settings and in the audio modality. We investigate the potential of pre-trained audio representations for detecting abusive language in low-resource languages, specifically Indian languages, using Few-Shot Learning (FSL). Leveraging powerful representations from models such as Wav2Vec and Whisper, we explore cross-lingual abuse detection on the ADIMA dataset. Our approach integrates these representations within the Model-Agnostic Meta-Learning (MAML) framework to classify abusive language in 10 languages. We experiment with shot sizes from 50 to 200 to evaluate the impact of limited data on performance. We also conduct a feature visualization study to better understand model behaviour. This study highlights the generalization ability of pre-trained models in low-resource scenarios and offers insights into detecting abusive language in multilingual contexts.
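To make the meta-learning idea concrete, the following is a minimal sketch of a first-order approximation of MAML (often called FOMAML), not the authors' implementation. It treats each language as a task with a small support set for adaptation and a query set for the meta-update. Everything here is an assumption for illustration: the logistic-regression classifier, the synthetic 2-D features standing in for pre-trained audio embeddings, the learning rates, and all function names.

```python
import math
import random
from typing import List, Tuple

Example = Tuple[List[float], int]          # (feature vector, binary label)
Task = Tuple[List[Example], List[Example]]  # (support set, query set)
Params = List[float]                        # weights followed by one bias term


def predict(params: Params, x: List[float]) -> float:
    """Logistic-regression probability of the positive (abusive) class."""
    z = sum(w * xi for w, xi in zip(params[:-1], x)) + params[-1]
    return 1.0 / (1.0 + math.exp(-z))


def grad(params: Params, data: List[Example]) -> Params:
    """Gradient of the mean binary cross-entropy loss with respect to params."""
    g = [0.0] * len(params)
    for x, y in data:
        err = predict(params, x) - y
        for i, xi in enumerate(x):
            g[i] += err * xi / len(data)
        g[-1] += err / len(data)
    return g


def sgd_step(params: Params, g: Params, lr: float) -> Params:
    return [p - lr * gi for p, gi in zip(params, g)]


def fomaml_step(params: Params, tasks: List[Task],
                inner_lr: float = 0.5, outer_lr: float = 0.1) -> Params:
    """One meta-update: adapt on each support set, then average query gradients.

    First-order approximation: the query gradient is taken at the adapted
    parameters and applied directly, ignoring second-order terms of full MAML.
    """
    meta_grad = [0.0] * len(params)
    for support, query in tasks:
        adapted = sgd_step(params, grad(params, support), inner_lr)
        g_query = grad(adapted, query)
        meta_grad = [m + g / len(tasks) for m, g in zip(meta_grad, g_query)]
    return sgd_step(params, meta_grad, outer_lr)


def make_split(rng: random.Random, offset: float, shots: int) -> List[Example]:
    """Synthetic data: two Gaussian clusters, shifted by a per-task offset."""
    data = []
    for label, center in ((1, 1.5), (0, -1.5)):
        for _ in range(shots):
            data.append(([center + offset + rng.gauss(0, 0.3) for _ in range(2)], label))
    return data


def make_task(rng: random.Random, shots: int) -> Task:
    offset = rng.uniform(-0.3, 0.3)  # stands in for language-specific variation
    return make_split(rng, offset, shots), make_split(rng, offset, shots)


rng = random.Random(0)
params = [0.0, 0.0, 0.0]
for _ in range(50):  # meta-training over batches of synthetic "languages"
    params = fomaml_step(params, [make_task(rng, 5) for _ in range(4)])

# Few-shot adaptation to an unseen task, then evaluation on its query set.
support, query = make_task(rng, 5)
adapted = sgd_step(params, grad(params, support), 0.5)
accuracy = sum((predict(adapted, x) >= 0.5) == bool(y) for x, y in query) / len(query)
```

The toy clusters are deliberately easy to separate, so the resulting accuracy says nothing about real audio data; the sketch only shows the inner adaptation step and the outer meta-update that the MAML framework is built around.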
