Abusive Speech Detection in Indic Languages Using Acoustic Features
Anika A. Spiesberger, Andreas Triantafyllopoulos, Iosif Tsangko, Björn W. Schuller
TL;DR
This paper addresses abusive speech detection in Indic languages using acoustic and prosodic (paralinguistic) features, arguing that tone and emotion complement textual cues. Using the ADIMA dataset, which covers ten languages, the authors compare two feature sets (eGeMAPS and ComParE_2016) and four classifiers, namely logistic regression (LR), XGBoost, support vector machine (SVM), and random forest (RF), in multilingual and cross-lingual setups. They use SHAP values and the Mann-Whitney U test (MWU) to assess feature importance and statistical significance. RF with the eGeMAPS feature set yields robust cross-lingual performance, with an unweighted average recall (UAR) of up to ~0.84. The SHAP and MWU analyses highlight loudness, the mean amplitudes of formants F1–F3, spectral flux, and voiced-segment metrics as key discriminators. The results indicate that paralinguistic features alone can detect abusive content in real-life audio data, offering a language-agnostic approach. Limitations remain for future work, notably the missing age/sex metadata and the need to capture sarcasm and irony.
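The UAR figure quoted above is usually read as unweighted average recall: the mean of the per-class recalls, so that the abusive and non-abusive classes count equally regardless of how many clips each contains. The sketch below illustrates that definition on toy labels. The function name and the label values are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def unweighted_average_recall(y_true, y_pred):
    """Return the mean of per-class recalls (each class weighted equally)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("no samples given")

    total = defaultdict(int)    # samples per true class
    correct = defaultdict(int)  # correctly predicted samples per true class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1

    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


# Hypothetical, imbalanced toy labels: 1 = abusive, 0 = non-abusive.
y_true = [1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 1]

uar = unweighted_average_recall(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"UAR = {uar:.3f}, accuracy = {accuracy:.3f}")
```

On these toy labels plain accuracy is higher than UAR because the majority class dominates it, which is why UAR is often preferred for imbalanced detection tasks. If scikit-learn is available, `sklearn.metrics.balanced_accuracy_score` computes the same quantity.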
Abstract
Abusive content in online social networks is a well-known problem that can cause serious psychological harm and incite hatred. The ability to upload audio data increases the importance of developing methods to detect abusive content in speech recordings. However, simply transferring the mechanisms from written abuse detection would ignore relevant information such as emotion and tone. In addition, many current algorithms require training in the specific language for which they are being used. This paper proposes to use acoustic and prosodic features to classify abusive content. We used the ADIMA data set, which contains recordings from ten Indic languages, and trained different models in multilingual and cross-lingual settings. Our results show that it is possible to classify abusive and non-abusive content using only acoustic and prosodic features. The most important and influential features are discussed.
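The TL;DR names MWU, commonly the Mann-Whitney U test, as the check for whether a feature differs significantly between abusive and non-abusive recordings. The following is a minimal sketch of such a per-feature test, assuming a two-sided test with a normal approximation and no tie or continuity correction. The function name and the loudness values are hypothetical, and the paper's actual test configuration is not stated in this summary.

```python
import math


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test via the normal approximation.

    Returns (U statistic of sample x, approximate p-value).
    Simplifications of this sketch: no tie correction of the variance
    and no continuity correction.
    """
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")

    # Pool both samples, remembering which group each value came from.
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])

    # Sum the ranks of sample x, giving tied values their average rank.
    rank_sum_x = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of the 1-based ranks i+1 .. j
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j

    u1 = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mean_u) / sd_u
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return u1, p_value


# Hypothetical loudness values for one acoustic feature (not real data).
abusive = [0.9, 1.1, 1.3, 1.2, 1.0, 1.4]
non_abusive = [0.5, 0.6, 0.8, 0.7, 0.4, 0.3]

u_stat, p = mann_whitney_u(abusive, non_abusive)
print(f"U = {u_stat}, p = {p:.4f}")
```

The normal approximation is rough for samples this small. In practice `scipy.stats.mannwhitneyu` is the safer choice, since it handles ties and can compute exact p-values.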
