Abusive Speech Detection in Indic Languages Using Acoustic Features

Anika A. Spiesberger, Andreas Triantafyllopoulos, Iosif Tsangko, Björn W. Schuller

TL;DR

This paper addresses abusive speech detection in Indic languages using acoustic and prosodic (paralinguistic) features, arguing that tone and emotion complement textual cues. Using the ADIMA dataset, which covers ten languages, the authors compare two feature sets (eGeMAPS and ComParE_2016) and four classifiers (logistic regression (LR), XGBoost, support vector machine (SVM), and random forest (RF)) in multilingual and cross-lingual setups. They use SHAP values to estimate feature importance and the Mann-Whitney U (MWU) test to assess significance. RF with the eGeMAPS feature set yields robust cross-lingual performance, with an unweighted average recall (UAR) of up to ~0.84; the SHAP and MWU analyses highlight loudness, mean F1–F3 amplitudes, spectral flux, and voiced-segment metrics as key discriminators. The results indicate that paralinguistic features alone can detect abusive content in real-life audio data, offering a language-agnostic approach. Remaining limitations include missing age/sex metadata and the need to capture sarcasm and irony, which are left for future work.
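
Since the headline result is reported as UAR, a small illustration of the metric may help. UAR is the mean of the per-class recalls, so each class counts equally no matter how many samples it has. The sketch below is a plain-Python illustration with made-up labels; the function name and the toy data are assumptions for this example and are not taken from the paper.

```python
from collections import defaultdict


def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recalls; every class has equal weight."""
    hits = defaultdict(int)    # correctly predicted samples per true class
    totals = defaultdict(int)  # number of samples per true class
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        if t == p:
            hits[t] += 1
    recalls = [hits[c] / totals[c] for c in totals]
    return sum(recalls) / len(recalls)


# Toy labels (invented for illustration): 1 = abusive, 0 = non-abusive.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
# A degenerate model that always predicts "non-abusive".
y_pred = [0] * 10

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
uar = unweighted_average_recall(y_true, y_pred)
print(f"accuracy={accuracy:.2f}  UAR={uar:.2f}")
```

In this toy case the always-negative model reaches high accuracy but only chance-level UAR, which is why UAR is a common choice when class sizes differ.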

Abstract

Abusive content in online social networks is a well-known problem that can cause serious psychological harm and incite hatred. The ability to upload audio data increases the importance of developing methods to detect abusive content in speech recordings. However, simply transferring the mechanisms from written abuse detection would ignore relevant information such as emotion and tone. In addition, many current algorithms require training in the specific language for which they are being used. This paper proposes to use acoustic and prosodic features to classify abusive content. We used the ADIMA data set, which contains recordings from ten Indic languages, and trained different models in multilingual and cross-lingual settings. Our results show that it is possible to classify abusive and non-abusive content using only acoustic and prosodic features. The most important and influential features are discussed.
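
The TL;DR names the Mann-Whitney U (MWU) test as the tool for checking whether a feature differs significantly between abusive and non-abusive recordings. The following is a minimal sketch of that kind of test, not the authors' procedure: it computes U by direct pair counting and uses a normal approximation without tie or continuity correction for the two-sided p-value. The feature values are invented for illustration.

```python
import math


def mann_whitney_u(x, y):
    """U statistic of x versus y: pairs where x wins, ties counted as half."""
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    return u


def mwu_two_sided_p(x, y):
    """Two-sided p-value via normal approximation (assumption: no tie or
    continuity correction; reasonable only as a rough sketch)."""
    n, m = len(x), len(y)
    u = mann_whitney_u(x, y)
    mean = n * m / 2.0
    sd = math.sqrt(n * m * (n + m + 1) / 12.0)
    z = (u - mean) / sd
    return math.erfc(abs(z) / math.sqrt(2.0))


# Hypothetical values of a single feature (e.g. a loudness statistic).
abusive = [0.9, 1.1, 1.3, 1.4, 1.6]
non_abusive = [0.5, 0.6, 0.8, 1.0, 1.2]

u_stat = mann_whitney_u(abusive, non_abusive)
p_value = mwu_two_sided_p(abusive, non_abusive)
print(f"U={u_stat}, p={p_value:.4f}")
```

For real analyses a library routine with exact p-values and tie handling would be preferable; the sketch only shows what the statistic measures.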

Paper Structure (7 sections, 2 figures, 2 tables)

This paper contains 7 sections, 2 figures, 2 tables.

Figures (2)

  • Figure 1: Heatmap of UAR scores for the RF classifier trained on each of the ten languages in the ADIMA data set using the eGeMAPS features. The rows indicate the training language, while the columns indicate the test language. The top row shows the performance when training on all languages except the test language; the rightmost column shows the performance when testing on all languages except the training language.
  • Figure 2: SHAP graph for the RF classifier trained on all ten languages in the ADIMA data set using the eGeMAPS features. The values indicate the influence of the features on the model. Note: LU = loudness.
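
The layout described for Figure 1 (rows as training language, columns as test language, plus an "all except the test language" row and an "all except the training language" column) can be sketched as follows. This is only an illustration of the evaluation grid under loud assumptions: the data are synthetic, the language names are placeholders, and a nearest-centroid classifier stands in for the random forest used in the paper.

```python
import random


def make_toy_language(rng, shift, n=40):
    """Synthetic (features, label) pairs; NOT real ADIMA data."""
    data = []
    for i in range(n):
        label = i % 2  # 1 = abusive, 0 = non-abusive
        feats = [rng.gauss(2.0 * label + shift, 1.0), rng.gauss(1.0 * label, 1.0)]
        data.append((feats, label))
    return data


def fit_centroids(samples):
    """Nearest-centroid stand-in for the paper's random forest."""
    sums, counts = {}, {}
    for feats, label in samples:
        acc = sums.setdefault(label, [0.0] * len(feats))
        for k, v in enumerate(feats):
            acc[k] += v
        counts[label] = counts.get(label, 0) + 1
    return {c: [v / counts[c] for v in sums[c]] for c in sums}


def predict(centroids, feats):
    """Return the label of the closest class centroid."""
    return min(centroids,
               key=lambda c: sum((a - b) ** 2 for a, b in zip(feats, centroids[c])))


def uar(centroids, samples):
    """Unweighted average recall of the centroid model on `samples`."""
    hits, totals = {}, {}
    for feats, label in samples:
        totals[label] = totals.get(label, 0) + 1
        hits[label] = hits.get(label, 0) + int(predict(centroids, feats) == label)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


rng = random.Random(0)
# Placeholder language names with made-up feature shifts.
shifts = {"lang_a": 0.0, "lang_b": 0.3, "lang_c": -0.3}
corpora = {name: make_toy_language(rng, s) for name, s in shifts.items()}
langs = list(corpora)

# Rows = training language, columns = test language.
# Diagonal cells reuse the training data here and are therefore optimistic;
# a real setup would use a held-out split.
grid = {tr: {te: uar(fit_centroids(corpora[tr]), corpora[te]) for te in langs}
        for tr in langs}

# Extra row: train on every language except the test language.
all_but_test = {}
# Extra column: test on every language except the training language.
all_but_train = {}
for lang in langs:
    others = [s for name in langs if name != lang for s in corpora[name]]
    all_but_test[lang] = uar(fit_centroids(others), corpora[lang])
    all_but_train[lang] = uar(fit_centroids(corpora[lang]), others)

for tr in langs:
    print(tr, [round(grid[tr][te], 2) for te in langs], round(all_but_train[tr], 2))
print("all-but-test", [round(all_but_test[te], 2) for te in langs])
```

Swapping in a different classifier only requires replacing `fit_centroids` and `predict`; the grid construction itself stays the same.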