Guardians of Discourse: Evaluating LLMs on Multilingual Offensive Language Detection

Jianfei He, Lilin Wang, Jiaying Wang, Zhenyu Liu, Hongbin Na, Zimu Wang, Wei Wang, Qi Chen

TL;DR

This work is the first to evaluate LLMs on multilingual offensive language detection, covering three languages (English, Spanish, and German) and three LLMs (GPT-3.5, Flan-T5, and Mistral) in both monolingual and multilingual settings.

Abstract

Identifying offensive language is essential for maintaining safety and sustainability in the social media era. Although large language models (LLMs) have demonstrated encouraging potential in social media analytics, they have not been thoroughly evaluated on offensive language detection, particularly in multilingual environments. We present the first evaluation of LLMs on multilingual offensive language detection, covering three languages (English, Spanish, and German) and three LLMs (GPT-3.5, Flan-T5, and Mistral) in both monolingual and multilingual settings. We further examine the impact of different prompt languages and of augmented translation data on the task in non-English contexts. Finally, we discuss how inherent bias in the LLMs and the datasets contributes to mispredictions related to sensitive topics.

Paper Structure

This paper contains 19 sections, 5 figures, 4 tables.

Figures (5)

  • Figure 1: Offensive language detection results for an example tweet obtained from three LLMs: Flan-T5, Mistral, and GPT-3.5.
  • Figure 2: Evaluation pipeline of LLMs in multilingual offensive language detection.
  • Figure 3: Comparison between previous SOTA methods and the best results with LLMs during the entire evaluation process.
  • Figure 4: Examples of GPT-3.5 outputs for inputs that violate its content moderation policy.
  • Figure 5: Frequencies of mispredicted content for the evaluated models, related to three sensitive topics: race, sexual orientation, and gender.