FrenchToxicityPrompts: a Large Benchmark for Evaluating and Mitigating Toxicity in French Texts

Caroline Brun, Vassilina Nikoulina

TL;DR

FrenchToxicityPrompts, a dataset of 50K naturally occurring French prompts and their continuations, annotated with toxicity scores from a widely used toxicity classifier, is created and released to foster future research on toxicity detection and mitigation beyond English.

Abstract

Large language models (LLMs) are increasingly popular but are also prone to generating biased, toxic, or harmful language, which can have detrimental effects on individuals and communities. Although substantial effort has been devoted to assessing and mitigating toxicity in generated content, it has primarily focused on English, even though it is essential to consider other languages as well. To address this issue, we create and release FrenchToxicityPrompts, a dataset of 50K naturally occurring French prompts and their continuations, annotated with toxicity scores from a widely used toxicity classifier. We evaluate 14 different models from four prevalent open-source families of LLMs against our dataset to assess their potential toxicity across various dimensions. We hope that our contribution will foster future research on toxicity detection and mitigation beyond English.


Paper Structure

This paper contains 10 sections, 2 figures, and 3 tables.

Figures (2)

  • Figure 2: Toxicity results across various models. Top: toxicity metrics for the continuations of toxic prompts; bottom: toxicity metrics for the continuations of non-toxic prompts. X-axis: model size; y-axis: value of the toxicity metrics.
  • Figure 3: Percentages of languages generated by the different models. A language is displayed if at least one of the 14 tested models generates more than 1% of it; unkn corresponds to cases where the language detector cannot make a decision, and other corresponds to the sum of all other detected languages, i.e., languages that reach less than 1% each for all models.