Toxic Bias: Perspective API Misreads German as More Toxic

Gianluca Nogara, Francesco Pierri, Stefano Cresci, Luca Luceri, Petter Törnberg, Silvia Giordano

TL;DR

The paper investigates whether Perspective API exhibits a language bias that overestimates toxicity in German texts. Using three large datasets—multilingual Twitter data, Italian-German COVID-19 vaccine tweets, and bilingual Wikipedia summaries—the authors show German content consistently receives higher toxicity scores, with translation to English reducing or neutralizing this effect. They document abnormal toxicity score spikes, test for word-level explanations, and demonstrate that the bias intensifies at higher toxicity levels, leading to substantially greater moderation of German content at standard thresholds. The work highlights critical risks of relying on proprietary, black-box AI tools for cross-language social science research and moderation, calling for transparency, cross-language calibration, and data-sharing practices to mitigate unintended censorship and misinterpretation.

Abstract

Proprietary public APIs play a crucial and growing role as research tools among social scientists. Among such APIs, Google's machine learning-based Perspective API is extensively used to assess the toxicity of social media messages, serving both as an important resource for researchers and as a tool for automatic content moderation. However, this paper exposes an important bias in Perspective API concerning German-language text. Through an in-depth examination of several datasets, we uncover intrinsic language biases within the multilingual model of Perspective API. We find that German content receives significantly higher toxicity scores than content in other languages. This finding is robust across various translations, topics, and data sources, and has significant consequences for both research and moderation strategies that rely on Perspective API. For instance, we show that, on average, four times more tweets and users would be moderated when scoring the original German text than when scoring its English translation. Our findings point to broader risks associated with the widespread use of proprietary APIs within the computational social sciences.
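The moderation comparison mentioned in the abstract can be illustrated with a minimal sketch. It assumes moderation means flagging every item whose toxicity score meets a fixed cutoff. The function names, the threshold value, and the scores below are all hypothetical and are not taken from the paper.

```python
def moderated_share(scores, threshold):
    """Fraction of items whose toxicity score meets or exceeds the threshold."""
    if not scores:
        raise ValueError("scores must be non-empty")
    return sum(s >= threshold for s in scores) / len(scores)


def moderation_ratio(original_scores, translated_scores, threshold):
    """How many times more items are flagged in the original than in the translation."""
    translated = moderated_share(translated_scores, threshold)
    if translated == 0:
        return float("inf")  # nothing is flagged after translation
    return moderated_share(original_scores, threshold) / translated


# Made-up scores for the same eight texts, scored in German and in English translation.
german_scores = [0.82, 0.74, 0.31, 0.90, 0.12, 0.45, 0.20, 0.05]
english_scores = [0.75, 0.40, 0.10, 0.55, 0.05, 0.30, 0.08, 0.02]
THRESHOLD = 0.7  # hypothetical cutoff, not a value reported in the paper

ratio = moderation_ratio(german_scores, english_scores, THRESHOLD)
print(f"German texts are flagged {ratio:.1f}x as often as their translations")
```

Because the ratio depends on the chosen cutoff, any such comparison should report the threshold alongside the result.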

Paper Structure (23 sections, 12 figures, 1 table)

Figures (12)

  • Figure 1: Distribution of toxicity scores for tweets shared in German-speaking countries (Austria, Switzerland, and Germany) versus those in other EU countries, from Dataset 1. Histograms are built with 100 equal-width bins. The two distributions differ significantly at $\alpha=0.05$ according to a Kruskal-Wallis test. The median toxicity is 0.075 for tweets shared in German-speaking countries and $\sim0.023$ for tweets shared in other EU countries.
  • Figure 2: Distribution of toxicity scores for tweets shared in German-speaking countries (Austria, Switzerland, and Germany) from Dataset 1, separating texts that contain at least one non-ASCII character from those that contain only ASCII characters. Histograms are built with 100 equal-width bins. The medians of the two distributions are 0.08 and 0.07, respectively.
  • Figure 3: Top 10 most frequent Perspective API scores, rounded to 8 decimal digits, for tweets shared in German-speaking countries (Austria, Switzerland, and Germany) and those in other EU countries, from Dataset 1.
  • Figure 4: Distribution of toxicity scores for COVID-19 vaccine-related tweets (top) and users (bottom) in German and Italian, from Dataset 2. Histograms are built with 100 equal-width bins. Median values for tweet toxicity are DE = 0.132, IT = 0.026; for user toxicity DE = 0.212, IT = 0.048.
  • Figure 5: Distribution of toxicity scores for German COVID-19 vaccine-related tweets and for their English translation, from Dataset 2. Histograms are built with 100 equal-width bins. Median values are: DE = 0.132, EN = 0.012.
  • ...and 7 more figures
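
The summaries named in the captions above (100 equal-width bins, medians, the most frequent rounded scores, and the ASCII versus non-ASCII split) can be sketched with the Python standard library. This is an illustration under stated assumptions, not the authors' code: the helper names and the sample records are hypothetical, and scores are assumed to lie in [0, 1].

```python
from collections import Counter
from statistics import median


def histogram(scores, bins=100):
    """Count scores in [0, 1] into equal-width bins; 1.0 falls in the last bin."""
    counts = [0] * bins
    for s in scores:
        counts[min(int(s * bins), bins - 1)] += 1
    return counts


def top_scores(scores, k=10, digits=8):
    """Most frequent scores after rounding, as (score, count) pairs."""
    return Counter(round(s, digits) for s in scores).most_common(k)


def split_by_ascii(records):
    """Split (text, score) pairs into ASCII-only and non-ASCII score lists."""
    ascii_only, non_ascii = [], []
    for text, score in records:
        (ascii_only if text.isascii() else non_ascii).append(score)
    return ascii_only, non_ascii


# Made-up (text, score) records for illustration only.
records = [
    ("Guten Morgen", 0.07),
    ("Schönes Wetter heute", 0.08),
    ("Das ist gut", 0.07),
    ("Grüße aus Wien", 0.10),
]
scores = [score for _, score in records]
ascii_scores, non_ascii_scores = split_by_ascii(records)

print("median (ASCII only):", median(ascii_scores))
print("median (non-ASCII):", median(non_ascii_scores))
print("most frequent score:", top_scores(scores, k=1))
print("items binned:", sum(histogram(scores)))
```

The Kruskal-Wallis comparison mentioned in Figure 1 is left out of this sketch; `scipy.stats.kruskal` is one available implementation if SciPy is installed.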