A Study on Bias Detection and Classification in Natural Language Processing

Ana Sofia Evans, Helena Moniz, Luísa Coheur

TL;DR

This paper investigates how publicly available bias and hate-speech datasets can be assembled and used to train classifiers for bias detection and classification in NLP. It systematically gathers datasets across binary, single-target, and multi-target tasks, standardizes labels via a unified mapping, and evaluates a DistilBERT-based Emotion-Transformer across four data groups. Key findings show that while single-target data can yield targeted gains with little overall loss, synthetic and multi-target data often degrade per-class performance, and combining all resources does not outperform simpler baselines due to dataset imbalance and non-persistent data issues. The study highlights issues of dataset degradation, annotation biases, and language/documentation diversity, arguing for more persistent, diverse, and clearly annotated resources to enable robust bias detection in NLP.
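As a rough illustration of what standardizing labels via a unified mapping could look like, the sketch below merges two toy datasets into one label scheme and counts the resulting classes. The dataset names, source labels, and unified class names are all hypothetical placeholders; this summary does not list the paper's actual mapping, so none of this should be read as the authors' implementation.

```python
from collections import Counter

# Hypothetical mapping: dataset name -> {source label -> unified label}.
# These names are illustrative assumptions, not the paper's real scheme.
UNIFIED_MAPPING = {
    "dataset_a": {"sexist": "gender", "racist": "race", "none": "neutral"},
    "dataset_b": {"misogyny": "gender", "ethnicity": "race", "not_hate": "neutral"},
}


def unify_label(dataset, label):
    """Translate a dataset-specific label into the shared label set."""
    key = label.strip().lower()
    try:
        return UNIFIED_MAPPING[dataset][key]
    except KeyError:
        # Fail loudly so unmapped labels are never silently dropped.
        raise ValueError(f"no unified label for {label!r} in {dataset!r}")


def merge_datasets(datasets):
    """Combine (text, label) rows from several datasets under unified labels."""
    merged = []
    for name, rows in datasets.items():
        for text, label in rows:
            merged.append((text, unify_label(name, label)))
    return merged


def class_counts(rows):
    """Count examples per unified label, to expose class imbalance."""
    return Counter(label for _, label in rows)


# Toy rows with placeholder text, standing in for real corpora.
toy = {
    "dataset_a": [("example 1", "sexist"), ("example 2", "none")],
    "dataset_b": [("example 3", "Misogyny"), ("example 4", "not_hate"),
                  ("example 5", "ethnicity")],
}
merged = merge_datasets(toy)
counts = class_counts(merged)
```

Inspecting `counts` after each merge is one simple way to notice the kind of class imbalance the summary identifies as a reason combined resources may not beat simpler baselines.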

Abstract

Human biases have been shown to influence the performance of models and algorithms in various fields, including Natural Language Processing. While the study of this phenomenon has garnered attention in recent years, the available resources are still relatively scarce and often focus on different forms or manifestations of bias. The aim of our work is twofold: 1) gather publicly available datasets and determine how best to combine them to effectively train models for hate speech detection and classification; 2) analyse the main issues with these datasets, such as scarcity, skewed resources, and reliance on non-persistent data. We discuss these issues in tandem with the development of our experiments, in which we show that the choice of dataset combination greatly affects the models' performance.

Paper Structure (22 sections, 4 figures, 15 tables)

Figures (4)

  • Figure 1: Average F-scores of Multi-C, Multi-D, NoAge-C, and NoAge-D
  • Figure 2: Class breakdown of the F1-scores obtained across Multi-C experiments
  • Figure 3: Comparison between F1-score averages of Multi-C and Multi-D
  • Figure 4: F1-scores of experiments Multi-D, Inter-D, as well as the average F1-scores of Binary-D