Directions in Abusive Language Training Data: Garbage In, Garbage Out

Bertie Vidgen; Leon Derczynski

TL;DR

This paper systematically reviews 63 publicly available datasets created to train abusive language classifiers and reports on the creation of a dedicated website for cataloguing abusive language data, hatespeechdata.com.

Abstract

Data-driven analysis and detection of abusive online content covers many different tasks, phenomena, contexts, and methodologies. This paper systematically reviews abusive language dataset creation and content in conjunction with an open website for cataloguing abusive language data. This collection of knowledge leads to a synthesis providing evidence-based recommendations for practitioners working with this complex and highly diverse data.

Paper Structure

Table of Contents