Understanding Abuse: A Typology of Abusive Language Detection Subtasks

Zeerak Waseem, Thomas Davidson, Dana Warmsley, Ingmar Weber

TL;DR

The paper addresses fragmentation in abusive language detection by proposing a two-axis typology (target and explicitness) to unify subtasks like hate speech, cyberbullying, and harassment. It maps existing labels to the typology and derives annotation and modeling implications for each subtype. It provides practical guidance for annotation guidelines, feature selection, and cross-subtask learning, with emphasis on transparency. The approach aims to improve detection systems by enabling more fine-grained, task-specific analysis of abusive language.

Abstract

As the body of research on abusive language detection and analysis grows, there is a need for critical consideration of the relationships between different subtasks that have been grouped under this label. Based on work on hate speech, cyberbullying, and online abuse, we propose a typology that captures central similarities and differences between subtasks, and we discuss its implications for data annotation and feature construction. We emphasize the practical actions that can be taken by researchers to best approach their abusive language detection subtask of interest.

Paper Structure

This paper contains 7 sections, 1 table.