Beyond Toxic: Toxicity Detection Datasets are Not Enough for Brand Safety

Elizaveta Korotkova; Isaac Chung

TL;DR

We demonstrate the need for brand-safety-specific datasets by applying common toxicity detection datasets to a subset of brand safety, and we empirically analyze the effects of weighted sampling strategies in text classification.

Abstract

The rapid growth of user-generated content on social media has significantly increased the demand for automated content moderation. Various methods and frameworks have been proposed for hate speech detection and toxic comment classification. In this work, we combine common datasets to extend these tasks to brand safety. Brand safety aims to protect commercial branding by identifying contexts where advertisements should not appear; it covers not only toxicity but also other potentially harmful content. Because these datasets use different label sets, we approach the overall problem as a binary classification task. We demonstrate the need for brand-safety-specific datasets by applying common toxicity detection datasets to a subset of brand safety, and we empirically analyze the effects of weighted sampling strategies in text classification.
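The two ideas in the abstract, collapsing heterogeneous label sets into a binary target and sampling training examples with weights, can be illustrated with a minimal sketch. The abstract does not specify the weighting scheme, so inverse class frequency is used here purely as an assumption; the helper names (`to_binary`, `inverse_frequency_weights`, `weighted_sample`) and the toy data are hypothetical and not taken from the paper.

```python
import random
from collections import Counter


def to_binary(labels):
    """Map a dataset-specific label list to 1 (any harmful label) or 0 (none)."""
    return 1 if labels else 0


def inverse_frequency_weights(binary_labels):
    """Weight each example by 1 / (count of its class).

    Assumption: the paper's actual weighting strategies may differ.
    With these weights, each class carries the same total sampling mass.
    """
    counts = Counter(binary_labels)
    return [1.0 / counts[y] for y in binary_labels]


def weighted_sample(examples, weights, k, seed=0):
    """Draw k examples with replacement, proportionally to their weights."""
    rng = random.Random(seed)
    return rng.choices(examples, weights=weights, k=k)


# Hypothetical toy data: (text, labels) pairs with different label vocabularies.
raw = [
    ("text a", ["toxic", "insult"]),
    ("text b", []),
    ("text c", []),
    ("text d", ["hate_speech"]),
    ("text e", []),
    ("text f", []),
]

binary = [to_binary(labels) for _, labels in raw]
weights = inverse_frequency_weights(binary)
batch = weighted_sample(raw, weights, k=4)
```

In this toy set the two positive examples each receive weight 0.5 and the four negative examples each receive 0.25, so a positive and a negative example are equally likely to be drawn on any single draw, even though negatives are twice as frequent.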

Paper Structure (18 sections, 3 figures, 5 tables)

This paper contains 18 sections, 3 figures, 5 tables.

Introduction
Related Work
Toxicity Detection
Weighted Sampling
Data
Common Toxicity Detection Datasets
Private Dataset
Methodology
Datasets
Preprocessing
Models
Weighted Sampling
Evaluation
Results
Discussion
...and 3 more sections

Figures (3)

Figure 1: PCA projection of embeddings extracted from the 4th Transformer layer of the pre-trained DistilBERT model (200 examples sampled from each dataset). Each example's embedding is the average of its token embeddings.
Figure 2: Counts of brand safety classes in the Private dataset. The dataset is multi-label, i.e., one example may belong to more than one class. Additionally, there are 29,596 negative examples in the dataset (not shown here).
Figure 3: Heatmap of per-class F1 scores ($\times 100$) for all models on the Brand Safety test set. Obscenity and Hate Speech correspond to the toxicity detection domain better than the rest of the classes.
