Context-Aware Content Moderation for German Newspaper Comments

Felix Krejca, Tobias Kietreiber, Alexander Buchelt, Sebastian Neumaier

TL;DR

This work tackles automatic content moderation for German newspaper comments by integrating contextual information such as article titles, topic paths, and user history. It systematically compares traditional shallow baselines, LSTM and CNN deep learning models, and GPT-3.5-Turbo prompts on the One Million Posts Corpus, balancing the data to enable fair evaluation. The results show that context-enhanced LSTM and CNN models perform competitively with state-of-the-art transformer approaches, while GPT-3.5-Turbo in a zero-shot setting does not benefit from added context. The findings highlight the practical potential of context-aware neural models to reduce moderator workload and inform future work on data richness, model explainability, and fairness in moderation decisions.

Abstract

The increasing volume of online discussions requires advanced automatic content moderation to maintain responsible discourse. While hate speech detection on social media is well-studied, research on German-language newspaper forums remains limited. Existing studies often neglect platform-specific context, such as user history and article themes. This paper addresses this gap by developing and evaluating binary classification models for automatic content moderation in German newspaper forums, incorporating contextual information. Using LSTM, CNN, and ChatGPT-3.5 Turbo, and leveraging the One Million Posts Corpus from the Austrian newspaper Der Standard, we assess the impact of context-aware models. Results show that CNN and LSTM models benefit from contextual information and perform competitively with state-of-the-art approaches. In contrast, ChatGPT's zero-shot classification does not improve with added context and underperforms.

Paper Structure

This paper contains 23 sections, 1 equation, 2 figures, and 6 tables.

Figures (2)

  • Figure 1: The different deep learning architectures.
  • Figure 2: Precision-Recall comparison of the different models.