Make Satire Boring Again: Reducing Stylistic Bias of Satirical Corpus by Utilizing Generative LLMs

Asli Umay Ozturk, Recep Firat Cekinel, Pinar Karagoz

TL;DR

The paper tackles stylistic bias in satirical text datasets that impairs satire detection across languages and domains. It introduces a debiasing pipeline that uses generative LLMs and prompt engineering to produce stylistically neutral satirical articles, demonstrated on a Turkish satirical news dataset. Experiments across multiple models show that debiasing can improve cross-lingual and cross-domain robustness (e.g., irony detection and English satire) but may reduce within-domain performance, with gains varying with each model's pretraining exposure. The work also provides a Turkish dataset with human annotations and emphasizes human-in-the-loop evaluation and ethical considerations around synthetic data generation for NLP tasks.

Abstract

Satire detection is essential for accurately extracting opinions from textual data and combating misinformation online. However, the lack of diverse corpora for satire leads to stylistic bias, which impacts the models' detection performance. This study proposes a debiasing approach for satire detection, focusing on reducing biases in training data by utilizing generative large language models. The approach is evaluated in both cross-domain (irony detection) and cross-lingual (English) settings. Results show that the debiasing method enhances the robustness and generalizability of the models for satire and irony detection tasks in Turkish and English. However, its impact on causal language models, such as Llama-3.1, is limited. Additionally, this work curates and presents the Turkish Satirical News Dataset with detailed human annotations, along with case studies on classification, debiasing, and explainability.

Paper Structure

This paper contains 36 sections, 7 figures, 6 tables.

Figures (7)

  • Figure 1: The proposed debiasing pipeline
  • Figure 2: Debiasing of Sample Article (1) with Prompt 1
  • Figure 3: Debiasing of Sample Article (2) with Prompt 1 and Prompt 2
  • Figure 4: Training setups in the experiments
  • Figure 5: Human annotation and SHAP annotation for Sample Article (1)
  • ...and 2 more figures