PclGPT: A Large Language Model for Patronizing and Condescending Language Detection

Hongbo Wang, Mingda Li, Junyu Lu, Hebin Xia, Liang Yang, Bo Xu, Ruizhu Liu, Hongfei Lin

TL;DR

This paper introduces PclGPT, a comprehensive LLM benchmark designed specifically for patronizing and condescending language (PCL), and develops a bilingual PclGPT-EN/CN model group through a staircase process of pre-training followed by supervised fine-tuning to facilitate implicit toxicity detection.

Abstract

Disclaimer: Samples in this paper may be harmful and cause discomfort!

Patronizing and condescending language (PCL) is a form of speech directed at vulnerable groups. As an essential branch of toxic language, this type of language exacerbates conflicts and confrontations among Internet communities and detrimentally impacts disadvantaged groups. Traditional pre-trained language models (PLMs) perform poorly in detecting PCL due to its implicit toxicity traits like hypocrisy and false sympathy. With the rise of large language models (LLMs), we can harness their rich emotional semantics to establish a paradigm for exploring implicit toxicity. In this paper, we introduce PclGPT, a comprehensive LLM benchmark designed specifically for PCL. We collect, annotate, and integrate the Pcl-PT/SFT dataset, and then develop a bilingual PclGPT-EN/CN model group through a comprehensive pre-training and supervised fine-tuning staircase process to facilitate implicit toxicity detection. Group detection results and fine-grained detection from PclGPT and other models reveal significant variations in the degree of bias in PCL towards different vulnerable groups, necessitating increased societal attention to protect them.

Paper Structure (23 sections, 6 figures, 10 tables)

Figures (6)

  • Figure 1: Scatter plots of Perspective API toxicity scores on the hate and PCL datasets. The left plot shows the English datasets SemEval-19 (HATE) and SemEval-22 (PCL), while the right plot shows the Chinese datasets COLD (HATE) and CCPC (PCL). Toxicity scores range from 0 to 1 and are shown as discrete values, with higher scores indicating greater toxicity.
  • Figure 2: An illustration of the overall PclGPT. We establish Pcl-PT/SFT datasets and build a bilingual model group through pre-training and SFT. Instruction Data Format demonstrates the data construction format for SFT.
  • Figure 3: A template for SFT instructions, including definitions of PCL and its subcategories, as well as toxicity intensity.
  • Figure 5: Word cloud statistics of the condescending dictionary.
  • Figure 6: Toxicity score scatter plots for three PCL datasets.
  • ...and 1 more figure