A Few Hypocrites: Few-Shot Learning and Subtype Definitions for Detecting Hypocrisy Accusations in Online Climate Change Debates

Paulina Garcia Corral; Avishai Green; Hendrik Meyer; Anke Stoll; Xiaoyue Yan; Myrthe Reuver

TL;DR

This work treats hypocrisy accusation detection as its own NLP task within online climate discourse and introduces the Climate Hypocrisy Accusation Corpus (CHAC), a dataset of 420 Reddit comments that experts annotated for two subtypes: personal moral hypocrisy and political hypocrisy. Using six-shot in-context learning with GPT-4o, GPT-3.5, and Llama-3, the study shows that the newer instruction-tuned models achieve meaningful detection performance (macro-F1 ≈ 0.67–0.68), and that personal hypocrisy is easier to identify than political hypocrisy. An error analysis reveals systematic challenges: false positives triggered by mentions of 'hypocrisy', false negatives for older models, and subtype misclassification, especially for political content. By releasing CHAC and documenting the annotation scheme and experimental protocol, the work enables scalable, domain-specific analysis of hypocrisy in climate debates and points to directions for future improvement and broader use in social science text analysis.

Abstract

The climate crisis is a salient issue in online discussions, and hypocrisy accusations are a central rhetorical element in these debates. However, for large-scale text analysis, hypocrisy accusation detection is an understudied tool, most often defined as a smaller subtask of fallacious argument detection. In this paper, we define hypocrisy accusation detection as an independent task in NLP, and identify different relevant subtypes of hypocrisy accusations. Our Climate Hypocrisy Accusation Corpus (CHAC) consists of 420 Reddit climate debate comments, expert-annotated into two types of hypocrisy accusations: personal versus political hypocrisy. We evaluate few-shot in-context learning with 6 shots and 3 instruction-tuned Large Language Models (LLMs) for detecting hypocrisy accusations in this dataset. Results indicate that the GPT-4o and Llama-3 models in particular show promise in detecting hypocrisy accusations (F1 reaching 0.68, while previous work shows F1 of 0.44). However, context matters for a complex semantic concept such as hypocrisy accusations, and we find that models struggle especially with identifying political hypocrisy accusations compared to personal moral hypocrisy. Our study contributes new insights into hypocrisy detection and climate change discourse, and is a stepping stone for large-scale analysis of hypocrisy accusations in online climate debates.
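To make the reported F1 scores concrete, the following is a minimal sketch of how a macro-averaged F1 could be computed over the four class labels that appear in the paper's figures (PMH, PH, Neither, No accusation). The function names and the toy labels are illustrative assumptions, not the authors' evaluation code or data.

```python
# Illustrative sketch only: the label names come from the paper's figures,
# but the functions and the toy data below are hypothetical.
LABELS = ["PMH", "PH", "Neither", "No accusation"]


def per_class_f1(gold, pred, label):
    """F1 for one label, treating that label as the positive class."""
    tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
    fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
    fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(gold, pred, labels=LABELS):
    """Unweighted mean of the per-class F1 scores."""
    return sum(per_class_f1(gold, pred, lab) for lab in labels) / len(labels)


# Made-up toy example: one PH comment is misclassified as PMH.
gold = ["PMH", "PMH", "PH", "PH", "Neither", "No accusation"]
pred = ["PMH", "PMH", "PMH", "PH", "Neither", "No accusation"]
print(round(macro_f1(gold, pred), 3))
```

Because macro averaging weights every class equally, a weak score on a small class such as political hypocrisy lowers the overall figure as much as a weak score on a large class would.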

Paper Structure (32 sections, 5 figures, 6 tables)

Introduction
Background
Promise and Limits of LLMs for Social Science Construct Detection
Fallacies and Hypocrisy Accusations
Data
Data Sample and Annotation Process
Annotation Scheme
Climate Hypocrisy Accusations Corpus
Experimental Approach
Model Selection
Prompt and Shot Selection
Results
Overall Results
Sub-class Prediction
Error Analysis
...and 17 more sections

Figures (5)

Figure 1: Bar graph comparing LLM performance metrics. From left to right: Llama-3 (blue), GPT-3.5 (orange), and GPT-4o (green), grouped by accuracy (first group), precision (second group), recall (third group), and F1-score (last group).
Figure 2: Bar graph comparing the distributions of predicted and real labels. From left to right: CHAC dataset (blue), GPT-4o (orange), GPT-3.5 (green), and Llama-3 (red), grouped by class label: PMH (first group), PH (second group), Neither (third group), and No accusation (last group).
Figure 3: Confusion Matrix for predictions of GPT-4o.
Figure 4: Confusion Matrix for predictions of GPT-3.5.
Figure 5: Confusion Matrix for predictions of Llama-3 70B.
