Are Large Language Models Good at Detecting Propaganda?

Julia Jose, Rachel Greenstadt

TL;DR

This work evaluates GPT-3.5, GPT-4, and Claude 3 Opus on six propaganda techniques from the Propaganda Techniques Corpus, benchmarking them against RoBERTa-CRF and Multi-Granularity Network (MGN) baselines for propaganda detection in news. Across prompting strategies ranging from zero-shot to self-consistency, the LLMs generally do not outperform the RoBERTa-CRF baseline, though GPT-4 outperforms MGN in macro-F1 under some prompts and outperforms GPT-3.5 and Claude 3 Opus on certain techniques. All three LLMs detect name-calling more effectively than MGN, while techniques such as loaded language, doubt, and exaggeration remain challenging. The results indicate that strong transformer-based baselines still surpass LLMs at propaganda-technique detection, and they point to future work on prompting, fine-tuning, and data quality to close the gap. The findings also underscore the continued value of robust ensemble baselines for nuanced text-analysis tasks that involve subtle rhetorical devices.

Abstract

Propagandists use rhetorical devices that rely on logical fallacies and emotional appeals to advance their agendas. Recognizing these techniques is key to making informed decisions. Recent advances in Natural Language Processing (NLP) have enabled the development of systems capable of detecting manipulative content. In this study, we look at several Large Language Models (LLMs) and their performance in detecting propaganda techniques in news articles. We compare the performance of these LLMs with transformer-based models. We find that, while GPT-4 demonstrates superior F1 scores (F1=0.16) compared to GPT-3.5 and Claude 3 Opus, it does not outperform a RoBERTa-CRF baseline (F1=0.67). Additionally, we find that all three LLMs outperform a Multi-Granularity Network (MGN) baseline in detecting instances of one of the six propaganda techniques (name-calling), with GPT-3.5 and GPT-4 also outperforming the MGN baseline in detecting instances of appeal to fear and flag-waving.
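
The comparisons above rest on per-technique F1 scores and their macro average. As a minimal sketch, assuming macro-F1 means the unweighted mean of the per-technique F1 scores, the computation might look like the following. The function names and the counts are hypothetical illustrations; they are not taken from the paper, and the paper's own matching criteria for counting a detection as correct are not reproduced here.

```python
from typing import Dict, Tuple


def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 from raw counts; defined as 0.0 when there are no true positives."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(counts: Dict[str, Tuple[int, int, int]]) -> float:
    """Unweighted mean of per-technique F1; each value is (tp, fp, fn)."""
    if not counts:
        raise ValueError("counts must contain at least one technique")
    return sum(f1_score(*c) for c in counts.values()) / len(counts)


# Hypothetical (tp, fp, fn) counts for three techniques, for illustration only.
example_counts = {
    "name_calling": (8, 2, 2),
    "loaded_language": (2, 6, 8),
    "doubt": (0, 3, 5),
}

per_technique = {name: f1_score(*c) for name, c in example_counts.items()}
overall = macro_f1(example_counts)
```

Because every technique carries equal weight in the average, a model that does well on one technique (such as name-calling) but poorly on the others can still end up with a low macro-F1.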

Paper Structure

This paper contains 11 sections, 7 tables.