Large Language Models Are Better Logical Fallacy Reasoners with Counterargument, Explanation, and Goal-Aware Prompt Formulation

Jiwon Jeong, Hyeju Jang, Hogun Park

TL;DR

This paper tackles the difficulty of detecting logical fallacies in natural language by introducing a prompt-engineered approach that injects implicit contextual information—Counterargument, Explanation, and Goal—into LLM prompts. It generates context-aware queries, ranks them by confidence, and uses this ranking to inform final classification, enabling both zero-shot and fine-tuned performance gains across five diverse fallacy datasets (29 types). The method yields substantial improvements over state-of-the-art baselines, with Macro-F1 gains up to 0.60 in zero-shot and up to 0.45 in supervised settings, and offers extensive analyses on calibration, query importance, and robustness. The work demonstrates that structured, multi-perspective prompts with confidence-based ranking can significantly enhance logical reasoning in LLMs and provides code for reproducibility and broader applicability.

Abstract

The advancement of Large Language Models (LLMs) has greatly improved our ability to process complex language. However, accurately detecting logical fallacies remains a significant challenge. This study presents a novel and effective prompt formulation approach for logical fallacy detection, applicable in both supervised (fine-tuned) and unsupervised (zero-shot) settings. Our method enriches the input text by incorporating implicit contextual information (counterarguments, explanations, and goals), which we query for validity within the context of the argument. We then rank these queries by their confidence scores to inform classification. We evaluate our approach on multiple datasets from five domains, covering 29 distinct fallacy types, using models from the GPT and LLaMA series. The results show substantial improvements over state-of-the-art models, with F1 score increases of up to 0.60 in zero-shot settings and up to 0.45 in fine-tuned models. Extensive analyses further illustrate why and how our method excels.
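The pipeline described above (augment the input, reformulate each augmentation as a query, rank the queries by confidence, then classify) can be pictured with a minimal sketch. Everything named here is an assumption made for illustration: the instruction templates, the query wording, the `generate` and `confidence` callables, and the stub functions are hypothetical and are not taken from the paper or its released code.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical instruction templates; the paper's actual wording is not shown here.
INSTRUCTIONS: Dict[str, str] = {
    "CG": "Write a counterargument to the following text: {x}",
    "EX": "Explain the reasoning in the following text: {x}",
    "GO": "State the goal of the following text: {x}",
}


@dataclass
class Query:
    kind: str          # "CG", "EX", or "GO"
    text: str          # reformulated query built from one augmentation
    confidence: float  # score used only for ranking


def build_ranked_queries(
    x: str,
    generate: Callable[[str], str],
    confidence: Callable[[str], float],
) -> List[Query]:
    """Create one query per augmentation type and sort by confidence, highest first."""
    queries = []
    for kind, template in INSTRUCTIONS.items():
        augmentation = generate(template.format(x=x))
        text = (
            f"[{kind}] Context: {augmentation}\n"
            f"Question: Is the reasoning in the text valid?\n"
            f"Text: {x}"
        )
        queries.append(Query(kind, text, confidence(text)))
    return sorted(queries, key=lambda q: q.confidence, reverse=True)


def classification_prompt(x: str, ranked: List[Query], labels: List[str]) -> str:
    """Assemble a final prompt that lists the queries in ranked order."""
    body = "\n\n".join(q.text for q in ranked)
    return f"{body}\n\nClassify the text into one of: {', '.join(labels)}.\nText: {x}"


# Deterministic stand-ins for an LLM call and a confidence estimate (assumptions).
def fake_generate(prompt: str) -> str:
    return f"stub output for a prompt of {len(prompt)} characters"


def fake_confidence(query: str) -> float:
    scores = {"[CG]": 0.9, "[EX]": 0.2, "[GO]": 0.5}
    return next(v for k, v in scores.items() if query.startswith(k))


if __name__ == "__main__":
    text = "Everyone buys it, so it must be good."
    ranked = build_ranked_queries(text, fake_generate, fake_confidence)
    print(classification_prompt(text, ranked, ["ad populum", "ad hominem"]))
```

In a real setting, `generate` would wrap a model call and `confidence` would come from whatever scoring the model exposes; how the ranked queries feed the final classifier is a design choice this sketch does not settle.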

Paper Structure

This paper contains 40 sections, 4 equations, 13 figures, 9 tables.

Figures (13)

  • Figure 1: Prompt formulation: $x$ represents the input text to classify. $\mathcal{R}_i$ denotes the contextual augmentation generated from the input text using specific instructions for Counterargument (CG), Explanation (EX), and Goal (GO). $\mathcal{Q}_i$ denotes the reformulated queries created from each augmentation to analyze the input text.
  • Figure 2: Multi-class classification results based on query types for all datasets using gpt-3.5-turbo, gpt-4, and roberta-base from top to bottom. CG: Counterargument, EX: Explanation, GO: Goal, and PR: Prompt Ranking. Base: the method that uses only logical fallacy sentences without any queries.
  • Figure 3: Performance comparison of base (without queries) and all query types across various fallacy classes using the gpt-3.5-turbo model. The y-axis represents the average rank of each method across datasets. Lower ranks indicate better performance. BWF: Black and White Fallacy, RAH: Reductio Ad Hitlerum, T.T. Cliches: Thought Terminating Cliches.
  • Figure 4: Relationship between confidence scores and performance with/without queries for two datasets using the gpt-3.5-turbo model.
  • Figure 5: Reliability diagrams comparing the calibration of the base method (without queries) and ours (prompt ranking) using the gpt-3.5-turbo model across two datasets.
  • ...and 8 more figures