Reason from Fallacy: Enhancing Large Language Models' Logical Reasoning through Logical Fallacy Understanding

Yanda Li, Dixuan Wang, Jiaqing Liang, Guochao Jiang, Qianyu He, Yanghua Xiao, Deqing Yang

TL;DR

This work addresses a gap in large language models' logical reasoning by focusing on logical fallacy understanding (LFU). It introduces LFUD, a dataset built with GPT-4 that contains 4,020 QA instances covering 12 fallacy types and derived from 67 propositions, organized into five LFU tasks spanning the WHAT, WHY, and HOW dimensions. In fine-tuning experiments, LFUD (compared with baselines such as LOGIC) yields significant improvements in LFU and in general logical reasoning across multiple benchmarks, with evidence of cross-task transfer to Task 5. The approach emphasizes data-driven LFU augmentation and validation, and offers insights into how task composition and fallacy diversity affect LFU learning, providing a practical resource for advancing trustworthy logical reasoning in LLMs.

Abstract

Large Language Models (LLMs) have demonstrated good performance in many reasoning tasks, but they still struggle with some complicated reasoning tasks, including logical reasoning. One non-negligible reason for LLMs' suboptimal performance on logical reasoning is that they overlook the correct understanding of logical fallacies. To evaluate LLMs' capability of logical fallacy understanding (LFU), we propose five concrete tasks from three cognitive dimensions: WHAT, WHY, and HOW. For these LFU tasks, we constructed a new dataset, LFUD, using GPT-4 with a small amount of human effort. Our extensive experiments demonstrate that LFUD can be used not only to evaluate LLMs' LFU capability, but also to fine-tune LLMs to obtain significantly enhanced performance on logical reasoning.

Paper Structure

This paper contains 32 sections, 4 figures, 10 tables.

Figures (4)

  • Figure 1: LLMs have deficiencies in logical reasoning. Once they understand logical fallacies, they know how to avoid them and thus improve their performance on various logical reasoning tasks.
  • Figure 2: Our framework for constructing LFUD and fine-tuning LLMs with LFUD to enhance logical reasoning. First, we collected propositions, from which GPT-4 generated sentences exhibiting 12 types of logical fallacies. Then, for the five LFU tasks we proposed, QA instances were synthesized from the previously generated sentences. Finally, we fine-tuned LLMs with LFUD, showing that this can significantly enhance their logical reasoning capability.
  • Figure 3: LLaMA2-13B's performance on the four logical reasoning tasks with different scales of LFUD training samples.
  • Figure 4: LLMs' performance on Task 5 without fine-tuning (denoted as Original) or after being fine-tuned with training data from Tasks 1-4.
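
The counts quoted in the TL;DR (67 propositions, 12 fallacy types, five tasks, 4,020 QA instances) are consistent with the pipeline in Figure 2 if each proposition yields one fallacious sentence per fallacy type and each sentence yields one QA instance per task. That one-to-one mapping is an assumption inferred from the numbers, not something stated on this page. The sketch below checks the arithmetic using hypothetical index triples rather than real data.

```python
from itertools import product

# Counts quoted in the TL;DR.
NUM_PROPOSITIONS = 67
NUM_FALLACY_TYPES = 12
NUM_TASKS = 5


def enumerate_instances(n_props, n_fallacies, n_tasks):
    """Return hypothetical (proposition, fallacy, task) index triples.

    Assumption: one fallacious sentence per (proposition, fallacy) pair,
    and one QA instance per sentence for each task.
    """
    sentences = list(product(range(n_props), range(n_fallacies)))
    return [(p, f, t) for (p, f) in sentences for t in range(1, n_tasks + 1)]


instances = enumerate_instances(NUM_PROPOSITIONS, NUM_FALLACY_TYPES, NUM_TASKS)
num_sentences = NUM_PROPOSITIONS * NUM_FALLACY_TYPES

print(num_sentences, len(instances))  # prints: 804 4020
```

Under this assumption, the 4,020 instances decompose into 804 generated sentences with five task-specific questions each.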