Enhancing Zero-Shot Chain-of-Thought Reasoning in Large Language Models through Logic

Xufeng Zhao; Mengdi Li; Wenhao Lu; Cornelius Weber; Jae Hee Lee; Kun Chu; Stefan Wermter

TL;DR

To improve the zero-shot chain-of-thought reasoning of large language models, the paper proposes LoT (Logical Thoughts), a self-improvement prompting framework that draws on principles from symbolic logic, particularly Reductio ad Absurdum, to verify and rectify the reasoning process step by step.

Abstract

Recent advancements in large language models have showcased their remarkable generalizability across various domains. However, their reasoning abilities still have significant room for improvement, especially when confronted with scenarios requiring multi-step reasoning. Although large language models possess extensive knowledge, their reasoning often fails to effectively utilize this knowledge to establish a coherent thinking paradigm. These models sometimes show hallucinations as their reasoning procedures are unconstrained by logical principles. Aiming at improving the zero-shot chain-of-thought reasoning ability of large language models, we propose LoT (Logical Thoughts), a self-improvement prompting framework that leverages principles rooted in symbolic logic, particularly Reductio ad Absurdum, to systematically verify and rectify the reasoning processes step by step. Experimental evaluations conducted on language tasks in diverse domains, including arithmetic, commonsense, symbolic, causal inference, and social problems, demonstrate the efficacy of enhanced reasoning by logic. The implementation code for LoT can be accessed at: https://github.com/xf-zhao/LoT.
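
For readers unfamiliar with the term, Reductio ad Absurdum (proof by contradiction) can be written in standard propositional notation. The symbols below are textbook conventions chosen for illustration and are not taken from the paper; the second line states contraposition, the related equivalence mentioned in the Figure 3 caption.

```latex
% Reductio ad absurdum: to establish P, assume \neg P and derive a contradiction.
\[
  \frac{\neg P \vdash \bot}{\vdash P}
\]
% Contraposition: an implication is equivalent to its contrapositive.
\[
  (P \rightarrow Q) \;\equiv\; (\neg Q \rightarrow \neg P)
\]
```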

Paper Structure (30 sections, 2 equations, 4 figures, 5 tables, 1 algorithm)

Introduction
Related Work
Chain-of-Thought Prompting
Variational Prompting
Neurosymbolic Models
Method
Reductio ad Absurdum
LoT Prompting
Chain Growth
Experiments
Experimental Setup
Does LoT enhance the performance of CoT?
What is the impact on individual reasoning chains?
Do post-hoc explanations help LLM self-check?
Conclusion and Future Work
...and 15 more sections

Figures (4)

Figure 1: An overview of CoT (chain-of-thought prompting; Wei et al., 2022) and LoT (ours). In CoT, a failure of entailment makes the rest of the deduction untrustworthy, impeding the overall success of the deduction. In contrast, LoT is designed to think, verify, and revise: it adopts the thoughts that pass verification and revises those that do not, thereby improving the overall reasoning capability.
Figure 2: A diagram demonstrating the think-verify-revise loop of LoT. The two zoom-in boxes display the processes when a thought passes (top-left) and fails (bottom) the verification, respectively. A thought passing the verification is kept in the reasoning trace, while a thought failing the verification is revised and a new chain of thought is generated based on the revision. The symbols in this figure are introduced in the LoT Prompting and Chain Growth sections. See also the extended example figure in the appendix.
Figure 3: An example conversation with ChatGPT in which the language model initially fails to deduce the correct answer but, when prompted to use the idea of "contraposition", reaches the desired result.
Figure 4: Applying LoT verification and revision to CoT reasoning paths on an example arithmetic task. Every reasoning step undergoes a verification procedure, which is guided by two post hoc reviews generated independently by the LLM. In this example, step #1 fails the verification because the discriminator agrees with "Review Y", which correctly points out the error in this step. As a result, the LLM revises the original step into a new step #1 and regenerates the trailing paths based on the revision. The procedure unrolls until every step is verified to be valid. Key snippets of the prompts used for each procedure are shown in dotted boxes. Full prompts are given in the case study in the Experiments section and in the appendix.
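
The think-verify-revise loop described in the captions of Figures 2 and 4 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation (see the repository linked in the abstract for that): the `LLM` callable, the prompt wording, the function names, and the `max_steps` and `max_revisions` limits are all hypothetical, and the sketch assumes a revised step is adopted without being verified again.

```python
from typing import Callable, List, Tuple

# Hypothetical interface: any function that maps a prompt to a completion.
LLM = Callable[[str], str]


def _context(question: str, verified: List[str]) -> str:
    """Shared prompt prefix: the question plus the steps accepted so far."""
    steps = "\n".join(verified)
    return f"Question: {question}\nVerified steps so far:\n{steps}\n"


def generate_chain(llm: LLM, question: str, verified: List[str]) -> List[str]:
    """Think: ask for the remaining reasoning steps, one per line."""
    reply = llm(_context(question, verified)
                + "Continue step by step, one step per line.")
    return [line.strip() for line in reply.splitlines() if line.strip()]


def verify_step(llm: LLM, question: str, verified: List[str],
                step: str) -> Tuple[bool, str]:
    """Verify: request two independent post hoc reviews, then pick one.

    Returns (passed, critical_review). Prompt wording is an assumption.
    """
    ctx = _context(question, verified) + f"Candidate step: {step}\n"
    review_x = llm(ctx + "Explain why the candidate step is true.")
    review_y = llm(ctx + "Explain why the candidate step is false.")
    verdict = llm(ctx + f"Review X: {review_x}\nReview Y: {review_y}\n"
                  + "Which review is more plausible? Answer X or Y.")
    return verdict.strip().upper().startswith("X"), review_y


def lot_reason(llm: LLM, question: str, max_steps: int = 8,
               max_revisions: int = 4) -> List[str]:
    """Run the loop until every pending step has been accepted or revised."""
    verified: List[str] = []
    pending = generate_chain(llm, question, verified)
    revisions = 0
    while pending and len(verified) < max_steps:
        step = pending.pop(0)
        passed, critique = verify_step(llm, question, verified, step)
        if passed or revisions >= max_revisions:
            # Keep the step (also the fallback once the revision budget is spent).
            verified.append(step)
            continue
        # Revise: rewrite the failed step, then regenerate the trailing steps.
        revisions += 1
        ctx = _context(question, verified) + f"Candidate step: {step}\n"
        revised = llm(ctx + f"Review: {critique}\n"
                      + "Rewrite the candidate step so it is correct.")
        verified.append(revised.strip())
        pending = generate_chain(llm, question, verified)
    return verified
```

Passing a deterministic stub function as `llm` is enough to exercise the control flow without calling a real model; the two limits exist only to guarantee termination in this sketch.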
