Are LLMs Rigorous Logical Reasoners? Empowering Natural Language Proof Generation by Stepwise Decoding with Contrastive Learning

Ying Su; Mingwen Liu; Zhijiang Guo

Are LLMs Rigorous Logical Reasoners? Empowering Natural Language Proof Generation by Stepwise Decoding with Contrastive Learning

Ying Su, Mingwen Liu, Zhijiang Guo

TL;DR

This work investigates whether large language models can serve as rigorous logical reasoners when generating natural-language proofs. It introduces ConDec, a stepwise decoding framework that employs contrastive learning and two types of hard negatives (vanilla and enhanced) to correct decoding errors such as repetition and invalid entailment. Empirical evaluation on EntailmentBank shows that ConDec achieves strong performance, comparable to or surpassing several three-stage methods, while also reducing inference costs. The study also reveals that even larger LLMs still struggle with producing fully rigorous proof chains, underscoring the value of decoding-focused training for trustworthy natural-language proofs.

Abstract

Logical reasoning is a pivotal component in the field of artificial intelligence. Proof planning, particularly in contexts requiring the validation of explanation accuracy, continues to present challenges. The recent advancement of large language models (LLMs) has led to significant progress in natural language proof planning, evolving from one-stage generators to more complex three-stage systems that include additional searchers or verifiers. While these assisted methods improve the quality of generated results, they also introduce increased search efforts and computational costs. Furthermore, the generative process itself remains underexplored. In this study, we propose a stepwise decoding approach augmented by contrastive learning to address two common errors encountered during the LLM generator's decoding process. We fine-tune the language model using both vanilla and enhanced hard negatives to mitigate these decoding errors. Empirical results demonstrate the effectiveness of our strategy. Additionally, our further analysis reveals that even larger LLMs still struggle to generate rigorous logical chains.

Are LLMs Rigorous Logical Reasoners? Empowering Natural Language Proof Generation by Stepwise Decoding with Contrastive Learning

TL;DR

Abstract

Paper Structure (29 sections, 10 equations, 2 figures, 11 tables)

This paper contains 29 sections, 10 equations, 2 figures, 11 tables.

Introduction
Related Work
Logical Reasoning with Natural Language
Proof Generation
Problem Formulation
Approach
Stepwise Decoding with Contrastive Learning
Training with Hard Negative
Vanilla Hard Negative
Enhanced Hard Negative
Experiment
Dataset
Evaluation
Implementation Details
Result Analysis
...and 14 more sections

Figures (2)

Figure 1: Architecture of the stepwise decoding with contrastive learning over hard negatives. The hard negatives are constructed by vanilla and enhanced strategies. Vanilla strategy means simple conclusion substitution. The enhanced strategy uses a reasoner and a checker to generate hard negatives.
Figure 2: Enhanced hard negative construction is implemented by exploring the unseen combination of premises, inferencing with the reasoner, and filtering with a $score$ from the checker.

Are LLMs Rigorous Logical Reasoners? Empowering Natural Language Proof Generation by Stepwise Decoding with Contrastive Learning

TL;DR

Abstract

Are LLMs Rigorous Logical Reasoners? Empowering Natural Language Proof Generation by Stepwise Decoding with Contrastive Learning

Authors

TL;DR

Abstract

Table of Contents

Figures (2)