Thought-Path Contrastive Learning via Premise-Oriented Data Augmentation for Logical Reading Comprehension

Chenxu Wang, Ping Jian, Zhen Yang

TL;DR

This work tackles logical reading comprehension by addressing two key gaps: (1) CoT rationales traditionally analyze only correct options, and (2) counterfactual data are often produced via rule-based methods with limited diversity. It introduces Premise-Oriented Data Augmentation (PODA) to generate CoT rationales for both correct and incorrect options and to synthesize diverse counterfactual contexts from incorrect candidates, paired with Thought-Path Contrastive Learning (TPCL) that compares original and counterfactual reasoning paths. TPCL uses a Bradley-Terry-based similarity objective alongside a supervised fine-tuning loss to pull similar thought-paths together while separating dissimilar ones, promoting clearer distinction between options. Empirical results on ReClor and LogiQA 2.0 across open LLMs show consistent gains over baselines, with high-quality counterfactual data and CoT rationales outperforming rule-based and existing CoT approaches, indicating strong potential for improved logical reasoning in large language models.

Abstract

Logical reading comprehension is a challenging task that entails grasping the underlying semantics of text and applying reasoning to deduce the correct answer. Prior research has primarily focused on enhancing logical reasoning capabilities through Chain-of-Thought (CoT) or data augmentation. However, previous work on constructing CoT rationales concentrates solely on analyzing correct options, neglecting the incorrect alternatives. Additionally, earlier data augmentation efforts that alter contexts rely on rule-based methods, so the generated contexts lack diversity and coherence. To address these issues, we propose a Premise-Oriented Data Augmentation (PODA) framework. This framework generates CoT rationales that include analyses for both correct and incorrect options, while constructing diverse and high-quality counterfactual contexts from incorrect candidate options. We integrate summarizing premises and identifying premises for each option into the rationales. Subsequently, we employ multi-step prompts with the identified premises to construct counterfactual contexts. To help the model better differentiate the reasoning process associated with each option, we introduce a novel thought-path contrastive learning method that compares reasoning paths between the original and counterfactual samples. Experimental results on three representative LLMs demonstrate that our method improves the baselines substantially across two challenging logical reasoning benchmarks (ReClor and LogiQA 2.0). The data and code are released at https://github.com/lalalamdbf/TPReasoner.
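
As an illustration of the training objective summarized in the TL;DR (a Bradley-Terry-based similarity term combined with a supervised fine-tuning loss), the following is a minimal sketch only. The exact formulation is not given in this summary, so everything here is an assumption: thought-paths are represented as plain embedding vectors, similarity is cosine similarity, and the names `cosine_similarity`, `bradley_terry_loss`, `combined_loss`, and the `weight` parameter are hypothetical. It is not the authors' released implementation.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two thought-path embeddings (assumed representation)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def bradley_terry_loss(sim_similar, sim_dissimilar):
    """Negative log-probability that the similar pair outranks the dissimilar pair.

    Under a Bradley-Terry model, P(similar > dissimilar) = sigmoid(diff),
    where diff = sim_similar - sim_dissimilar. The loss is -log(sigmoid(diff)),
    computed in a numerically stable form.
    """
    diff = sim_similar - sim_dissimilar
    if diff >= 0:
        return math.log1p(math.exp(-diff))
    return -diff + math.log1p(math.exp(diff))


def combined_loss(sft_loss, sim_similar, sim_dissimilar, weight=1.0):
    """SFT loss plus a weighted contrastive term (the weighting scheme is an assumption)."""
    return sft_loss + weight * bradley_terry_loss(sim_similar, sim_dissimilar)


# Hypothetical embeddings: an anchor thought-path, a similar one, and a dissimilar one.
anchor = [1.0, 0.0, 0.5]
similar_path = [0.9, 0.1, 0.4]
dissimilar_path = [-0.2, 1.0, 0.0]

total = combined_loss(
    sft_loss=1.25,  # placeholder value standing in for a language-modeling loss
    sim_similar=cosine_similarity(anchor, similar_path),
    sim_dissimilar=cosine_similarity(anchor, dissimilar_path),
)
print(total)
```

Minimizing the contrastive term widens the gap between the two similarities, which matches the stated goal of pulling similar thought-paths together while separating dissimilar ones.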

Paper Structure

This paper contains 32 sections, 6 equations, 15 figures, 8 tables, and 1 algorithm.

Figures (15)

  • Figure 1: Generating counterfactual context from an incorrect candidate option.
  • Figure 1: The trend of similarity variation for both similar and dissimilar thought-path pairs in TPCL.
  • Figure 2: The overall architecture of our method. (1) PODA annotates Chain-of-Thought (CoT) rationales and generates counterfactual logical reasoning data. (2) The original and counterfactual samples are used for thought-path contrastive learning.
  • Figure 2: Example of the prompt used to annotate the CoT rationale, which involves the analyses for both correct and wrong options. The rationale also integrates summarizing premises and identifying premises for each option, forming the foundation for constructing new counterfactual instances.
  • Figure 3: Example of the prompt used to generate counterfactual premises.
  • ...and 10 more figures