Thought-Path Contrastive Learning via Premise-Oriented Data Augmentation for Logical Reading Comprehension
Chenxu Wang, Ping Jian, Zhen Yang
TL;DR
This work tackles logical reading comprehension by addressing two key gaps: (1) CoT rationales traditionally analyze only correct options, and (2) counterfactual data are often produced via rule-based methods with limited diversity. It introduces Premise-Oriented Data Augmentation (PODA) to generate CoT rationales for both correct and incorrect options and to synthesize diverse counterfactual contexts from incorrect candidates, paired with Thought-Path Contrastive Learning (TPCL) that compares original and counterfactual reasoning paths. TPCL uses a Bradley-Terry-based similarity objective alongside a supervised fine-tuning loss to pull similar thought-paths together while separating dissimilar ones, promoting clearer distinction between options. Empirical results on ReClor and LogiQA 2.0 across open LLMs show consistent gains over baselines, with high-quality counterfactual data and CoT rationales outperforming rule-based and existing CoT approaches, indicating strong potential for improved logical reasoning in large language models.
Abstract
Logical reading comprehension is a challenging task that entails grasping the underlying semantics of text and applying reasoning to deduce the correct answer. Prior research has primarily focused on enhancing logical reasoning capabilities through Chain-of-Thought (CoT) or data augmentation. However, previous work on constructing CoT rationales concentrates solely on analyzing correct options, neglecting the incorrect alternatives. Additionally, earlier data augmentation efforts that alter contexts rely on rule-based methods, so the generated contexts lack diversity and coherence. To address these issues, we propose a Premise-Oriented Data Augmentation (PODA) framework. This framework generates CoT rationales that analyze both correct and incorrect options, and constructs diverse, high-quality counterfactual contexts from incorrect candidate options. We integrate premise summarization and per-option premise identification into the rationales. Subsequently, we employ multi-step prompts with the identified premises to construct counterfactual contexts. To help the model better differentiate the reasoning process associated with each option, we introduce a novel thought-path contrastive learning method that compares reasoning paths between the original and counterfactual samples. Experimental results on three representative LLMs demonstrate that our method substantially improves over the baselines on two challenging logical reasoning benchmarks (ReClor and LogiQA 2.0). The data and code are released at https://github.com/lalalamdbf/TPReasoner.
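As a rough illustration of the thought-path contrastive idea, the TL;DR describes a Bradley-Terry-based similarity objective combined with a supervised fine-tuning loss. The sketch below shows only the generic Bradley-Terry pairwise form; it is not the released implementation. The function names, the scalar similarity scores, and the `weight` parameter are all hypothetical, and this section does not specify how thought-path similarities are computed or how the two losses are balanced.

```python
import math


def bradley_terry_loss(score_similar: float, score_dissimilar: float) -> float:
    """Negative log-probability that the similar pair outranks the dissimilar pair.

    Under the Bradley-Terry model, P(similar wins) = sigmoid(score_similar - score_dissimilar).
    """
    diff = score_similar - score_dissimilar
    # -log(sigmoid(diff)) = log(1 + exp(-diff)), evaluated in a numerically stable way.
    if diff >= 0:
        return math.log1p(math.exp(-diff))
    return -diff + math.log1p(math.exp(diff))


def combined_loss(sft_loss: float, score_similar: float,
                  score_dissimilar: float, weight: float = 1.0) -> float:
    """Hypothetical combination: SFT loss plus a weighted contrastive term.

    `weight` is an assumption made for illustration, not a value from the paper.
    """
    return sft_loss + weight * bradley_terry_loss(score_similar, score_dissimilar)


if __name__ == "__main__":
    # Similar thought-paths scored well above dissimilar ones: small contrastive loss.
    print(bradley_terry_loss(3.0, 0.0))
    # Ordering reversed: larger contrastive loss.
    print(bradley_terry_loss(0.0, 3.0))
```

The loss shrinks as the score for similar thought-paths rises above the score for dissimilar ones, which matches the stated goal of pulling similar paths together while separating dissimilar ones.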
