Disentangling Logic: The Role of Context in Large Language Model Reasoning Capabilities
Wenyue Hua, Kaijie Zhu, Lingyao Li, Lizhou Fan, Shuhang Lin, Mingyu Jin, Haochen Xue, Zelong Li, JinDong Wang, Yongfeng Zhang
TL;DR
ContextHub addresses whether large language models truly reason or rely on contextual cues by pairing abstract and contextualized instantiations of the same propositional logic templates. The authors construct a scalable benchmark with 4 difficulty levels, 12 domains plus an abstract domain, and rigorous quality control to study context effects and generalization. Key findings show that model size interacts with context: large models excel on abstract logic, while contextualized data can substantially boost fine-tuning generalization, although highly complex tasks challenge contextualized approaches. The work provides a flexible, domain-aware framework for evaluating and improving reasoning in LLMs and highlights instantiated data as a powerful resource for generalization in practice.
Abstract
This study aims to systematically disentangle pure logical reasoning from text understanding by contrasting abstract and contextualized logical problems drawn from a comprehensive set of domains. We explore whether LLMs demonstrate genuine reasoning capabilities across various domains when the underlying logical structure remains constant. We focus on two main questions: (1) Can abstract logical problems alone accurately benchmark an LLM's reasoning ability in real-world scenarios, disentangled from contextual support in practical settings? (2) Does fine-tuning LLMs on abstract logic problems generalize to contextualized logic problems, and vice versa? To investigate these questions, we focus on standard propositional logic, specifically deductive and abductive propositional reasoning. In particular, we construct instantiated datasets for deductive and abductive reasoning with 4 levels of difficulty, encompassing 12 distinct categories or domains based on the categorization of Wikipedia. Our experiments aim to provide insights into disentangling context in logical reasoning, the true reasoning capabilities of LLMs, and their generalization potential. The code and dataset are available at: https://github.com/agiresearch/ContextHub.
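To make the idea of pairing abstract and contextualized instantiations concrete, the following minimal Python sketch instantiates a single propositional template (modus ponens) in two ways and checks its answer with a truth table. It is an illustration only, not the ContextHub implementation: the template format, the `entails` and `render` helpers, and the example sentences are all hypothetical and do not come from the paper or its repository.

```python
from itertools import product

# Hypothetical template format (an assumption for illustration):
# premises are ("implies", a, b) or ("fact", a); the query is one variable.
TEMPLATE = {
    "premises": [("implies", "p", "q"), ("fact", "p")],
    "query": "q",
}

def entails(template):
    """Truth-table check: does every model of the premises satisfy the query?"""
    variables = sorted(
        {v for prem in template["premises"] for v in prem[1:]}
        | {template["query"]}
    )
    for values in product([False, True], repeat=len(variables)):
        env = dict(zip(variables, values))
        premises_hold = True
        for prem in template["premises"]:
            if prem[0] == "implies":
                premises_hold = premises_hold and ((not env[prem[1]]) or env[prem[2]])
            else:  # "fact"
                premises_hold = premises_hold and env[prem[1]]
        if premises_hold and not env[template["query"]]:
            return False  # counterexample found
    return True

def render(template, names):
    """Turn the template into text using a variable-to-phrase mapping."""
    lines = []
    for prem in template["premises"]:
        if prem[0] == "implies":
            lines.append(f"If {names[prem[1]]}, then {names[prem[2]]}.")
        else:
            lines.append(f"It is true that {names[prem[1]]}.")
    lines.append(f"Can we conclude that {names[template['query']]}?")
    return "\n".join(lines)

# Abstract instantiation: placeholder propositions with no real-world meaning.
ABSTRACT = {"p": "p holds", "q": "q holds"}
# Contextualized instantiation: invented sentences for one example domain.
CONTEXTUAL = {"p": "the soil is dry", "q": "the crops need irrigation"}

if __name__ == "__main__":
    print(render(TEMPLATE, ABSTRACT))
    print(render(TEMPLATE, CONTEXTUAL))
    print("Gold label:", entails(TEMPLATE))
```

Because both renderings come from the same template, they share one gold label; any difference in a model's accuracy between the two can then be attributed to the context rather than to the logical structure.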
