LogicPrpBank: A Corpus for Logical Implication and Equivalence
Zhexiong Liu, Jing Zhang, Jiaying Lu, Wenjing Ma, Joyce C Ho
TL;DR
LogicPrpBank addresses the lack of labeled resources for propositional logic reasoning in mathematics by introducing a labeled corpus of 7093 propositional logic statements (PLSs) across six mathematical subjects. It combines ChatGPT-generated atomic PLSs with a template-based proposition composer that builds compound PLSs and labels their truth values. It then benchmarks language models (LMs) ranging from small to large on reasoning about implication ($P \rightarrow Q$) and equivalence ($P \leftrightarrow Q$). The study finds that subject complexity matters: small- to medium-scale LMs excel in calculus, geometry, and statistics but struggle in arithmetic and number theory. Increasing model size does not guarantee better propositional logic reasoning, and 5-shot prompting often performs best. This work provides a resource and baseline insights for educational intelligent tutoring systems and future interdisciplinary reasoning tasks, and it highlights the need for careful dataset construction when evaluating logic-focused capabilities.
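To make the labeling step concrete, here is a minimal Python sketch, assuming each atomic PLS carries a known truth value and that a compound PLS is formed by filling a fixed English template. The template wording, the class names `AtomicPLS` and `CompoundPLS`, and the `compose` helper are hypothetical illustrations, not the authors' proposition composer; only the truth-table semantics of $P \rightarrow Q$ and $P \leftrightarrow Q$ are standard.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class AtomicPLS:
    """An atomic propositional logic statement with a known truth value (assumed given)."""
    text: str
    subject: str
    truth: bool


@dataclass(frozen=True)
class CompoundPLS:
    """A compound statement built from two atomic PLSs, labeled by its truth value."""
    text: str
    connective: str  # "->" for implication, "<->" for equivalence
    label: bool


# Hypothetical templates; the corpus's actual phrasings may differ.
TEMPLATES = {
    "->": "If {p}, then {q}.",
    "<->": "{p} if and only if {q}.",
}


def implies(p: bool, q: bool) -> bool:
    """Material implication: P -> Q is false only when P is true and Q is false."""
    return (not p) or q


def iff(p: bool, q: bool) -> bool:
    """Biconditional: P <-> Q is true exactly when P and Q have the same truth value."""
    return p == q


TRUTH_FUNCTIONS = {"->": implies, "<->": iff}


def compose(p: AtomicPLS, q: AtomicPLS, connective: str) -> CompoundPLS:
    """Fill a template with two atomic PLSs and label the result via the truth table."""
    # Drop trailing periods so the atomic statements embed cleanly in the template.
    text = TEMPLATES[connective].format(p=p.text.rstrip("."), q=q.text.rstrip("."))
    label = TRUTH_FUNCTIONS[connective](p.truth, q.truth)
    return CompoundPLS(text=text, connective=connective, label=label)


if __name__ == "__main__":
    # Print the truth table for both connectives.
    for a, b in product([True, False], repeat=2):
        print(a, b, implies(a, b), iff(a, b))

    # Toy atomic statements (illustrative, not drawn from the corpus).
    p = AtomicPLS("2 + 3 = 5.", "arithmetic", True)
    q = AtomicPLS("4 * 3 = 13.", "arithmetic", False)
    print(compose(p, q, "->"))   # a true antecedent with a false consequent: label False
    print(compose(q, p, "->"))   # a false antecedent: label True
    print(compose(p, q, "<->"))  # mismatched truth values: label False
```

Because the label follows mechanically from the atomic truth values, a model that answers correctly must in effect evaluate both atomic statements and then apply the connective. This may help explain why subject matter (how hard the atomic statements are to verify) can influence accuracy as much as the logic itself.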
Abstract
Logical reasoning is critical for problem-solving and decision-making. Although Language Models (LMs) have demonstrated the ability to handle multiple reasoning tasks (e.g., commonsense reasoning), their ability to reason about complex mathematical problems, specifically propositional logic, remains largely underexplored. This gap can be attributed to the limited availability of annotated corpora. Here, we present a well-labeled propositional logic corpus, LogicPrpBank, containing 7093 Propositional Logic Statements (PLSs) across six mathematical subjects, to study a new task: reasoning about logical implication and equivalence. We benchmark LogicPrpBank with widely used LMs and show that our corpus offers a useful resource for this challenging task and that there is ample room for model improvement.
