LogicPrpBank: A Corpus for Logical Implication and Equivalence

Zhexiong Liu; Jing Zhang; Jiaying Lu; Wenjing Ma; Joyce C Ho

TL;DR

LogicPrpBank addresses the lack of labeled resources for propositional logic reasoning in mathematics by introducing a labeled corpus of $7093$ Propositional Logic Statements (PLSs) across six mathematical subjects. It combines ChatGPT-generated atomic PLSs with a template-based proposition composer that generates compound PLSs and labels their truth values, then benchmarks a spectrum of LMs, from small to large, on implication ($P \rightarrow Q$) and equivalence ($P \leftrightarrow Q$) reasoning. The study finds that subject complexity matters: small- to medium-scale LMs excel in calculus, geometry, and statistics but struggle in arithmetic and number theory. Increasing model size does not guarantee better propositional-logic reasoning, and 5-shot prompting often performs best. The work provides a resource and baseline insights for educational intelligent tutoring systems and future interdisciplinary reasoning tasks, and highlights the need for careful dataset construction when evaluating logic-focused capabilities.
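
The composition step described above can be illustrated with a minimal sketch. It assumes each atomic PLS carries a text and a truth label, and it derives the label of a compound PLS from the standard truth tables for implication and equivalence. The `Statement` class, the sentence templates, and the two example atoms are hypothetical and are not taken from the corpus or the authors' composer.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class Statement:
    """A statement with its truth label (hypothetical structure)."""
    text: str
    truth: bool


def implication(p: Statement, q: Statement) -> Statement:
    # P -> Q is false only when P is true and Q is false.
    # The "If ..., then ..." template is an illustrative assumption.
    return Statement(f"If {p.text}, then {q.text}.", (not p.truth) or q.truth)


def equivalence(p: Statement, q: Statement) -> Statement:
    # P <-> Q is true exactly when P and Q share the same truth value.
    # The "... if and only if ..." template is an illustrative assumption.
    return Statement(f"{p.text} if and only if {q.text}.", p.truth == q.truth)


def compose(atoms: list[Statement]) -> list[Statement]:
    """Build one implication and one equivalence per ordered pair of atoms."""
    compound = []
    for p, q in permutations(atoms, 2):
        compound.append(implication(p, q))
        compound.append(equivalence(p, q))
    return compound


# Made-up atomic statements, for illustration only.
atoms = [
    Statement("2 is an even number", True),
    Statement("9 is a prime number", False),
]
compound = compose(atoms)

for s in compound:
    print(s.truth, "|", s.text)
```

Ordered pairs are used because implication is not symmetric: swapping $P$ and $Q$ can change the truth label, whereas the label of an equivalence stays the same.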

Abstract

Logical reasoning is critical to problem-solving and decision-making. Although Language Models (LMs) have demonstrated the capability to handle multiple reasoning tasks (e.g., commonsense reasoning), their ability to reason about complex mathematical problems, specifically propositional logic, remains largely underexplored. This lack of exploration can be attributed to the limited availability of annotated corpora. Here, we present a well-labeled propositional logic corpus, LogicPrpBank, containing 7093 Propositional Logic Statements (PLSs) across six mathematical subjects, to study a new task of reasoning about logical implication and equivalence. We benchmark LogicPrpBank with widely used LMs to show that our corpus offers a useful resource for this challenging task and that there is ample room for model improvement.

Paper Structure (5 sections, 1 figure, 4 tables)

Introduction
Corpus
Experiments and Analysis
Related Work
Conclusion

Figures (1)

Figure 1: LM performance on LogicPrpBank across atom, implication, and equivalence PLS.
