Scaling Synthetic Logical Reasoning Datasets with Context-Sensitive Declarative Grammars

Damien Sileo

TL;DR

Using semantic constraints during generation and careful English verbalization of predicates improves logical reasoning without hurting natural English tasks, and the resulting models achieve state-of-the-art accuracy on the FOLIO human-authored logic dataset.

Abstract

Logical reasoning remains a challenge for natural language processing, but it can be improved by training language models to mimic theorem provers on procedurally generated problems. Previous work used domain-specific proof generation algorithms, which bias reasoning toward specific proof traces and limit auditability and extensibility. We present a simpler and more general declarative framework with flexible context-sensitive rules binding multiple languages (specifically, simplified English and the TPTP theorem-proving language). We construct first-order logic problems by selecting up to 32 premises and one hypothesis. We demonstrate that using semantic constraints during generation and careful English verbalization of predicates enhances logical reasoning without hurting natural English tasks. We use relatively small DeBERTa-v3 models to achieve state-of-the-art accuracy on the FOLIO human-authored logic dataset, surpassing GPT-4, with or without an external solver, by 12% in accuracy.
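The idea of rules that bind several output languages can be illustrated with a minimal sketch. Everything below is hypothetical: the `Rule` structure, the `distinct` constraint, the toy predicates, and the `generate` and `make_problem` helpers are assumptions made for illustration, not the paper's framework or rule set. Each rule carries one template per language, so a single derivation yields aligned simplified English and TPTP, and a constraint on the children rejects degenerate expansions such as a conditional whose two sides are identical.

```python
import random
from dataclasses import dataclass
from typing import Callable, Tuple


def always(children):
    """Default constraint: accept any combination of children."""
    return True


def distinct(children):
    """Illustrative semantic constraint: children must render differently."""
    return len({c["tptp"] for c in children}) == len(children)


@dataclass(frozen=True)
class Rule:
    lhs: str                       # symbol this rule rewrites
    rhs: Tuple[str, ...]           # child symbols, filled into templates by position
    eng: str                       # simplified-English template
    tptp: str                      # TPTP template
    constraint: Callable = always  # checked on the rendered children


# Hypothetical toy grammar, not the rule set used in the paper.
RULES = [
    Rule("formula", ("pred", "pred"),
         "everyone who is {0} is also {1}", "![X]:({0}=>{1})", distinct),
    Rule("formula", ("pred",), "someone is {0}", "?[X]:({0})"),
    Rule("pred", (), "tall", "tall(X)"),
    Rule("pred", (), "quiet", "quiet(X)"),
    Rule("pred", (), "brave", "brave(X)"),
]


def generate(symbol, rng, max_tries=100):
    """Expand `symbol`, rendering English and TPTP from the same derivation."""
    candidates = [r for r in RULES if r.lhs == symbol]
    for _ in range(max_tries):
        rule = rng.choice(candidates)
        children = [generate(s, rng, max_tries) for s in rule.rhs]
        if rule.constraint(children):
            return {
                "eng": rule.eng.format(*(c["eng"] for c in children)),
                "tptp": rule.tptp.format(*(c["tptp"] for c in children)),
            }
    raise RuntimeError(f"no valid expansion found for {symbol!r}")


def make_problem(rng, n_premises):
    """Sample premises and one hypothesis; labeling by a prover is omitted."""
    return {
        "premises": [generate("formula", rng) for _ in range(n_premises)],
        "hypothesis": generate("formula", rng),
    }


problem = make_problem(random.Random(0), n_premises=3)
for premise in problem["premises"]:
    print(premise["eng"], "|", premise["tptp"])
```

Because the grammar is a flat list of rules, adding a construct or a third output language in this sketch would only require new templates, with no change to the sampling procedure. The abstract's limit of up to 32 premises would correspond to the `n_premises` argument here, and a real pipeline would still need a theorem prover to label each premise and hypothesis pair.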

Paper Structure (24 sections, 1 figure)

Introduction
Related work
Synthetic datasets for reasoning
Generation frameworks
Scalable dataset generation without forward inference
Forward inference
Declarative generation
Generation algorithm
Application to first-order logic (FOL)
Explicit finite and open domains
Quantifiers and logical relationships
Constraining material conditionals
Improving predicate verbalization
Logical representation language
Complexity control
...and 9 more sections

Figures (1)

Figure 1: Comparison of the effect of auxiliary synthetic training datasets on the evaluation tasks. We report the average accuracy of two runs. Each $\mathcal{D}$ column refers to zero-shot test accuracy on $\mathcal{D}$ after synthetic auxiliary training, and +ft refers to the test accuracy after auxiliary training followed by fine-tuning on the training set of $\mathcal{D}$ (the dataset in the previous column).
