LFC-DA: Logical Formula-Controlled Data Augmentation for Enhanced Logical Reasoning
Shenghao Li
TL;DR
The paper addresses weak logical reasoning in pretrained models by proposing LFC-DA, a symbolic-logic-controlled data augmentation pipeline. It maps natural language to propositional formulas, explores the logic space with a DFS-based search, and uses large language models to instantiate the new formulas as natural text. This formalization-exploration-instantiation framework aims to produce diverse yet logically rigorous training data, addressing the lack of interpretability and the limited variety of purely model-driven augmentation. Empirical results on ReClor and LogiQA show that training on LFC-DA-generated data significantly improves logical-reasoning accuracy over strong baselines, supporting the method’s effectiveness and generalization. Overall, LFC-DA offers a scalable, explainable way to enhance reasoning in downstream tasks while reducing reliance on manual annotation.
Abstract
Data augmentation for complex logical reasoning faces a trade-off: heavy reliance on human annotation is costly, whereas direct generation with large language models yields uninterpretable and logically homogeneous examples. To address this, we present LFC-DA, a symbolic-logic-controlled pipeline. Logical text is first mapped to propositional expressions and a compact rule library is compiled; a bounded state-space search then systematically discovers valid formulas, which are verbalized back into natural-language questions. This ensures both diversity and logical rigor under propositional logic. Experiments on ReClor and LogiQA show significant improvements in the logical-reasoning accuracy of pretrained models, confirming the effectiveness of LFC-DA for LLM-guided logical data augmentation.
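To make the exploration step concrete, the following is a minimal sketch of a depth-bounded DFS over propositional formulas driven by a small rule library. It is an illustration under stated assumptions, not the authors' implementation: the tuple encoding of formulas, the four rewrite rules, and the names `rewrites`, `successors`, `dfs`, and `equivalent` are all hypothetical choices made here. The formalization and LLM verbalization stages are omitted.

```python
from itertools import product

# Hypothetical encoding: formulas are nested tuples such as
# ("var", "p"), ("not", f), ("and", f, g), ("or", f, g), ("imp", f, g).
def var(name): return ("var", name)
def neg(f): return ("not", f)
def conj(f, g): return ("and", f, g)
def disj(f, g): return ("or", f, g)
def imp(f, g): return ("imp", f, g)

def rewrites(f):
    """Yield formulas obtained by one equivalence rule applied at the root."""
    if f[0] == "imp":
        a, b = f[1], f[2]
        yield imp(neg(b), neg(a))   # contraposition
        yield disj(neg(a), b)       # material implication
    if f[0] == "not":
        g = f[1]
        if g[0] == "not":
            yield g[1]                        # double-negation elimination
        elif g[0] == "and":
            yield disj(neg(g[1]), neg(g[2]))  # De Morgan
        elif g[0] == "or":
            yield conj(neg(g[1]), neg(g[2]))  # De Morgan

def successors(f):
    """Apply one rule at the root or inside any subformula."""
    yield from rewrites(f)
    if f[0] == "not":
        for s in successors(f[1]):
            yield neg(s)
    elif f[0] in ("and", "or", "imp"):
        for s in successors(f[1]):
            yield (f[0], s, f[2])
        for s in successors(f[2]):
            yield (f[0], f[1], s)

def dfs(start, max_depth):
    """Depth-bounded DFS; returns every formula visited, including start."""
    seen = {start}
    stack = [(start, 0)]
    while stack:
        f, depth = stack.pop()
        if depth == max_depth:
            continue  # bound reached: do not expand further
        for s in successors(f):
            if s not in seen:
                seen.add(s)
                stack.append((s, depth + 1))
    return seen

def variables(f):
    if f[0] == "var":
        return {f[1]}
    return set().union(*(variables(g) for g in f[1:]))

def evaluate(f, env):
    op = f[0]
    if op == "var":
        return env[f[1]]
    if op == "not":
        return not evaluate(f[1], env)
    if op == "and":
        return evaluate(f[1], env) and evaluate(f[2], env)
    if op == "or":
        return evaluate(f[1], env) or evaluate(f[2], env)
    if op == "imp":
        return (not evaluate(f[1], env)) or evaluate(f[2], env)
    raise ValueError(f"unknown operator: {op}")

def equivalent(f, g):
    """Truth-table check that f and g agree under every assignment."""
    names = sorted(variables(f) | variables(g))
    for values in product([False, True], repeat=len(names)):
        env = dict(zip(names, values))
        if evaluate(f, env) != evaluate(g, env):
            return False
    return True

# Usage: explore from "p -> q" and check that every result is equivalent.
p, q = var("p"), var("q")
found = dfs(imp(p, q), max_depth=2)
assert all(equivalent(imp(p, q), f) for f in found)
```

Because every rule in this sketch is an equivalence, each formula the search reaches is logically equivalent to the seed, and the truth-table check confirms this independently of the rules. The depth bound keeps the search finite even though rules such as contraposition can be reapplied indefinitely.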
