Abstract Meaning Representation-Based Logic-Driven Data Augmentation for Logical Reasoning
Qiming Bao, Alex Yuxuan Peng, Zhenyun Deng, Wanjun Zhong, Gael Gendron, Timothy Pistotti, Neset Tan, Nathan Young, Yang Chen, Yonghua Zhu, Paul Denny, Michael Witbrock, Jiamou Liu
TL;DR
The paper tackles the scarcity of training data for supervised logical reasoning in large language models by introducing AMR-LDA, a logic-driven data augmentation pipeline based on Abstract Meaning Representation (AMR). The pipeline parses sentences into AMR graphs with a text-to-AMR parser, applies logical-equivalence laws to produce logically equivalent or nonequivalent graph variants, and converts those variants back to text with an AMR-to-text generator. Four laws are formalized within the AMR framework: double negation, commutativity, implication, and contraposition. The augmented data serves two purposes: contrastive learning followed by task-specific fine-tuning for discriminative models, and prompt augmentation for generative models without any fine-tuning. Empirical results on ReClor, LogiQA, and related natural language inference and textual entailment datasets show improvements over baselines, corroborated by human evaluation, and the method leads the ReClor leaderboard, highlighting its practical value for robust logical reasoning in LLMs.
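The four laws named above are standard propositional equivalences. As a minimal sketch, assuming a toy propositional syntax tree rather than the AMR graphs the paper actually manipulates, the following Python applies each law and checks equivalence by brute-force truth table. All class and function names are hypothetical, and using the converse (B → A) as the nonequivalent variant is an illustrative choice, not necessarily how AMR-LDA constructs its negative samples.

```python
from __future__ import annotations

from dataclasses import dataclass
from itertools import product
from typing import Union


# Toy propositional syntax tree (hypothetical stand-in for an AMR graph).
@dataclass(frozen=True)
class Atom:
    name: str

@dataclass(frozen=True)
class Not:
    arg: Expr

@dataclass(frozen=True)
class And:
    left: Expr
    right: Expr

@dataclass(frozen=True)
class Or:
    left: Expr
    right: Expr

@dataclass(frozen=True)
class Implies:
    left: Expr
    right: Expr

Expr = Union[Atom, Not, And, Or, Implies]


def double_negation(e: Expr) -> Expr:
    # not not A <=> A: collapse an existing double negation, otherwise add one.
    if isinstance(e, Not) and isinstance(e.arg, Not):
        return e.arg.arg
    return Not(Not(e))

def commutative(e: Expr) -> Expr:
    # A and B <=> B and A; A or B <=> B or A.
    if isinstance(e, And):
        return And(e.right, e.left)
    if isinstance(e, Or):
        return Or(e.right, e.left)
    return e

def implication(e: Expr) -> Expr:
    # A -> B <=> (not A) or B.
    return Or(Not(e.left), e.right) if isinstance(e, Implies) else e

def contraposition(e: Expr) -> Expr:
    # A -> B <=> (not B) -> (not A).
    return Implies(Not(e.right), Not(e.left)) if isinstance(e, Implies) else e

def converse(e: Expr) -> Expr:
    # B -> A is NOT equivalent to A -> B; used here as a negative variant.
    return Implies(e.right, e.left) if isinstance(e, Implies) else e


def atoms(e: Expr) -> set[str]:
    if isinstance(e, Atom):
        return {e.name}
    if isinstance(e, Not):
        return atoms(e.arg)
    return atoms(e.left) | atoms(e.right)

def evaluate(e: Expr, env: dict[str, bool]) -> bool:
    if isinstance(e, Atom):
        return env[e.name]
    if isinstance(e, Not):
        return not evaluate(e.arg, env)
    if isinstance(e, And):
        return evaluate(e.left, env) and evaluate(e.right, env)
    if isinstance(e, Or):
        return evaluate(e.left, env) or evaluate(e.right, env)
    if isinstance(e, Implies):
        return (not evaluate(e.left, env)) or evaluate(e.right, env)
    raise TypeError(f"unknown node: {e!r}")

def equivalent(a: Expr, b: Expr) -> bool:
    # Brute-force truth table over every assignment of the atoms.
    names = sorted(atoms(a) | atoms(b))
    for values in product([False, True], repeat=len(names)):
        env = dict(zip(names, values))
        if evaluate(a, env) != evaluate(b, env):
            return False
    return True


if __name__ == "__main__":
    # "If Alan is kind, then Bob is not clever."
    original = Implies(Atom("alan_kind"), Not(Atom("bob_clever")))
    for law in (double_negation, implication, contraposition, converse):
        variant = law(original)
        print(law.__name__, equivalent(original, variant), variant)
```

Run as a script, this reports `True` for the three equivalence laws and `False` for the converse. A truth-table check is only practical for a handful of atoms, but it makes explicit which rewrites preserve meaning (positive samples) and which do not (negative samples).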
Abstract
Combining large language models with logical reasoning enhances their capacity to address problems in a robust and reliable manner. Nevertheless, the intricate nature of logical reasoning makes it difficult to gather reliable data from the web to build comprehensive training datasets, which in turn limits performance on downstream tasks. To address this, we introduce AMR-LDA, a novel logic-driven data augmentation approach. AMR-LDA converts the original text into an Abstract Meaning Representation (AMR) graph, a structured semantic representation that encapsulates the logical structure of the sentence, and applies operations to this graph to generate logically modified AMR graphs. The modified graphs are then converted back into text to create augmented data. Our method is architecture-agnostic: it enhances generative large language models, such as GPT-3.5 and GPT-4, through prompt augmentation, and discriminative large language models through contrastive learning with logic-driven data augmentation. Empirical results show improved performance across seven downstream tasks, including reading comprehension requiring logical reasoning, textual entailment, and natural language inference. Furthermore, our method leads the ReClor leaderboard at https://eval.ai/web/challenges/challenge-page/503/leaderboard/1347. The source code and data are publicly available at https://github.com/Strong-AI-Lab/Logical-Equivalence-driven-AMR-Data-Augmentation-for-Representation-Learning.
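The graph-level operation can be sketched on a tiny AMR-like structure. This is a hedged illustration, not the AMR-LDA implementation: it assumes a hand-built graph instead of parser output, represents a conditional with AMR's `:condition` role and negation with the `:polarity -` attribute, and uses illustrative concept labels (`clever-01`, `kind-01`). The hypothetical helpers `contrapose` and `negate` rewrite the graph, and the hypothetical `make_pairs` emits labeled (original, variant) pairs of the kind contrastive learning could consume. In the real pipeline an AMR-to-text generator would turn each graph back into a sentence.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import List, Tuple, Union

NEG = (":polarity", "-")  # AMR marks negation with the :polarity - attribute


@dataclass
class Node:
    var: str
    concept: str
    edges: List[Tuple[str, Union[Node, str]]] = field(default_factory=list)


def person(var: str, name_var: str, name: str) -> Node:
    # Named entity in AMR style: (p / person :name (n / name :op1 "Bob"))
    return Node(var, "person", [(":name", Node(name_var, "name", [(":op1", f'"{name}"')]))])


def to_penman(node: Node) -> str:
    # Serialize to a single-line PENMAN-style string.
    parts = [f"({node.var} / {node.concept}"]
    for role, child in node.edges:
        value = to_penman(child) if isinstance(child, Node) else child
        parts.append(f"{role} {value}")
    return " ".join(parts) + ")"


def negate(node: Node) -> Node:
    # Toggle :polarity - on the node; removing an existing one mirrors
    # double-negation elimination. Returns a new node, original untouched.
    if NEG in node.edges:
        edges = [e for e in node.edges if e != NEG]
    else:
        edges = node.edges + [NEG]
    return Node(node.var, node.concept, edges)


def contrapose(root: Node) -> Node:
    """(A :condition B), i.e. 'if B then A', becomes (not-B :condition not-A)."""
    conditions = [child for role, child in root.edges if role == ":condition"]
    if len(conditions) != 1 or not isinstance(conditions[0], Node):
        raise ValueError("expected exactly one :condition subgraph")
    consequent = Node(root.var, root.concept, [e for e in root.edges if e[0] != ":condition"])
    new_root = negate(conditions[0])
    return Node(new_root.var, new_root.concept, new_root.edges + [(":condition", negate(consequent))])


def make_pairs(graph: Node) -> List[Tuple[str, str, int]]:
    # Hypothetical pair builder: label 1 = logically equivalent, 0 = not.
    return [
        (to_penman(graph), to_penman(contrapose(graph)), 1),
        (to_penman(graph), to_penman(negate(graph)), 0),  # negates only the consequent
    ]


if __name__ == "__main__":
    # "If Alan is kind, then Bob is clever."
    graph = Node("c", "clever-01", [
        (":ARG1", person("p", "n", "Bob")),
        (":condition", Node("k", "kind-01", [(":ARG1", person("p2", "n2", "Alan"))])),
    ])
    for left, right, label in make_pairs(graph):
        print(label, left, "=>", right)
```

Read back as text, the contraposed graph corresponds to "If Bob is not clever, then Alan is not kind," while the negative variant reads "If Alan is kind, then Bob is not clever." Representing negation as a toggled attribute means that applying `negate` twice, or `contrapose` twice, restores the original graph.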
