Abstract Meaning Representation-Based Logic-Driven Data Augmentation for Logical Reasoning

Qiming Bao, Alex Yuxuan Peng, Zhenyun Deng, Wanjun Zhong, Gael Gendron, Timothy Pistotti, Neset Tan, Nathan Young, Yang Chen, Yonghua Zhu, Paul Denny, Michael Witbrock, Jiamou Liu

TL;DR

The paper addresses data scarcity in supervised logical reasoning for large language models by introducing AMR-LDA, an Abstract Meaning Representation (AMR)-based, logic-driven data augmentation pipeline. It parses sentences into AMR graphs, applies logical-equivalence laws to produce logically equivalent or non-equivalent variants, and converts the modified graphs back to text. The resulting data supports two uses: contrastive learning for discriminative models, and prompt augmentation for generative models without fine-tuning. The approach relies on a text-to-AMR parser and an AMR-to-text generator, and formalizes four laws (double negation, commutative, implication, contraposition) within the AMR framework, combined with a contrastive objective and task-specific fine-tuning for discriminative models. Empirical results on ReClor, LogiQA, and related NLI/entailment datasets show improvements over baselines, corroborated by human evaluation, and the method leads the ReClor leaderboard.
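The four laws named above can be checked mechanically. The following is a minimal sketch, assuming the standard propositional forms of the laws (the summary does not spell them out): A ⇔ ¬¬A, A ∧ B ⇔ B ∧ A, A → B ⇔ ¬A ∨ B, and A → B ⇔ ¬B → ¬A. The names `implies`, `LAWS`, and `equivalent` are illustrative and do not come from the paper's code.

```python
from itertools import product


def implies(p: bool, q: bool) -> bool:
    """Material implication: false only when p is true and q is false."""
    return not (p and not q)


# Each law is a pair (lhs, rhs) of formulas over two boolean variables.
# These are the textbook forms, assumed here for illustration.
LAWS = {
    "double negation": (lambda p, q: p, lambda p, q: not (not p)),
    "commutative": (lambda p, q: p and q, lambda p, q: q and p),
    "implication": (lambda p, q: implies(p, q), lambda p, q: (not p) or q),
    "contraposition": (
        lambda p, q: implies(p, q),
        lambda p, q: implies(not q, not p),
    ),
}


def equivalent(lhs, rhs) -> bool:
    """Two formulas are equivalent if they agree on every truth assignment."""
    return all(lhs(p, q) == rhs(p, q) for p, q in product([False, True], repeat=2))


results = {name: equivalent(lhs, rhs) for name, (lhs, rhs) in LAWS.items()}

# The converse (q -> p) is NOT equivalent to (p -> q); a variant like this
# could serve as a logically non-equivalent example.
converse_is_equivalent = equivalent(
    lambda p, q: implies(p, q),
    lambda p, q: implies(q, p),
)

if __name__ == "__main__":
    for name, ok in results.items():
        print(f"{name}: {'equivalent' if ok else 'not equivalent'}")
    print(f"converse: {'equivalent' if converse_is_equivalent else 'not equivalent'}")
```

The truth-table check only confirms the logic of the laws; the paper applies them to AMR graphs rather than to boolean formulas.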

Abstract

Combining large language models with logical reasoning enhances their capacity to address problems in a robust and reliable manner. Nevertheless, the intricate nature of logical reasoning poses challenges when gathering reliable data from the web to build comprehensive training datasets, subsequently affecting performance on downstream tasks. To address this, we introduce a novel logic-driven data augmentation approach, AMR-LDA. AMR-LDA converts the original text into an Abstract Meaning Representation (AMR) graph, a structured semantic representation that encapsulates the logical structure of the sentence, upon which operations are performed to generate logically modified AMR graphs. The modified AMR graphs are subsequently converted back into text to create augmented data. Notably, our methodology is architecture-agnostic and enhances both generative large language models, such as GPT-3.5 and GPT-4, through prompt augmentation, and discriminative large language models through contrastive learning with logic-driven data augmentation. Empirical evidence underscores the efficacy of our proposed method with improvement in performance across seven downstream tasks, such as reading comprehension requiring logical reasoning, textual entailment, and natural language inference. Furthermore, our method leads on the ReClor leaderboard at https://eval.ai/web/challenges/challenge-page/503/leaderboard/1347. The source code and data are publicly available at https://github.com/Strong-AI-Lab/Logical-Equivalence-driven-AMR-Data-Augmentation-for-Representation-Learning.
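To make the "operations on the AMR graph" step concrete, here is a minimal sketch of contraposition on a toy graph structure. It assumes a heavily simplified node with only a concept label, a negation flag standing in for `:polarity -`, and an optional `:condition` child. The `Node` class, the helper names, and the concept label `succeed-01` are hypothetical, and the text-to-AMR parsing and AMR-to-text generation steps are omitted.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Node:
    """Toy stand-in for an AMR node (not the paper's data structure)."""
    concept: str
    negated: bool = False                # models ":polarity -"
    condition: Optional["Node"] = None   # models ":condition"


def negate(node: Node) -> Node:
    """Toggle the polarity of a node."""
    return replace(node, negated=not node.negated)


def contrapose(main: Node) -> Node:
    """Turn 'if A then B' into 'if not B then not A'.

    `main` is B, with A attached as its :condition. Nested conditions
    inside A are not handled in this sketch.
    """
    if main.condition is None:
        raise ValueError("contraposition needs a :condition child")
    antecedent = main.condition
    consequent = replace(main, condition=None)
    return replace(negate(antecedent), condition=negate(consequent))


def gloss(node: Node) -> str:
    """Readable rendering of the toy graph, for inspection only."""
    head = ("not " if node.negated else "") + node.concept
    if node.condition is None:
        return head
    return f"if {gloss(node.condition)} then {head}"


original = Node("succeed-01", condition=Node("work-01"))
augmented = contrapose(original)

if __name__ == "__main__":
    print(gloss(original))
    print(gloss(augmented))
```

Because negation is modeled as a toggle, applying `contrapose` twice returns the original graph, which is a quick sanity check that the transformation preserves logical content.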

Paper Structure (31 sections, 6 equations, 10 figures, 23 tables)

Figures (10)

  • Figure 1: An example of AMR. Two sentences with the same meaning can be represented by the same AMR graph. "b", "g", and "w" are variables. "w/work-01" means that the variable "w" is an instance of the AMR concept "work-01", where "work" is the frame from PropBank [paul2002treebank] and "-01" is the sense of the frame. ":ARG0", ":ARG1", ":condition", and ":polarity" are frame arguments, following the PropBank guidelines. ":condition" and ":polarity -" represent conditional and negative relationships, respectively.
  • Figure 2: Architecture of AMR-LDA (1) and its applications to improve the reasoning performance of discriminative LLMs with contrastive learning (2a) and autoregressive generative LLMs by augmenting input prompts without fine-tuning (2b).
  • Figure 3: Example of using AMR-LDA to augment a prompt from the ReClor dataset, which is then used as input for GPT-4. Data segments marked in bold italics and shown in blue were generated using the contraposition law, while those in brown were generated using the implication law.
  • Figure 4: An example of using AMR-LDA to generate a logically equivalent sentence for a long sentence. Here, the logically equivalent sentence is generated using the commutative law, and the same color marks the same argument. The order of the two arguments of the conjunction "and" has been swapped.
  • Figure 5: Another example of using AMR-LDA to generate a logically equivalent sentence for a long sentence. The logically equivalent sentence is again generated using the commutative law, and the same color marks the same argument. AMR-LDA can understand the effect of that clause on yoga stretching. The order of the two arguments of the conjunction "and" has been swapped.
  • ...and 5 more figures
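The commutative-law swap described in the captions for Figures 4 and 5 can be sketched on a triple-based view of an AMR graph. This is a minimal illustration, assuming the graph is a list of (source, role, target) triples and that the conjunction "and" carries its two arguments under the standard AMR roles `:op1` and `:op2`. The function name, variable names, and concept labels are hypothetical and are not taken from the figures.

```python
Triple = tuple[str, str, str]


def swap_and_operands(triples: list[Triple], and_var: str) -> list[Triple]:
    """Swap :op1 and :op2 on the node `and_var`, leaving all other triples as-is.

    Only the two-operand case is handled in this sketch.
    """
    swap = {":op1": ":op2", ":op2": ":op1"}
    return [
        (src, swap.get(role, role) if src == and_var else role, tgt)
        for src, role, tgt in triples
    ]


# Hypothetical graph for "A and B": node "a" is the conjunction.
triples: list[Triple] = [
    ("a", ":instance", "and"),
    ("a", ":op1", "s"),
    ("a", ":op2", "r"),
    ("s", ":instance", "stretch-01"),
    ("r", ":instance", "relax-01"),
]

swapped = swap_and_operands(triples, "a")

if __name__ == "__main__":
    for triple in swapped:
        print(triple)
```

Only the role labels on the conjunction node change, so each argument subgraph is carried over intact, which matches the captions' note that the same color marks the same argument before and after the swap.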