Towards Structured Knowledge: Advancing Triple Extraction from Regional Trade Agreements using Large Language Models

Durgesh Nandini, Rebekka Koch, Mirco Schoenfeld

TL;DR

This paper investigates the extraction of structured Subject-Predicate-Object triples from regional trade agreements using Large Language Models, focusing on the economics domain. It proposes a prompt-engineering pipeline with zero-shot, one-shot, and few-shot configurations, augmented by negative examples, implemented with Llama-3.1 and integrated via Ollama and LangChain. A domain-specific benchmark of 100 expert-curated triples accompanies a dataset of 450 XML files derived from WTO RTAs. Together they support evaluation with both quantitative metrics (precision, recall, F1, exact/semantic matches) and qualitative judgments, including relational validity and entity–relation coherence. Findings indicate that including positive and negative exemplars improves triple quality and semantic coherence, though coreference resolution and higher-level insights remain challenging, which points to model fine-tuning and improved preprocessing as future directions.

Abstract

This study investigates the effectiveness of Large Language Models (LLMs) for extracting structured knowledge in the form of Subject-Predicate-Object triples. We apply the setup to the domain of economics. The findings can be applied to a wide range of scenarios, including the creation of economic trade knowledge graphs from natural-language legal trade agreement texts. As a use case, we apply the model to regional trade agreement texts to extract trade-related information triples. In particular, we explore zero-shot, one-shot, and few-shot prompting techniques, incorporating positive and negative examples, and evaluate their performance using quantitative and qualitative metrics. Specifically, we use the Llama 3.1 model to process the unstructured regional trade agreement texts and extract triples. We discuss key insights, challenges, and potential future directions, emphasizing the significance of language models in economic applications.
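To make the prompting configurations and the exact-match metrics concrete, the following is a minimal sketch, not the authors' implementation. The function names (`build_prompt`, `normalize`, `exact_match_scores`), the prompt wording, and the sample triples are all hypothetical; the call to the model itself (Llama 3.1 via Ollama and LangChain in the paper) is deliberately left out.

```python
from typing import Iterable, Sequence, Tuple

Triple = Tuple[str, str, str]


def build_prompt(
    text: str,
    positives: Sequence[Triple] = (),
    negatives: Sequence[Triple] = (),
) -> str:
    """Assemble a prompt (hypothetical wording).

    No examples gives a zero-shot prompt, one positive a one-shot prompt,
    several positives a few-shot prompt; negatives add counter-examples.
    """
    lines = ["Extract (subject, predicate, object) triples from the text below."]
    for s, p, o in positives:
        lines.append(f"Good example: ({s}, {p}, {o})")
    for s, p, o in negatives:
        lines.append(f"Bad example (do not produce): ({s}, {p}, {o})")
    lines.append(f"Text: {text}")
    return "\n".join(lines)


def normalize(triple: Triple) -> Triple:
    """Lowercase and collapse whitespace so trivial differences still match."""
    s, p, o = (" ".join(part.lower().split()) for part in triple)
    return (s, p, o)


def exact_match_scores(
    predicted: Iterable[Triple], gold: Iterable[Triple]
) -> Tuple[float, float, float]:
    """Precision, recall, and F1 over exactly matching normalized triples."""
    pred_set = {normalize(t) for t in predicted}
    gold_set = {normalize(t) for t in gold}
    true_positives = len(pred_set & gold_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Illustrative data only; these triples are not taken from the paper's benchmark.
gold = [
    ("Party A", "eliminates", "customs duties"),
    ("Agreement", "covers", "trade in goods"),
]
predicted = [
    ("party a", "eliminates", "customs duties "),
    ("Party A", "signs", "treaty"),
]
scores = exact_match_scores(predicted, gold)
```

Semantic matching and the qualitative judgments described in the paper would need additional machinery (for example embedding similarity or expert review) that this sketch does not attempt.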

Paper Structure

This paper contains 6 sections, 3 figures, 1 table.

Figures (3)

  • Figure 1: Overview of the research methodology
  • Figure 2: Predicate frequency charts: (a) Zero Shot, (b) One Shot, (c) Few Shot, (d) Negative Examples
  • Figure 3: Heatmap: (a) Zero Shot, (b) One Shot, (c) Few Shot, (d) Negative Examples