Retrieval-Augmented Generation-based Relation Extraction

Sefika Efeoglu, Adrian Paschke

TL;DR

The paper addresses relation extraction when labeled data are scarce and LLMs are prone to hallucination. It introduces RAG4RE, a retrieval-augmented generation framework that retrieves semantically similar training sentences via SBERT embeddings, adds them to the prompt as task-relevant context, and has an LLM predict the relation; a post-processing step then restricts the output to the predefined relation types. On TACRED, TACREV, and Re-TACRED, RAG4RE outperforms vanilla prompting and achieves results ranging from competitive to state-of-the-art, though SemEval remains challenging due to its label structure and inference requirements. Overall, RAG4RE is a practical approach to robust relation extraction that uses external evidence to reduce hallucinations and data demands in real-world NLP pipelines.
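The retrieve, augment, and refine steps summarized above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: `embed` is a hypothetical bag-of-words stand-in for SBERT embeddings (chosen so the sketch runs without external libraries), the prompt wording in `build_prompt` is invented for illustration, the `no_relation` fallback in `refine` is an assumption, and the LLM call itself is omitted.

```python
import math
import re
from collections import Counter


def embed(sentence):
    """Hypothetical stand-in for an SBERT embedding: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", sentence.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, train_sentences):
    """Return the training sentence most similar to the query sentence."""
    q = embed(query)
    return max(train_sentences, key=lambda s: cosine(q, embed(s)))


def build_prompt(query, head, tail, example, relation_types):
    """Augment the prompt with the retrieved sentence (template is illustrative)."""
    return (
        f"Example sentence: {example}\n"
        f"Query sentence: {query}\n"
        f"Head entity: {head}\n"
        f"Tail entity: {tail}\n"
        f"Relation types: {', '.join(relation_types)}\n"
        "Answer with exactly one relation type."
    )


def refine(llm_output, relation_types, fallback="no_relation"):
    """Map free-form LLM output onto a predefined relation type.

    The fallback label is an assumption made for this sketch.
    """
    text = llm_output.lower()
    # Check longer labels first so a more specific label wins over a substring.
    for label in sorted(relation_types, key=len, reverse=True):
        if label.lower() in text:
            return label
    return fallback


# Illustrative data; the labels follow the TACRED naming style.
relation_types = ["org:founded_by", "per:employee_of", "no_relation"]
train_sentences = [
    "Steve Jobs founded Apple in 1976.",
    "Marie Curie was born in Warsaw.",
]
query = "Bill Gates founded Microsoft in 1975."

example = retrieve(query, train_sentences)
prompt = build_prompt(query, "Microsoft", "Bill Gates", example, relation_types)
# `prompt` would be sent to an LLM here; its raw answer is then passed to `refine`.
```

In a real pipeline, `embed` would be replaced by a sentence-embedding model and the raw LLM response would be passed through `refine`, so that any answer outside the predefined label set is mapped back into it.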

Abstract

Information Extraction (IE) is a transformative process that converts unstructured text data into a structured format by employing entity and relation extraction (RE) methodologies. The identification of the relation between a pair of entities plays a crucial role within this framework. Despite the existence of various techniques for relation extraction, their efficacy heavily relies on access to labeled data and substantial computational resources. In addressing these challenges, Large Language Models (LLMs) emerge as promising solutions; however, they might return hallucinated responses due to their own training data. To overcome these limitations, Retrieval-Augmented Generation-based Relation Extraction (RAG4RE) is proposed in this work, offering a pathway to enhance the performance of relation extraction tasks. This work evaluates the effectiveness of our RAG4RE approach using different LLMs and established benchmarks: the TACRED, TACREV, Re-TACRED, and SemEval RE datasets. In particular, we leverage prominent LLMs including Flan T5, Llama2, and Mistral in our investigation. The results of our study demonstrate that our RAG4RE approach surpasses the performance of traditional RE approaches based solely on LLMs, which is particularly evident on the TACRED dataset and its variations. Furthermore, our approach exhibits remarkable performance compared to previous RE methodologies across both the TACRED and TACREV datasets, underscoring its efficacy and potential for advancing RE tasks in natural language processing.

Paper Structure (14 sections, 6 figures, 5 tables)

Figures (6)

  • Figure 1: An example of relation types between head and tail entities in a sentence.
  • Figure 2: RAG-based Relation Extraction pipeline.
  • Figure 3: Regenerated prompt template.
  • Figure 4: A regenerated prompt is illustrated.
  • Figure 5: Micro F1 scores of four different benchmark datasets.
  • ...and 1 more figure