GPT-RE: In-context Learning for Relation Extraction using Large Language Models

Zhen Wan, Fei Cheng, Zhuoyuan Mao, Qianying Liu, Haiyue Song, Jiwei Li, Sadao Kurohashi

TL;DR

This work addresses the gap between GPT-3 in-context learning (ICL) and fully-supervised baselines for relation extraction (RE) by identifying two key issues: retrieved demonstrations have low relevance to the target entities and relation, and GPT-3 tends to misclassify NULL examples as one of the other predefined relations. It proposes GPT-RE, a prompt-based framework that (i) retrieves task-aware demonstrations using entity-prompted embeddings and fine-tuned relation representations, and (ii) injects gold label-induced reasoning into demonstrations to make the input-label mapping clearer. Across four RE datasets, GPT-RE achieves state-of-the-art results on Semeval and SciERC and competitive performance on TACRED and ACE05, supported by ablations and analyses of low-resource settings and the NULL overprediction issue. The findings suggest that task-focused demonstrations and reasoning can substantially boost GPT-3's RE performance and offer design insights for retrieval-based ICL in NLP.

Abstract

In spite of the potential for ground-breaking achievements offered by large language models (LLMs) (e.g., GPT-3), they still lag significantly behind fully-supervised baselines (e.g., fine-tuned BERT) in relation extraction (RE). This is due to the two major shortcomings of LLMs in RE: (1) low relevance regarding entity and relation in retrieved demonstrations for in-context learning; and (2) the strong inclination to wrongly classify NULL examples into other pre-defined labels. In this paper, we propose GPT-RE to bridge the gap between LLMs and fully-supervised baselines. GPT-RE successfully addresses the aforementioned issues by (1) incorporating task-specific entity representations in demonstration retrieval; and (2) enriching the demonstrations with gold label-induced reasoning logic. We evaluate GPT-RE on four widely-used RE datasets, and observe that GPT-RE achieves improvements over not only existing GPT-3 baselines, but also fully-supervised baselines. Specifically, GPT-RE achieves SOTA performances on the Semeval and SciERC datasets, and competitive performances on the TACRED and ACE05 datasets.
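
To make the first idea in the abstract concrete, the following is a minimal sketch of entity-aware demonstration retrieval. It is not the authors' implementation: the prompt template in `entity_prompt`, the toy bag-of-words `embed` function (a stand-in for a real sentence encoder), and the tiny training set are all assumptions chosen so the example runs with the standard library only.

```python
import math
import re
from collections import Counter


def entity_prompt(sentence, head, tail):
    # Hypothetical template: restate the context so the entity pair is explicit.
    return f"The relation between '{head}' and '{tail}' in the context: {sentence}"


def embed(text):
    # Toy bag-of-words vector; an assumption standing in for a sentence encoder.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, train, k=2):
    # Rank training examples by similarity of their entity-prompted views
    # to the entity-prompted query, and keep the top k as demonstrations.
    q_vec = embed(entity_prompt(*query))
    scored = [
        (cosine(q_vec, embed(entity_prompt(ex["sentence"], ex["head"], ex["tail"]))), ex)
        for ex in train
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [ex for _, ex in scored[:k]]


# Illustrative data only (not taken from any of the paper's datasets).
train = [
    {"sentence": "The flood was caused by heavy rain.",
     "head": "flood", "tail": "rain", "label": "Cause-Effect"},
    {"sentence": "The factory produces heavy steel.",
     "head": "factory", "tail": "steel", "label": "Product-Producer"},
    {"sentence": "The chef cut the bread with a knife.",
     "head": "chef", "tail": "knife", "label": "Instrument-Agency"},
]
query = ("The fire was caused by lightning.", "fire", "lightning")

for demo in retrieve(query, train, k=2):
    print(demo["label"], "|", demo["sentence"])
```

With a learned encoder in place of `embed`, the same nearest-neighbor ranking would reflect the entity pair and its relation rather than surface word overlap, which is the relevance problem the abstract describes.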

Paper Structure (39 sections, 1 equation, 10 figures, 6 tables)

Figures (10)

  • Figure 1: Micro-F1 performance on two RE datasets. Previous GPT baselines (GPT-Random: randomly selected demonstrations; GPT-Sent: sentence-level demonstration retrieval) largely underperform the fine-tuning baseline PURE, while our GPT-RE substantially outperforms all baselines.
  • Figure 2: Retrieval without considering the task-aware triplet results in noisy demonstrations.
  • Figure 3: Confusion matrix on the Semeval dataset with three selected relation labels. GPT-3 wrongly assigns NULL examples to other relations. CE: Cause-Effect, IA: Instrument-Agency, PP: Product-Producer.
  • Figure 4: An illustration of GPT-RE. Given a test input, we first use two different task-aware retrieval methods to search the training set for highly relevant demonstrations, and then add gold label-induced reasoning to each demonstration. These contents are then included in the prompt used to make the prediction.
  • Figure 5: An illustration of adding reasoning.
  • ...and 5 more figures
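
The prompt construction step described in the Figure 4 caption can be sketched as follows. This is a hedged illustration, not the paper's exact prompt: the wording of the templates, the field names (`sentence`, `head`, `tail`, `label`, `reasoning`), and the helper names are all assumptions. `reasoning_query` builds the question one might send to an LLM to obtain an explanation for a demonstration whose gold label is already known; no model is called here, so the reasoning text in the example is written by hand.

```python
def reasoning_query(demo):
    # Hypothetical question asking an LLM to explain a known (gold) label.
    return (
        f"Context: {demo['sentence']}\n"
        f"What are the clues that lead to the relation between "
        f"'{demo['head']}' and '{demo['tail']}' being {demo['label']}?"
    )


def format_demo(demo):
    # One demonstration: context, optional reasoning, then the gold label.
    lines = [f"Context: {demo['sentence']}"]
    if demo.get("reasoning"):
        lines.append(f"Reasoning: {demo['reasoning']}")
    lines.append(
        f"Given the context, the relation between '{demo['head']}' "
        f"and '{demo['tail']}' is {demo['label']}."
    )
    return "\n".join(lines)


def build_prompt(labels, demos, test):
    # Task description, retrieved demonstrations, then the unanswered test input.
    header = (
        "Classify the relation between the two entities. "
        "Choose one of: " + ", ".join(labels) + "."
    )
    query = (
        f"Context: {test['sentence']}\n"
        f"Given the context, the relation between '{test['head']}' "
        f"and '{test['tail']}' is"
    )
    return "\n\n".join([header] + [format_demo(d) for d in demos] + [query])


# Illustrative data only; the reasoning string is hand-written for the example.
labels = ["Cause-Effect", "Instrument-Agency", "Product-Producer", "NULL"]
demos = [{
    "sentence": "The flood was caused by heavy rain.",
    "head": "flood", "tail": "rain", "label": "Cause-Effect",
    "reasoning": "The phrase 'was caused by' marks the rain as the cause of the flood.",
}]
test = {"sentence": "The fire was caused by lightning.",
        "head": "fire", "tail": "lightning"}

print(reasoning_query(demos[0]))
print("---")
print(build_prompt(labels, demos, test))
```

Keeping NULL in the label list and showing explicit reasoning next to each gold label is one plausible way to give the model a clearer input-label mapping, in line with the motivation stated in the abstract.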