Generating Diverse Training Samples for Relation Extraction with Large Language Models

Zexuan Li, Hongliang Dai, Piji Li

TL;DR

This work tackles data scarcity in relation extraction by generating training samples with large language models. It introduces two complementary strategies: (i) diversity-driven in-context learning prompts with three modules (Task Description, Relation Explanation, Sample Demonstration) and (ii) Direct Preference Optimization (DPO) to fine-tune LLMs toward diverse yet correct outputs, including dispreferred samples to regularize learning. Empirical results on TACRED variants and SemEval show that LLM-generated data can be competitive with manually labeled data in some settings and that training non-LLM RE models on generated data can outperform direct LLM-based RE, especially when combining generated with human-labeled samples. Ablation and case studies reveal that both the one-by-one generation mode and DPO contribute to diversity and quality, though benefits saturate around 16–32 samples and depend on dataset characteristics. The approach offers a practical pathway to scalable, diversity-aware data generation for RE, with notable implications for few-shot learning and knowledge-driven information extraction.
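The three-module prompt layout named above can be illustrated with a minimal sketch. The function name `build_prompt`, its parameters, and all instruction wording are hypothetical and are not taken from the paper; only the division into Task Description, Relation Explanation, and Sample Demonstration modules comes from the summary.

```python
def build_prompt(relation, explanation, demonstrations):
    """Assemble a three-module generation prompt.

    All wording below is illustrative; the paper's actual prompt text
    is not reproduced here.
    """
    # Module 1: Task Description (hypothetical wording).
    task_description = (
        "Generate one new training sample for relation extraction. "
        "The sample must express the target relation between a head entity "
        "and a tail entity, and should be phrased differently from the "
        "demonstrations."
    )
    # Module 2: Relation Explanation.
    relation_explanation = f"Relation: {relation}\nExplanation: {explanation}"
    # Module 3: Sample Demonstration.
    sample_demonstration = "\n".join(
        f"Sample {i}: {text}" for i, text in enumerate(demonstrations, start=1)
    )
    # Modules are separated by a blank line.
    return "\n\n".join(
        [task_description, relation_explanation, sample_demonstration]
    )


if __name__ == "__main__":
    print(
        build_prompt(
            "org:founded_by",
            "The head organization was founded by the tail person.",
            ["Apple was founded by Steve Jobs."],
        )
    )
```

Keeping each module as a separate string makes it easy to vary one of them (for example, the demonstrations) while holding the others fixed.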

Abstract

Using Large Language Models (LLMs) to generate training data can potentially be a preferable way to improve zero-shot or few-shot NLP tasks. However, many problems remain to be investigated in this direction. For the task of Relation Extraction (RE), we find that samples generated by directly prompting LLMs often have high structural similarity to each other: they tend to use a limited variety of phrasing when expressing the relation between a pair of entities. Therefore, in this paper, we study how to effectively improve the diversity of the training samples generated with LLMs for RE while maintaining their correctness. We first try to make the LLMs produce dissimilar samples by giving direct instructions in In-Context Learning (ICL) prompts. Then, we propose an approach to fine-tune LLMs for diverse training sample generation through Direct Preference Optimization (DPO). Our experiments on commonly used RE datasets show that both approaches can improve the quality of the generated training data. We also find that, compared with directly performing RE with an LLM, training a non-LLM RE model on its generated samples may lead to better performance.
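For readers unfamiliar with DPO, the standard per-pair objective can be sketched as follows. This is the general DPO loss, not the authors' training code. The function name, the argument names, and the default `beta=0.1` are assumptions made for illustration. How the paper selects preferred and dispreferred samples is not covered by this sketch.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one (preferred, dispreferred) pair.

    Each argument is the summed log-probability of a full response under
    either the policy being fine-tuned or the frozen reference model.
    beta=0.1 is an assumed default, not a value taken from the paper.
    """
    # Implicit rewards: how much more likely each response is under the
    # policy than under the reference model.
    chosen_reward = policy_chosen_logp - ref_chosen_logp
    rejected_reward = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_reward - rejected_reward)
    # -log(sigmoid(margin)) == log(1 + exp(-margin)), computed stably.
    if margin > 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


if __name__ == "__main__":
    # The policy favors the preferred sample more than the reference does,
    # so the loss falls below log(2), its value at zero margin.
    print(dpo_loss(-1.0, -3.0, -2.0, -2.0))
```

The loss shrinks as the policy raises the likelihood of the preferred sample relative to the dispreferred one, measured against the reference model.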

Paper Structure

This paper contains 34 sections, 1 equation, 7 figures, 5 tables.

Figures (7)

  • Figure 1: Training samples generated by LLMs for RE before and after adopting our approach.
  • Figure 2: Construction of the prompt, which consists of three modules: the Task Description Module, the Relation Explanation Module, and the Sample Demonstration Module.
  • Figure 3: Construction of the DPO Fine-Tuning Training Dataset.
  • Figure 4: Imitating one-by-one generation during Direct Preference Optimization.
  • Figure 5: Average cosine similarity between generated training samples (K=32) for each relation category.
  • ...and 2 more figures
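
The diversity measure named in the Figure 5 caption, average cosine similarity between generated samples, can be sketched as below. The sketch assumes each sample has already been mapped to an embedding vector and that the average is taken over all unordered pairs. Neither the embedding model nor the exact averaging scheme is stated in this summary, so both are assumptions, as are the function names.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def average_pairwise_cosine(vectors):
    """Mean cosine similarity over all unordered pairs of vectors.

    Lower values suggest a more diverse set of samples.
    """
    pairs = list(combinations(vectors, 2))
    if not pairs:
        raise ValueError("need at least two vectors")
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)


if __name__ == "__main__":
    # Toy 2-D "embeddings"; real sentence embeddings would be much larger.
    print(average_pairwise_cosine([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]))
```

For K samples this evaluates K(K-1)/2 pairs, which is 496 pairs for the K=32 setting in the caption.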